Spark SQL

- Apache Arrow in PySpark
  - Ensure PyArrow Installed
  - Conversion to/from Arrow Table
  - Enabling for Conversion to/from Pandas
  - Pandas UDFs (a.k.a. Vectorized UDFs)
  - Pandas Function APIs
  - Arrow Python UDFs
  - Usage Notes
- Vectorized Python User-defined Table Functions (UDTFs)
  - Vectorized Python UDTF Interface
  - Defining the Output Schema
  - Emitting Output Rows
  - Usage Examples
  - TABLE Argument
  - PARTITION BY and ORDER BY
  - Best Practices
  - More Examples
- Python User-defined Table Functions (UDTFs)
  - Implementing a Python UDTF
  - Defining the Output Schema
  - Emitting Output Rows
  - Registering and Using Python UDTFs in SQL
  - Arrow Optimization
  - UDTF Examples with Scalar Arguments
  - Accepting an Input Table Argument
- Python Data Source API
  - Overview
  - Simple Example: Data Source with Batch Reader
  - Comprehensive Example: Data Source with Batch and Streaming Readers and Writers
  - Serialization Requirement
  - Using a Python Data Source
  - Python Data Source Reader with direct Arrow Batch support for improved performance
  - Usage Notes
- Python to Spark Type Conversions
  - Browsing Type Conversions
  - Configuration
  - All Conversions
  - Conversions in Practice - UDFs
  - Conversions in Practice - Creating DataFrames
  - Conversions in Practice - Nested Data Types