
pyspark.sql.functions.array_join(col, delimiter, null_replacement=None)

Array function: Returns a string column by concatenating the elements of the input array column using the delimiter. Null values within the array can be replaced with a specified string through the null_replacement argument. If null_replacement is not set, null values are ignored.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col : Column or str

The input column containing the arrays to be joined.


delimiter : str

The string to be used as the delimiter when joining the array elements.

null_replacement : str, optional

The string to replace null values within the array. If not set, null values are ignored.


Returns

Column

A new column of string type, where each value is the result of joining the corresponding array from the input column.


Example 1: Basic usage of array_join function.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", "b", "c"],), (["a", "b"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|              a,b,c|
|                a,b|
+-------------------+

Example 2: Usage of array_join function with null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
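>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                 a,NULL,c|
+-------------------------+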

Example 3: Usage of array_join function without null_replacement argument.

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(["a", None, "c"],)], ['data'])
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|                a,c|
+-------------------+

Example 4: Usage of array_join function with an array that is null.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([(None,)], schema)
>>> df.select(sf.array_join(df.data, ",")).show()
+-------------------+
|array_join(data, ,)|
+-------------------+
|               NULL|
+-------------------+

Example 5: Usage of array_join function with an array containing only null values.

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import StructType, StructField, ArrayType, StringType
>>> schema = StructType([StructField("data", ArrayType(StringType()), True)])
>>> df = spark.createDataFrame([([None, None],)], schema)
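>>> df.select(sf.array_join(df.data, ",", "NULL")).show()
+-------------------------+
|array_join(data, ,, NULL)|
+-------------------------+
|                NULL,NULL|
+-------------------------+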
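
The null-handling rules shown in Examples 1 through 5 can be summarized with a small plain-Python model. This is only a sketch of the documented behavior, not Spark's implementation: `array_join_model` is a hypothetical helper that operates on ordinary Python lists rather than columns, and it assumes string elements.

```python
from typing import List, Optional


def array_join_model(
    values: Optional[List[Optional[str]]],
    delimiter: str,
    null_replacement: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical plain-Python model of the documented array_join behavior."""
    if values is None:
        # A null array produces a null result (Example 4).
        return None
    if null_replacement is None:
        # Without a replacement, null elements are skipped (Example 3).
        parts = [v for v in values if v is not None]
    else:
        # With a replacement, each null element is substituted (Examples 2 and 5).
        parts = [null_replacement if v is None else v for v in values]
    return delimiter.join(parts)


print(array_join_model(["a", "b", "c"], ","))           # a,b,c
print(array_join_model(["a", None, "c"], ",", "NULL"))  # a,NULL,c
print(array_join_model(["a", None, "c"], ","))          # a,c
print(array_join_model(None, ","))                      # None
print(array_join_model([None, None], ",", "NULL"))      # NULL,NULL
```

Note that a null array and an array containing only nulls behave differently: the former yields a null result, while the latter yields a string built from the replacement values when null_replacement is set.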