pyspark.sql.functions.array_intersect
pyspark.sql.functions.array_intersect(col1, col2)
Array function: returns a new array containing the intersection of elements in col1 and col2, without duplicates.
New in version 2.4.0.
Changed in version 3.4.0: Supports Spark Connect.
Parameters
col1 : Column or str
Name of a column that contains an array.
col2 : Column or str
Name of a column that contains an array.
Returns
Column
A new array containing the intersection of elements in col1 and col2.
Notes
This function does not preserve the order of the elements in the input arrays.
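Because the ordering is not guaranteed, the examples below wrap the call in sort_array before displaying the result. Both arguments may also be given as column names rather than Column objects; the following is a minimal sketch of that usage, assuming an active SparkSession named spark and illustrative column names c1 and c2:
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "a"])])
>>> # Column names passed as strings behave the same as Column objects;
>>> # sort_array gives the otherwise unordered intersection a deterministic order.
>>> result = df.select(sf.sort_array(sf.array_intersect("c1", "c2")).alias("common"))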
Examples
Example 1: Basic usage
>>> from pyspark.sql import Row, functions as sf >>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])]) >>> df.select(sf.sort_array(sf.array_intersect(df.c1, df.c2))).show() +-----------------------------------------+ |sort_array(array_intersect(c1, c2), true)| +-----------------------------------------+ | [a, c]| +-----------------------------------------+
Example 2: Intersection with no common elements
>>> from pyspark.sql import Row, functions as sf >>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["d", "e", "f"])]) >>> df.select(sf.array_intersect(df.c1, df.c2)).show() +-----------------------+ |array_intersect(c1, c2)| +-----------------------+ | []| +-----------------------+
Example 3: Intersection with all common elements
>>> from pyspark.sql import Row, functions as sf >>> df = spark.createDataFrame([Row(c1=["a", "b", "c"], c2=["a", "b", "c"])]) >>> df.select(sf.sort_array(sf.array_intersect(df.c1, df.c2))).show() +-----------------------------------------+ |sort_array(array_intersect(c1, c2), true)| +-----------------------------------------+ | [a, b, c]| +-----------------------------------------+
Example 4: Intersection with null values
>>> from pyspark.sql import Row, functions as sf >>> df = spark.createDataFrame([Row(c1=["a", "b", None], c2=["a", None, "c"])]) >>> df.select(sf.sort_array(sf.array_intersect(df.c1, df.c2))).show() +-----------------------------------------+ |sort_array(array_intersect(c1, c2), true)| +-----------------------------------------+ | [NULL, a]| +-----------------------------------------+
Example 5: Intersection with empty arrays
>>> from pyspark.sql import Row, functions as sf
>>> from pyspark.sql.types import ArrayType, StringType, StructField, StructType
>>> data = [Row(c1=[], c2=["a", "b", "c"])]
>>> schema = StructType([
...     StructField("c1", ArrayType(StringType()), True),
...     StructField("c2", ArrayType(StringType()), True)
... ])
>>> df = spark.createDataFrame(data, schema)
>>> df.select(sf.array_intersect(df.c1, df.c2)).show()
+-----------------------+
|array_intersect(c1, c2)|
+-----------------------+
|                     []|
+-----------------------+
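A further sketch of the deduplication behaviour described above, with illustrative values and an active SparkSession named spark assumed: when the inputs contain repeated elements, each common element should still appear only once in the result.
>>> from pyspark.sql import Row, functions as sf
>>> df = spark.createDataFrame([Row(c1=["a", "a", "b", "c"], c2=["b", "a", "a"])])
>>> # The intersection is deduplicated, so the sorted result should be [a, b].
>>> df.select(sf.sort_array(sf.array_intersect(df.c1, df.c2)).alias("common")).show()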