pyspark.sql.functions.vector_cosine_similarity

pyspark.sql.functions.vector_cosine_similarity(left, right)

Returns the cosine similarity between two float vectors. The vectors must have the same dimension.

New in version 4.3.0.

Parameters
left : Column or column name

first vector column.

right : Column or column name

second vector column.

Returns
Column

cosine similarity as a float value.

Examples

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
>>> schema = StructType([StructField('a', ArrayType(FloatType())), StructField('b', ArrayType(FloatType()))])
>>> df = spark.createDataFrame([([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])], schema)
>>> df.select(sf.vector_cosine_similarity('a', 'b')).first()[0]
0.974631...
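
The value above can be cross-checked without Spark. The following is a minimal sketch, assuming the usual definition of cosine similarity (the dot product divided by the product of the two Euclidean norms). The helper name `cosine_similarity` is hypothetical and is not part of PySpark, and its handling of mismatched dimensions and zero vectors is an illustrative choice, not a description of what `vector_cosine_similarity` does in those cases.

```python
import math


def cosine_similarity(a, b):
    """Plain-Python reference for cosine similarity (not Spark's implementation)."""
    if len(a) != len(b):
        # The vectors must have the same dimension; raising here is this sketch's choice.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Cosine similarity is undefined for a zero vector; None is an assumption of this sketch.
        return None
    return dot / (norm_a * norm_b)


# Same vectors as in the example above: 32 / (sqrt(14) * sqrt(77))
reference = cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(reference, 6))  # 0.974632
```

The sketch computes in double precision, so trailing digits may differ slightly from a result computed on `FloatType` columns.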