pyspark.sql.functions.vector_cosine_similarity
- pyspark.sql.functions.vector_cosine_similarity(left, right)
Returns the cosine similarity between two float vectors. The vectors must have the same dimension.
New in version 4.3.0.
- Parameters
- Returns
Column: cosine similarity as a float value.
Examples
>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, FloatType, StructType, StructField
>>> schema = StructType([StructField('a', ArrayType(FloatType())), StructField('b', ArrayType(FloatType()))])
>>> df = spark.createDataFrame([([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])], schema)
>>> df.select(sf.vector_cosine_similarity('a', 'b')).first()[0]
0.974631...
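The value in the example can be checked without Spark. The following is a minimal plain-Python sketch of the standard cosine similarity formula, dot(left, right) / (‖left‖ · ‖right‖). The helper name `cosine_similarity` is hypothetical and is not part of PySpark. It illustrates the definition only and makes no claim about how Spark computes the result internally.

```python
import math


def cosine_similarity(left, right):
    """Reference computation: dot(left, right) / (norm(left) * norm(right)).

    Hypothetical helper for illustration; not a PySpark API.
    """
    if len(left) != len(right):
        # Mirrors the documented requirement that both vectors share a dimension.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(left, right))
    norm_left = math.sqrt(sum(x * x for x in left))
    norm_right = math.sqrt(sum(y * y for y in right))
    # Note: a zero-length (all-zero) vector would cause a division by zero here;
    # this sketch does not handle that case.
    return dot / (norm_left * norm_right)


# Same input vectors as the doctest above.
result = cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(result, 6))  # 0.974632
```

Here the dot product is 32 and the norms are √14 and √77, so the result is 32 / √1078 ≈ 0.974632, which agrees with the doctest output.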