Vector Functions
This page lists all vector functions available in Spark SQL.
vector_avg
vector_avg(array) - Returns the element-wise mean of float vectors in a group. All vectors must have the same dimension.
Examples:
> SELECT vector_avg(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col);
[2.0,3.0]
Since: 4.2.0
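To make the element-wise mean concrete, here is a minimal pure-Python sketch of the arithmetic. The helper name `elementwise_mean` is hypothetical and not part of Spark; it only mirrors the same-dimension requirement stated above and makes no claim about how Spark treats NULLs or empty groups.

```python
from typing import List, Sequence


def elementwise_mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical reference helper: mean of each position across all vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    count = len(vectors)
    # Average position i over every vector in the group.
    return [sum(v[i] for v in vectors) / count for i in range(dim)]


# Same inputs as the SQL example above.
avg_result = elementwise_mean([[1.0, 2.0], [3.0, 4.0]])
print(avg_result)
```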
vector_cosine_similarity
vector_cosine_similarity(array1, array2) - Returns the cosine similarity between two float vectors. The vectors must have the same dimension.
Examples:
> SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
0.9746319
Since: 4.2.0
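Cosine similarity is the dot product divided by the product of the two Euclidean norms. The sketch below spells that formula out in plain Python; `cosine_similarity` is a hypothetical helper, it uses double precision rather than Spark's float type, and its zero-vector behavior (a `ZeroDivisionError`) is an assumption of this sketch, not documented Spark behavior.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical reference helper: dot(a, b) / (||a||_2 * ||b||_2)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError if either vector is all zeros (sketch-only choice).
    return dot / (norm_a * norm_b)


# Same inputs as the SQL example above.
similarity = cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(similarity)
```

Because the sketch computes in double precision, its trailing digits can differ slightly from the float result shown in the SQL example.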
vector_inner_product
vector_inner_product(array1, array2) - Returns the inner product (dot product) between two float vectors. The vectors must have the same dimension.
Examples:
> SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
32.0
Since: 4.2.0
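The inner product multiplies matching components and adds the results. A minimal Python sketch of that calculation follows; the name `inner_product` is hypothetical and is used only for illustration.

```python
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical reference helper: sum of pairwise products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Same inputs as the SQL example above: 1*4 + 2*5 + 3*6.
dot_result = inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(dot_result)
```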
vector_l2_distance
vector_l2_distance(array1, array2) - Returns the Euclidean (L2) distance between two float vectors. The vectors must have the same dimension.
Examples:
> SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
5.196152
Since: 4.2.0
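The Euclidean distance is the square root of the summed squared component differences. The following Python sketch shows the computation under that definition; `l2_distance` is a hypothetical name, and the result is a double, so it may carry more digits than the float value in the SQL example.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical reference helper: sqrt(sum((a_i - b_i)^2))."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Same inputs as the SQL example above: every difference is 3, so sqrt(27).
distance = l2_distance([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(distance)
```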
vector_norm
vector_norm(vector, degree) - Returns the Lp norm of a float vector using the specified degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm).
Examples:
> SELECT vector_norm(array(3.0F, 4.0F), 2.0F);
5.0
> SELECT vector_norm(array(3.0F, 4.0F), 1.0F);
7.0
> SELECT vector_norm(array(3.0F, 4.0F), float('inf'));
4.0
Since: 4.2.0
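The three supported degrees correspond to three different formulas: sum of absolute values (L1), square root of the sum of squares (L2), and largest absolute value (infinity norm). The Python sketch below is one way to express them; `lp_norm` is a hypothetical helper, and rejecting any other degree with a `ValueError` is an assumption of the sketch rather than a statement about Spark's error handling.

```python
import math
from typing import Sequence


def lp_norm(vector: Sequence[float], degree: float = 2.0) -> float:
    """Hypothetical reference helper for the L1, L2, and infinity norms."""
    if degree == 1.0:
        return sum(abs(x) for x in vector)
    if degree == 2.0:
        return math.sqrt(sum(x * x for x in vector))
    if degree == math.inf:
        # default=0.0 keeps the empty vector well-defined in this sketch.
        return max((abs(x) for x in vector), default=0.0)
    raise ValueError("supported degrees: 1.0, 2.0, inf")


# Same inputs as the SQL examples above.
norm_l2 = lp_norm([3.0, 4.0], 2.0)
norm_l1 = lp_norm([3.0, 4.0], 1.0)
norm_inf = lp_norm([3.0, 4.0], float("inf"))
print(norm_l2, norm_l1, norm_inf)
```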
vector_normalize
vector_normalize(vector, degree) - Normalizes a float vector to unit length using the specified norm degree. Degree defaults to 2.0 (Euclidean norm) if unspecified. Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm).
Examples:
> SELECT vector_normalize(array(3.0F, 4.0F), 2.0F);
[0.6,0.8]
> SELECT vector_normalize(array(3.0F, 4.0F), 1.0F);
[0.42857143,0.5714286]
> SELECT vector_normalize(array(3.0F, 4.0F), float('inf'));
[0.75,1.0]
Since: 4.2.0
vector_sum
vector_sum(array) - Returns the element-wise sum of float vectors in a group. All vectors must have the same dimension.
Examples:
> SELECT vector_sum(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col);
[4.0,6.0]
Since: 4.2.0
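For comparison with the SQL example, the element-wise sum can be sketched in a few lines of Python. `elementwise_sum` is a hypothetical name; the sketch enforces the same-dimension rule stated above and does not model NULL handling.

```python
from typing import List, Sequence


def elementwise_sum(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical reference helper: add up each position across all vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) groups the i-th components of every vector together.
    return [sum(column) for column in zip(*vectors)]


# Same inputs as the SQL example above.
sum_result = elementwise_sum([[1.0, 2.0], [3.0, 4.0]])
print(sum_result)
```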