Vector Functions

This page lists all vector functions available in Spark SQL.


vector_avg

vector_avg(array) - Returns the element-wise mean of float vectors in a group. All vectors must have the same dimension.

Examples:

> SELECT vector_avg(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col);
 [2.0,3.0]

Since: 4.2.0
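The aggregation above can be modeled outside Spark as a per-position mean over the rows of one group. Below is a minimal Python sketch. It assumes a group is a non-empty list of equal-length lists. The name `vector_avg_sketch` is hypothetical, and the sketch uses Python's double-precision floats instead of Spark's FLOAT type, so the last printed digits can differ from Spark output. How Spark handles NULL rows or empty groups is not modeled.

```python
def vector_avg_sketch(vectors: list[list[float]]) -> list[float]:
    """Hypothetical model of vector_avg: element-wise mean of one group."""
    if not vectors:
        # The sketch rejects empty groups; Spark's behavior is not modeled.
        raise ValueError("group must contain at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) yields the i-th element of every vector as one tuple.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(vector_avg_sketch([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```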


vector_cosine_similarity

vector_cosine_similarity(array1, array2) - Returns the cosine similarity between two float vectors. The vectors must have the same dimension.

Examples:

> SELECT vector_cosine_similarity(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
 0.9746319

Since: 4.2.0
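Cosine similarity is the inner product divided by the product of the two Euclidean norms. The following minimal Python sketch shows that formula. The function name is hypothetical. It computes in double precision, so it agrees with the FLOAT result shown above only to about six decimal places. A zero vector makes the denominator zero and raises ZeroDivisionError here, but Spark's handling of that case is not described on this page.

```python
import math


def cosine_similarity_sketch(a: list[float], b: list[float]) -> float:
    """Hypothetical model of vector_cosine_similarity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError if either vector is all zeros.
    return dot / (norm_a * norm_b)


print(round(cosine_similarity_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), 6))  # 0.974632
```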


vector_inner_product

vector_inner_product(array1, array2) - Returns the inner product (dot product) between two float vectors. The vectors must have the same dimension.

Examples:

> SELECT vector_inner_product(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
 32.0

Since: 4.2.0
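The inner product multiplies matching elements and adds the results: 1*4 + 2*5 + 3*6 = 32. The following is a minimal Python sketch of that computation. The function name is hypothetical, and the sketch uses double precision rather than Spark's FLOAT.

```python
def inner_product_sketch(a: list[float], b: list[float]) -> float:
    """Hypothetical model of vector_inner_product (dot product)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Multiply position by position, then add the products.
    return sum(x * y for x, y in zip(a, b))


print(inner_product_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```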


vector_l2_distance

vector_l2_distance(array1, array2) - Returns the Euclidean (L2) distance between two float vectors. The vectors must have the same dimension.

Examples:

> SELECT vector_l2_distance(array(1.0F, 2.0F, 3.0F), array(4.0F, 5.0F, 6.0F));
 5.196152

Since: 4.2.0
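The Euclidean distance is the square root of the summed squared differences. In the example every difference is 3, so the result is sqrt(27), about 5.196152. Here is a minimal Python sketch. The function name is hypothetical, and the arithmetic is in double precision. Python's standard library also offers `math.dist` for the same computation.

```python
import math


def l2_distance_sketch(a: list[float], b: list[float]) -> float:
    """Hypothetical model of vector_l2_distance."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    # Square each per-position difference, sum, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(round(l2_distance_sketch([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]), 6))  # 5.196152
```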


vector_norm

vector_norm(vector, degree) - Returns the Lp norm of a float vector, where degree is the value of p. If degree is unspecified, it defaults to 2.0 (Euclidean norm). Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm).

Examples:

> SELECT vector_norm(array(3.0F, 4.0F), 2.0F);
 5.0
> SELECT vector_norm(array(3.0F, 4.0F), 1.0F);
 7.0
> SELECT vector_norm(array(3.0F, 4.0F), float('inf'));
 4.0

Since: 4.2.0
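The three supported degrees reduce to familiar formulas. L1 is the sum of absolute values, L2 is the square root of the sum of squares, and the infinity norm is the largest absolute value. The following minimal Python sketch covers only those degrees. The function name is hypothetical. Rejecting other degrees and returning 0.0 for an empty vector are choices made by the sketch, not documented Spark behavior.

```python
import math


def vector_norm_sketch(v: list[float], degree: float = 2.0) -> float:
    """Hypothetical model of vector_norm for degrees 1.0, 2.0 and infinity."""
    if degree == 1.0:
        return sum(abs(x) for x in v)  # L1: sum of absolute values
    if degree == 2.0:
        return math.sqrt(sum(x * x for x in v))  # L2: Euclidean length
    if math.isinf(degree) and degree > 0:
        # Infinity norm: largest absolute value (0.0 for an empty vector).
        return max((abs(x) for x in v), default=0.0)
    raise ValueError(f"unsupported degree: {degree}")


print(vector_norm_sketch([3.0, 4.0]))                # 5.0
print(vector_norm_sketch([3.0, 4.0], 1.0))           # 7.0
print(vector_norm_sketch([3.0, 4.0], float("inf")))  # 4.0
```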


vector_normalize

vector_normalize(vector, degree) - Normalizes a float vector so that its norm of the specified degree is 1, by dividing each element by that norm. If degree is unspecified, it defaults to 2.0 (Euclidean norm). Supported values: 1.0 (L1 norm), 2.0 (L2 norm), float('inf') (infinity norm).

Examples:

> SELECT vector_normalize(array(3.0F, 4.0F), 2.0F);
 [0.6,0.8]
> SELECT vector_normalize(array(3.0F, 4.0F), 1.0F);
 [0.42857143,0.5714286]
> SELECT vector_normalize(array(3.0F, 4.0F), float('inf'));
 [0.75,1.0]

Since: 4.2.0
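The examples above are consistent with dividing every element by the vector's norm of the chosen degree. For instance, [3, 4] / 7 gives the L1 result and [3, 4] / 4 gives the infinity-norm result. The following minimal Python sketch follows that reading. The function name is hypothetical, and the arithmetic is in double precision. A zero vector triggers ZeroDivisionError in the sketch, and how Spark treats that case is not described on this page.

```python
import math


def vector_normalize_sketch(v: list[float], degree: float = 2.0) -> list[float]:
    """Hypothetical model of vector_normalize: divide each element by the norm."""
    if degree == 1.0:
        norm = sum(abs(x) for x in v)
    elif degree == 2.0:
        norm = math.sqrt(sum(x * x for x in v))
    elif math.isinf(degree) and degree > 0:
        norm = max((abs(x) for x in v), default=0.0)
    else:
        raise ValueError(f"unsupported degree: {degree}")
    # Raises ZeroDivisionError for a non-empty all-zero vector.
    return [x / norm for x in v]


print(vector_normalize_sketch([3.0, 4.0]))                # [0.6, 0.8]
print(vector_normalize_sketch([3.0, 4.0], float("inf")))  # [0.75, 1.0]
```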


vector_sum

vector_sum(array) - Returns the element-wise sum of float vectors in a group. All vectors must have the same dimension.

Examples:

> SELECT vector_sum(col) FROM VALUES (array(1.0F, 2.0F)), (array(3.0F, 4.0F)) AS tab(col);
 [4.0,6.0]

Since: 4.2.0
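Like vector_avg, this aggregate works position by position across the rows of a group, but without the final division. Below is a minimal Python sketch. It assumes a group is a non-empty list of equal-length lists, and the name `vector_sum_sketch` is hypothetical. NULL rows and empty groups are not modeled, and the arithmetic is in double precision rather than Spark's FLOAT.

```python
def vector_sum_sketch(vectors: list[list[float]]) -> list[float]:
    """Hypothetical model of vector_sum: element-wise sum of one group."""
    if not vectors:
        # The sketch rejects empty groups; Spark's behavior is not modeled.
        raise ValueError("group must contain at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) groups the i-th element of every vector together.
    return [sum(column) for column in zip(*vectors)]


print(vector_sum_sketch([[1.0, 2.0], [3.0, 4.0]]))  # [4.0, 6.0]
```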