Array Functions

This page lists all array functions available in Spark SQL.


array

array(expr, ...) - Returns an array with the given elements.

Examples:

> SELECT array(1, 2, 3);
 [1,2,3]

Since: 1.1.0


array_append

array_append(array, element) - Add the element at the end of the array passed as first argument. Type of element should be similar to type of the elements of the array. Null element is also appended into the array. But if the array passed, is NULL output is NULL

Examples:

> SELECT array_append(array('b', 'd', 'c', 'a'), 'd');
 ["b","d","c","a","d"]
> SELECT array_append(array(1, 2, 3, null), null);
 [1,2,3,null,null]
> SELECT array_append(CAST(null as Array<Int>), 2);
 NULL

Since: 3.4.0


array_compact

array_compact(array) - Removes null values from the array.

Examples:

> SELECT array_compact(array(1, 2, 3, null));
 [1,2,3]
> SELECT array_compact(array("a", "b", "c"));
 ["a","b","c"]

Since: 3.4.0


array_contains

array_contains(array, value) - Returns true if the array contains the value.

Examples:

> SELECT array_contains(array(1, 2, 3), 2);
 true

Since: 1.5.0


array_distinct

array_distinct(array) - Removes duplicate values from the array.

Examples:

> SELECT array_distinct(array(1, 2, 3, null, 3));
 [1,2,3,null]

Since: 2.4.0


array_except

array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.

Examples:

> SELECT array_except(array(1, 2, 3), array(1, 3, 5));
 [2]

Since: 2.4.0


array_insert

array_insert(x, pos, val) - Places val into index pos of array x. Array indices start at 1. The maximum negative index is -1 for which the function inserts new element after the current last element. Index above array size appends the array, or prepends the array if index is negative, with 'null' elements.

Examples:

> SELECT array_insert(array(1, 2, 3, 4), 5, 5);
 [1,2,3,4,5]
> SELECT array_insert(array(5, 4, 3, 2), -1, 1);
 [5,4,3,2,1]
> SELECT array_insert(array(5, 3, 2, 1), -4, 4);
 [5,4,3,2,1]

Since: 3.4.0


array_intersect

array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.

Examples:

> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5));
 [1,3]

Since: 2.4.0


array_join

array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.

Examples:

> SELECT array_join(array('hello', 'world'), ' ');
 hello world
> SELECT array_join(array('hello', null ,'world'), ' ');
 hello world
> SELECT array_join(array('hello', null ,'world'), ' ', ',');
 hello , world

Since: 2.4.0


array_max

array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Examples:

> SELECT array_max(array(1, 20, null, 3));
 20

Since: 2.4.0


array_min

array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.

Examples:

> SELECT array_min(array(1, 20, null, 3));
 1

Since: 2.4.0


array_position

array_position(array, element) - Returns the (1-based) index of the first matching element of the array as long, or 0 if no match is found.

Examples:

> SELECT array_position(array(312, 773, 708, 708), 708);
 3
> SELECT array_position(array(312, 773, 708, 708), 414);
 0

Since: 2.4.0


array_prepend

array_prepend(array, element) - Add the element at the beginning of the array passed as first argument. Type of element should be the same as the type of the elements of the array. Null element is also prepended to the array. But if the array passed is NULL output is NULL

Examples:

> SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd');
 ["d","b","d","c","a"]
> SELECT array_prepend(array(1, 2, 3, null), null);
 [null,1,2,3,null]
> SELECT array_prepend(CAST(null as Array<Int>), 2);
 NULL

Since: 3.5.0


array_remove

array_remove(array, element) - Remove all elements that equal to element from array.

Examples:

> SELECT array_remove(array(1, 2, 3, null, 3), 3);
 [1,2,null]

Since: 2.4.0


array_repeat

array_repeat(element, count) - Returns the array containing element count times.

Examples:

> SELECT array_repeat('123', 2);
 ["123","123"]

Since: 2.4.0


array_size

array_size(expr) - Returns the size of an array. The function returns null for null input.

Examples:

> SELECT array_size(array('b', 'd', 'c', 'a'));
 4

Since: 3.3.0


array_union

array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.

Examples:

> SELECT array_union(array(1, 2, 3), array(1, 3, 5));
 [1,2,3,5]

Since: 2.4.0


arrays_overlap

arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.

Examples:

> SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
 true

Since: 2.4.0


arrays_zip

arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.

Examples:

> SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4));
 [{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
> SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4));
 [{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]

Since: 2.4.0


flatten

flatten(arrayOfArrays) - Transforms an array of arrays into a single array.

Examples:

> SELECT flatten(array(array(1, 2), array(3, 4)));
 [1,2,3,4]

Since: 2.4.0


get

get(array, index) - Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL.

Examples:

> SELECT get(array(1, 2, 3), 0);
 1
> SELECT get(array(1, 2, 3), 3);
 NULL
> SELECT get(array(1, 2, 3), -1);
 NULL

Since: 3.4.0


sequence

sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.

Supported types are: byte, short, integer, long, date, timestamp.

The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions.

Arguments:

  • start - an expression. The start of the range.
  • stop - an expression. The end the range (inclusive).
  • step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.

Examples:

> SELECT sequence(1, 5);
 [1,2,3,4,5]
> SELECT sequence(5, 1);
 [5,4,3,2,1]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
 [2018-01-01,2018-02-01,2018-03-01]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month);
 [2018-01-01,2018-02-01,2018-03-01]

Since: 2.4.0


shuffle

shuffle(array) - Returns a random permutation of the given array.

Examples:

> SELECT shuffle(array(1, 20, 3, 5));
 [3,1,5,20]
> SELECT shuffle(array(1, 20, null, 3));
 [20,null,3,1]

Note:

The function is non-deterministic.

Since: 2.4.0


slice

slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.

Examples:

> SELECT slice(array(1, 2, 3, 4), 2, 2);
 [2,3]
> SELECT slice(array(1, 2, 3, 4), -2, 2);
 [3,4]

Since: 2.4.0


sort_array

sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.

Examples:

> SELECT sort_array(array('b', 'd', null, 'c', 'a'), true);
 [null,"a","b","c","d"]
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), false);
 ["d","c","b","a",null]

Since: 1.5.0