Array Functions¶
This page lists all array functions available in Spark SQL.
array¶
array(expr, ...) - Returns an array with the given elements.
Examples:
> SELECT array(1, 2, 3);
[1,2,3]
Since: 1.1.0
array_append¶
array_append(array, element) - Add the element at the end of the array passed as first argument. Type of element should be similar to type of the elements of the array. Null element is also appended into the array. But if the array passed, is NULL output is NULL
Examples:
> SELECT array_append(array('b', 'd', 'c', 'a'), 'd');
["b","d","c","a","d"]
> SELECT array_append(array(1, 2, 3, null), null);
[1,2,3,null,null]
> SELECT array_append(CAST(null as Array<Int>), 2);
NULL
Since: 3.4.0
array_compact¶
array_compact(array) - Removes null values from the array.
Examples:
> SELECT array_compact(array(1, 2, 3, null));
[1,2,3]
> SELECT array_compact(array("a", "b", "c"));
["a","b","c"]
Since: 3.4.0
array_contains¶
array_contains(array, value) - Returns true if the array contains the value.
Examples:
> SELECT array_contains(array(1, 2, 3), 2);
true
Since: 1.5.0
array_distinct¶
array_distinct(array) - Removes duplicate values from the array.
Examples:
> SELECT array_distinct(array(1, 2, 3, null, 3));
[1,2,3,null]
Since: 2.4.0
array_except¶
array_except(array1, array2) - Returns an array of the elements in array1 but not in array2, without duplicates.
Examples:
> SELECT array_except(array(1, 2, 3), array(1, 3, 5));
[2]
Since: 2.4.0
array_insert¶
array_insert(x, pos, val) - Places val into index pos of array x. Array indices start at 1. The maximum negative index is -1 for which the function inserts new element after the current last element. Index above array size appends the array, or prepends the array if index is negative, with 'null' elements.
Examples:
> SELECT array_insert(array(1, 2, 3, 4), 5, 5);
[1,2,3,4,5]
> SELECT array_insert(array(5, 4, 3, 2), -1, 1);
[5,4,3,2,1]
> SELECT array_insert(array(5, 3, 2, 1), -4, 4);
[5,4,3,2,1]
Since: 3.4.0
array_intersect¶
array_intersect(array1, array2) - Returns an array of the elements in the intersection of array1 and array2, without duplicates.
Examples:
> SELECT array_intersect(array(1, 2, 3), array(1, 3, 5));
[1,3]
Since: 2.4.0
array_join¶
array_join(array, delimiter[, nullReplacement]) - Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.
Examples:
> SELECT array_join(array('hello', 'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ');
hello world
> SELECT array_join(array('hello', null ,'world'), ' ', ',');
hello , world
Since: 2.4.0
array_max¶
array_max(array) - Returns the maximum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
Examples:
> SELECT array_max(array(1, 20, null, 3));
20
Since: 2.4.0
array_min¶
array_min(array) - Returns the minimum value in the array. NaN is greater than any non-NaN elements for double/float type. NULL elements are skipped.
Examples:
> SELECT array_min(array(1, 20, null, 3));
1
Since: 2.4.0
array_position¶
array_position(array, element) - Returns the (1-based) index of the first matching element of the array as long, or 0 if no match is found.
Examples:
> SELECT array_position(array(312, 773, 708, 708), 708);
3
> SELECT array_position(array(312, 773, 708, 708), 414);
0
Since: 2.4.0
array_prepend¶
array_prepend(array, element) - Add the element at the beginning of the array passed as first argument. Type of element should be the same as the type of the elements of the array. Null element is also prepended to the array. But if the array passed is NULL output is NULL
Examples:
> SELECT array_prepend(array('b', 'd', 'c', 'a'), 'd');
["d","b","d","c","a"]
> SELECT array_prepend(array(1, 2, 3, null), null);
[null,1,2,3,null]
> SELECT array_prepend(CAST(null as Array<Int>), 2);
NULL
Since: 3.5.0
array_remove¶
array_remove(array, element) - Remove all elements that equal to element from array.
Examples:
> SELECT array_remove(array(1, 2, 3, null, 3), 3);
[1,2,null]
Since: 2.4.0
array_repeat¶
array_repeat(element, count) - Returns the array containing element count times.
Examples:
> SELECT array_repeat('123', 2);
["123","123"]
Since: 2.4.0
array_size¶
array_size(expr) - Returns the size of an array. The function returns null for null input.
Examples:
> SELECT array_size(array('b', 'd', 'c', 'a'));
4
Since: 3.3.0
array_union¶
array_union(array1, array2) - Returns an array of the elements in the union of array1 and array2, without duplicates.
Examples:
> SELECT array_union(array(1, 2, 3), array(1, 3, 5));
[1,2,3,5]
Since: 2.4.0
arrays_overlap¶
arrays_overlap(a1, a2) - Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.
Examples:
> SELECT arrays_overlap(array(1, 2, 3), array(3, 4, 5));
true
Since: 2.4.0
arrays_zip¶
arrays_zip(a1, a2, ...) - Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Examples:
> SELECT arrays_zip(array(1, 2, 3), array(2, 3, 4));
[{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
> SELECT arrays_zip(array(1, 2), array(2, 3), array(3, 4));
[{"0":1,"1":2,"2":3},{"0":2,"1":3,"2":4}]
Since: 2.4.0
flatten¶
flatten(arrayOfArrays) - Transforms an array of arrays into a single array.
Examples:
> SELECT flatten(array(array(1, 2), array(3, 4)));
[1,2,3,4]
Since: 2.4.0
get¶
get(array, index) - Returns element of array at given (0-based) index. If the index points outside of the array boundaries, then this function returns NULL.
Examples:
> SELECT get(array(1, 2, 3), 0);
1
> SELECT get(array(1, 2, 3), 3);
NULL
> SELECT get(array(1, 2, 3), -1);
NULL
Since: 3.4.0
sequence¶
sequence(start, stop, step) - Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.
Supported types are: byte, short, integer, long, date, timestamp.
The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' or 'year-month interval' or 'day-time interval' type, otherwise to the same type as the start and stop expressions.
Arguments:
- start - an expression. The start of the range.
- stop - an expression. The end the range (inclusive).
- step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
Examples:
> SELECT sequence(1, 5);
[1,2,3,4,5]
> SELECT sequence(5, 1);
[5,4,3,2,1]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month);
[2018-01-01,2018-02-01,2018-03-01]
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval '0-1' year to month);
[2018-01-01,2018-02-01,2018-03-01]
Since: 2.4.0
shuffle¶
shuffle(array) - Returns a random permutation of the given array.
Examples:
> SELECT shuffle(array(1, 20, 3, 5));
[3,1,5,20]
> SELECT shuffle(array(1, 20, null, 3));
[20,null,3,1]
Note:
The function is non-deterministic.
Since: 2.4.0
slice¶
slice(x, start, length) - Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.
Examples:
> SELECT slice(array(1, 2, 3, 4), 2, 2);
[2,3]
> SELECT slice(array(1, 2, 3, 4), -2, 2);
[3,4]
Since: 2.4.0
sort_array¶
sort_array(array[, ascendingOrder]) - Sorts the input array in ascending or descending order according to the natural ordering of the array elements. NaN is greater than any non-NaN elements for double/float type. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Examples:
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), true);
[null,"a","b","c","d"]
> SELECT sort_array(array('b', 'd', null, 'c', 'a'), false);
["d","c","b","a",null]
Since: 1.5.0