Misc Functions¶
This page lists all misc functions available in Spark SQL.
aes_decrypt¶
aes_decrypt(expr, key[, mode[, padding[, aad]]]) - Returns a decrypted value of expr using AES in mode with padding.
Key lengths of 16, 24 and 32 bits are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS').
Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption.
The default mode is GCM.
Arguments:
- expr - The binary value to decrypt.
- key - The passphrase to use to decrypt the data.
- mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
- padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
- aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
Examples:
> SELECT aes_decrypt(unhex('83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94'), '0000111122223333');
Spark
> SELECT aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
Spark SQL
> SELECT aes_decrypt(unbase64('3lmwu+Mw0H3fi5NDvcu9lg=='), '1234567890abcdef', 'ECB', 'PKCS');
Spark SQL
> SELECT aes_decrypt(unbase64('2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo='), '1234567890abcdef', 'CBC');
Apache Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg='), 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT');
Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', 'This is an AAD mixed into the input');
Spark
Since: 3.3.0
aes_encrypt¶
aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]]) - Returns an encrypted value of expr using AES in given mode with the specified padding.
Key lengths of 16, 24 and 32 bits are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS').
Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output.
Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption.
The default mode is GCM.
Arguments:
- expr - The binary value to encrypt.
- key - The passphrase to use to encrypt the data.
- mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
- padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
- iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or ''. 16-byte array for CBC mode. 12-byte array for GCM mode.
- aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.
Examples:
> SELECT hex(aes_encrypt('Spark', '0000111122223333'));
83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94
> SELECT hex(aes_encrypt('Spark SQL', '0000111122223333', 'GCM'));
6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210
> SELECT base64(aes_encrypt('Spark SQL', '1234567890abcdef', 'ECB', 'PKCS'));
3lmwu+Mw0H3fi5NDvcu9lg==
> SELECT base64(aes_encrypt('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT'));
2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT', unhex('00000000000000000000000000000000')));
AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', unhex('000000000000000000000000'), 'This is an AAD mixed into the input'));
AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4
Since: 3.3.0
assert_true¶
assert_true(expr [, message]) - Throws an exception if expr is not true.
Examples:
> SELECT assert_true(0 < 1);
NULL
Since: 2.0.0
bitmap_bit_position¶
bitmap_bit_position(child) - Returns the bit position for the given input child expression.
Examples:
> SELECT bitmap_bit_position(1);
0
> SELECT bitmap_bit_position(123);
122
Since: 3.5.0
bitmap_bucket_number¶
bitmap_bucket_number(child) - Returns the bucket number for the given input child expression.
Examples:
> SELECT bitmap_bucket_number(123);
1
> SELECT bitmap_bucket_number(0);
0
Since: 3.5.0
bitmap_count¶
bitmap_count(child) - Returns the number of set bits in the child bitmap.
Examples:
> SELECT bitmap_count(X '1010');
2
> SELECT bitmap_count(X 'FFFF');
16
> SELECT bitmap_count(X '0');
0
Since: 3.5.0
current_catalog¶
current_catalog() - Returns the current catalog.
Examples:
> SELECT current_catalog();
spark_catalog
Since: 3.1.0
current_database¶
current_database() - Returns the current database.
Examples:
> SELECT current_database();
default
Since: 1.6.0
current_schema¶
current_schema() - Returns the current database.
Examples:
> SELECT current_schema();
default
Since: 3.4.0
current_user¶
current_user() - user name of current execution context.
Examples:
> SELECT current_user();
mockingjay
Since: 3.2.0
input_file_block_length¶
input_file_block_length() - Returns the length of the block being read, or -1 if not available.
Examples:
> SELECT input_file_block_length();
-1
Since: 2.2.0
input_file_block_start¶
input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.
Examples:
> SELECT input_file_block_start();
-1
Since: 2.2.0
input_file_name¶
input_file_name() - Returns the name of the file being read, or empty string if not available.
Examples:
> SELECT input_file_name();
Since: 1.5.0
java_method¶
java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Examples:
> SELECT java_method('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
Since: 2.0.0
monotonically_increasing_id¶
monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.
Examples:
> SELECT monotonically_increasing_id();
0
Since: 1.4.0
raise_error¶
raise_error( expr ) - Throws a USER_RAISED_EXCEPTION with expr as message.
Examples:
> SELECT raise_error('custom error message');
[USER_RAISED_EXCEPTION] custom error message
Since: 3.1.0
reflect¶
reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Examples:
> SELECT reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
Since: 2.0.0
session_user¶
session_user() - user name of current execution context.
Examples:
> SELECT session_user();
mockingjay
Since: 4.0.0
spark_partition_id¶
spark_partition_id() - Returns the current partition id.
Examples:
> SELECT spark_partition_id();
0
Since: 1.4.0
try_aes_decrypt¶
try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.
Examples:
> SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
Spark SQL
> SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
NULL
Since: 3.5.0
try_reflect¶
try_reflect(class, method[, arg1[, arg2 ..]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoke method thrown exception.
Examples:
> SELECT try_reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
> SELECT try_reflect('java.net.URLDecoder', 'decode', '%');
NULL
Since: 4.0.0
typeof¶
typeof(expr) - Return DDL-formatted type string for the data type of the input.
Examples:
> SELECT typeof(1);
int
> SELECT typeof(array(1));
array<int>
Since: 3.0.0
user¶
user() - user name of current execution context.
Examples:
> SELECT user();
mockingjay
Since: 3.4.0
uuid¶
uuid() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.
Examples:
> SELECT uuid();
46707d92-02f4-4817-8116-a4c3b23e6266
Note:
The function is non-deterministic.
Since: 2.3.0
version¶
version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.
Examples:
> SELECT version();
3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40
Since: 3.0.0