Misc Functions

This page lists all misc functions available in Spark SQL.


aes_decrypt

aes_decrypt(expr, key[, mode[, padding[, aad]]]) - Returns a decrypted value of expr using AES in mode with padding. Key lengths of 16, 24 and 32 bits are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

Arguments:

  • expr - The binary value to decrypt.
  • key - The passphrase to use to decrypt the data.
  • mode - Specifies which block cipher mode should be used to decrypt messages. Valid modes: ECB, GCM, CBC.
  • padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
  • aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.

Examples:

> SELECT aes_decrypt(unhex('83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94'), '0000111122223333');
 Spark
> SELECT aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
 Spark SQL
> SELECT aes_decrypt(unbase64('3lmwu+Mw0H3fi5NDvcu9lg=='), '1234567890abcdef', 'ECB', 'PKCS');
 Spark SQL
> SELECT aes_decrypt(unbase64('2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo='), '1234567890abcdef', 'CBC');
 Apache Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg='), 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT');
 Spark
> SELECT aes_decrypt(unbase64('AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4'), 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', 'This is an AAD mixed into the input');
 Spark

Since: 3.3.0


aes_encrypt

aes_encrypt(expr, key[, mode[, padding[, iv[, aad]]]]) - Returns an encrypted value of expr using AES in given mode with the specified padding. Key lengths of 16, 24 and 32 bits are supported. Supported combinations of (mode, padding) are ('ECB', 'PKCS'), ('GCM', 'NONE') and ('CBC', 'PKCS'). Optional initialization vectors (IVs) are only supported for CBC and GCM modes. These must be 16 bytes for CBC and 12 bytes for GCM. If not provided, a random vector will be generated and prepended to the output. Optional additional authenticated data (AAD) is only supported for GCM. If provided for encryption, the identical AAD value must be provided for decryption. The default mode is GCM.

Arguments:

  • expr - The binary value to encrypt.
  • key - The passphrase to use to encrypt the data.
  • mode - Specifies which block cipher mode should be used to encrypt messages. Valid modes: ECB, GCM, CBC.
  • padding - Specifies how to pad messages whose length is not a multiple of the block size. Valid values: PKCS, NONE, DEFAULT. The DEFAULT padding means PKCS for ECB, NONE for GCM and PKCS for CBC.
  • iv - Optional initialization vector. Only supported for CBC and GCM modes. Valid values: None or ''. 16-byte array for CBC mode. 12-byte array for GCM mode.
  • aad - Optional additional authenticated data. Only supported for GCM mode. This can be any free-form input and must be provided for both encryption and decryption.

Examples:

> SELECT hex(aes_encrypt('Spark', '0000111122223333'));
 83F16B2AA704794132802D248E6BFD4E380078182D1544813898AC97E709B28A94
> SELECT hex(aes_encrypt('Spark SQL', '0000111122223333', 'GCM'));
 6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210
> SELECT base64(aes_encrypt('Spark SQL', '1234567890abcdef', 'ECB', 'PKCS'));
 3lmwu+Mw0H3fi5NDvcu9lg==
> SELECT base64(aes_encrypt('Apache Spark', '1234567890abcdef', 'CBC', 'DEFAULT'));
 2NYmDCjgXTbbxGA3/SnJEfFC/JQ7olk2VQWReIAAFKo=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'CBC', 'DEFAULT', unhex('00000000000000000000000000000000')));
 AAAAAAAAAAAAAAAAAAAAAPSd4mWyMZ5mhvjiAPQJnfg=
> SELECT base64(aes_encrypt('Spark', 'abcdefghijklmnop12345678ABCDEFGH', 'GCM', 'DEFAULT', unhex('000000000000000000000000'), 'This is an AAD mixed into the input'));
 AAAAAAAAAAAAAAAAQiYi+sTLm7KD9UcZ2nlRdYDe/PX4

Since: 3.3.0


assert_true

assert_true(expr [, message]) - Throws an exception if expr is not true.

Examples:

> SELECT assert_true(0 < 1);
 NULL

Since: 2.0.0


bitmap_bit_position

bitmap_bit_position(child) - Returns the bit position for the given input child expression.

Examples:

> SELECT bitmap_bit_position(1);
 0
> SELECT bitmap_bit_position(123);
 122

Since: 3.5.0


bitmap_bucket_number

bitmap_bucket_number(child) - Returns the bucket number for the given input child expression.

Examples:

> SELECT bitmap_bucket_number(123);
 1
> SELECT bitmap_bucket_number(0);
 0

Since: 3.5.0


bitmap_count

bitmap_count(child) - Returns the number of set bits in the child bitmap.

Examples:

> SELECT bitmap_count(X '1010');
 2
> SELECT bitmap_count(X 'FFFF');
 16
> SELECT bitmap_count(X '0');
 0

Since: 3.5.0


current_catalog

current_catalog() - Returns the current catalog.

Examples:

> SELECT current_catalog();
 spark_catalog

Since: 3.1.0


current_database

current_database() - Returns the current database.

Examples:

> SELECT current_database();
 default

Since: 1.6.0


current_schema

current_schema() - Returns the current database.

Examples:

> SELECT current_schema();
 default

Since: 3.4.0


current_user

current_user() - user name of current execution context.

Examples:

> SELECT current_user();
 mockingjay

Since: 3.2.0


input_file_block_length

input_file_block_length() - Returns the length of the block being read, or -1 if not available.

Examples:

> SELECT input_file_block_length();
 -1

Since: 2.2.0


input_file_block_start

input_file_block_start() - Returns the start offset of the block being read, or -1 if not available.

Examples:

> SELECT input_file_block_start();
 -1

Since: 2.2.0


input_file_name

input_file_name() - Returns the name of the file being read, or empty string if not available.

Examples:

> SELECT input_file_name();

Since: 1.5.0


java_method

java_method(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Examples:

> SELECT java_method('java.util.UUID', 'randomUUID');
 c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT java_method('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
 a5cf6c42-0c85-418f-af6c-3e4e5b1328f2

Since: 2.0.0


monotonically_increasing_id

monotonically_increasing_id() - Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the lower 33 bits represent the record number within each partition. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records. The function is non-deterministic because its result depends on partition IDs.

Examples:

> SELECT monotonically_increasing_id();
 0

Since: 1.4.0


raise_error

raise_error( expr ) - Throws a USER_RAISED_EXCEPTION with expr as message.

Examples:

> SELECT raise_error('custom error message');
 [USER_RAISED_EXCEPTION] custom error message

Since: 3.1.0


reflect

reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.

Examples:

> SELECT reflect('java.util.UUID', 'randomUUID');
 c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
 a5cf6c42-0c85-418f-af6c-3e4e5b1328f2

Since: 2.0.0


session_user

session_user() - user name of current execution context.

Examples:

> SELECT session_user();
 mockingjay

Since: 4.0.0


spark_partition_id

spark_partition_id() - Returns the current partition id.

Examples:

> SELECT spark_partition_id();
 0

Since: 1.4.0


try_aes_decrypt

try_aes_decrypt(expr, key[, mode[, padding[, aad]]]) - This is a special version of aes_decrypt that performs the same operation, but returns a NULL value instead of raising an error if the decryption cannot be performed.

Examples:

> SELECT try_aes_decrypt(unhex('6E7CA17BBB468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
 Spark SQL
> SELECT try_aes_decrypt(unhex('----------468D3084B5744BCA729FB7B2B7BCB8E4472847D02670489D95FA97DBBA7D3210'), '0000111122223333', 'GCM');
 NULL

Since: 3.5.0


try_reflect

try_reflect(class, method[, arg1[, arg2 ..]]) - This is a special version of reflect that performs the same operation, but returns a NULL value instead of raising an error if the invoke method thrown exception.

Examples:

> SELECT try_reflect('java.util.UUID', 'randomUUID');
 c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT try_reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
 a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
> SELECT try_reflect('java.net.URLDecoder', 'decode', '%');
 NULL

Since: 4.0.0


typeof

typeof(expr) - Return DDL-formatted type string for the data type of the input.

Examples:

> SELECT typeof(1);
 int
> SELECT typeof(array(1));
 array<int>

Since: 3.0.0


user

user() - user name of current execution context.

Examples:

> SELECT user();
 mockingjay

Since: 3.4.0


uuid

uuid() - Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string.

Examples:

> SELECT uuid();
 46707d92-02f4-4817-8116-a4c3b23e6266

Note:

The function is non-deterministic.

Since: 2.3.0


version

version() - Returns the Spark version. The string contains 2 fields, the first being a release version and the second being a git revision.

Examples:

> SELECT version();
 3.1.0 a6d6ea3efedbad14d99c24143834cd4e2e52fb40

Since: 3.0.0