Agg Functions ¶

This page lists all agg functions available in Spark SQL.

any_value ¶

any_value(expr[, isIgnoreNull]) - Returns some value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic.

Since: 3.4.0

approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [1,1,0]
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
 7
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-1
> SELECT approx_percentile(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:01.000000000,0 00:00:02.000000000]

Since: 2.1.0

approx_top_k ¶

approx_top_k(expr, k, maxItemsTracked) - Returns top k items with their frequency. k An optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. maxItemsTracked An optional INTEGER literal greater than or equal to k and has upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000.

Examples:

> SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

> SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr);
 [{"item":"c","count":4},{"item":"d","count":2}]

> SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr);
 [{"item":2,"count":3},{"item":1,"count":2},{"item":0,"count":1}]

Since: 4.1.0

approx_top_k_accumulate ¶

approx_top_k_accumulate(expr, maxItemsTracked) - Accumulates items into a sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000.

Examples:

> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr, 100), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr);
 [{"item":"c","count":4},{"item":"d","count":2}]

Since: 4.1.0

approx_top_k_combine ¶

approx_top_k_combine(state, maxItemsTracked) - Combines multiple sketches into a single sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is specified, it will be set for the combined sketch. If maxItemsTracked is not specified, the input sketches must have the same maxItemsTracked value, otherwise an error will be thrown. The output sketch will use the same value from the input sketches.

Examples:

> SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr) UNION ALL SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr));
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

Since: 4.1.0

array_agg ¶

array_agg(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 3.3.0

avg ¶

avg(expr) - Returns the mean calculated from values of a group.

Examples:

> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5

Since: 1.0.0

bit_and ¶

bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none.

Examples:

> SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col);
 1

Since: 3.0.0

bit_or ¶

bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if none.

Examples:

> SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col);
 7

Since: 3.0.0

bit_xor ¶

bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if none.

Examples:

> SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col);
 6

Since: 3.0.0

bitmap_and_agg ¶

bitmap_and_agg(child) - Returns a bitmap that is the bitwise AND of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().

Examples:

> SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'F0'), (X '70'), (X '30') AS tab(col);
 300000
> SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'FF'), (X 'FF'), (X 'FF') AS tab(col);
 FF0000

Since: 4.1.0

bitmap_construct_agg ¶

bitmap_construct_agg(child) - Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position().

Examples:

> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (2), (3) AS tab(col);
 070000
> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (1), (1) AS tab(col);
 010000

Since: 3.5.0

bitmap_or_agg ¶

bitmap_or_agg(child) - Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().

Examples:

> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '20'), (X '40') AS tab(col);
 700000
> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '10'), (X '10') AS tab(col);
 100000

Since: 3.5.0

bool_and ¶

bool_and(expr) - Returns true if all values of expr are true.

Examples:

> SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col);
 false

Since: 3.0.0

bool_or ¶

bool_or(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0

collect_list ¶

collect_list(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

collect_set ¶

collect_set(expr) - Collects and returns a set of unique elements.

Examples:

> SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

corr ¶

corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.

Examples:

> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2);
 0.8660254037844387

Since: 1.6.0

count ¶

count(*) - Returns the total number of retrieved rows, including rows containing null.

count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null.

count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.

Examples:

> SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 4
> SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 3
> SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col);
 2

Since: 1.0.0

count_if ¶

count_if(expr) - Returns the number of TRUE values for the expression.

Examples:

> SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 2
> SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 1

Since: 3.0.0

count_min_sketch ¶

count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given esp, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

Examples:

> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
 0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000

Since: 2.2.0

covar_pop ¶

covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.

Examples:

> SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 0.6666666666666666

Since: 2.0.0

covar_samp ¶

covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.

Examples:

> SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 1.0

Since: 2.0.0

every ¶

every(expr) - Returns true if all values of expr are true.

Examples:

> SELECT every(col) FROM VALUES (true), (true), (true) AS tab(col);
 true
> SELECT every(col) FROM VALUES (NULL), (true), (true) AS tab(col);
 true
> SELECT every(col) FROM VALUES (true), (false), (true) AS tab(col);
 false

Since: 3.0.0

first ¶

first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT first(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT first(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT first(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

first_value ¶

first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT first_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT first_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT first_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

grouping ¶

grouping(col) - indicates whether a specified column in a GROUP BY is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.",

Examples:

> SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name);
  Alice 0   2
  Bob   0   5
  NULL  1   7

Since: 2.0.0

grouping_id ¶

grouping_id([col1[, col2 ..]]) - returns the level of grouping, equals to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)

Examples:

> SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height);
  Alice 0   2   165.0
  Alice 1   2   165.0
  NULL  3   7   172.5
  Bob   0   5   180.0
  Bob   1   5   180.0
  NULL  2   2   165.0
  NULL  2   5   180.0

Note:

Input columns should match with grouping columns exactly, or empty (means all the grouping columns).

Since: 2.0.0

histogram_numeric ¶

histogram_numeric(expr, nb) - Computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.

Examples:

> SELECT histogram_numeric(col, 5) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]

Since: 3.3.0

hll_sketch_agg ¶

hll_sketch_agg(expr, lgConfigK) - Returns the HllSketch's updatable binary representation. lgConfigK (optional) the log-base-2 of K, with K is the number of buckets or slots for the HllSketch.

Examples:

> SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col);
 3

Since: 3.5.0

hll_union_agg ¶

hll_union_agg(expr, allowDifferentLgConfigK) - Returns the estimated number of unique values. allowDifferentLgConfigK (optional) Allow sketches with different lgConfigK values to be unioned (defaults to false).

Examples:

> SELECT hll_sketch_estimate(hll_union_agg(sketch, true)) FROM (SELECT hll_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT hll_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 3.5.0

kll_merge_agg_bigint ¶

kll_merge_agg_bigint(expr[, k]) - Merges binary KllLongsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_bigint). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_bigint(kll_merge_agg_bigint(sketch)) FROM (SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (1), (2), (3) tab(col) UNION ALL SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (4), (5), (6) tab(col)) t;
 6

Since: 4.1.0

kll_merge_agg_double ¶

kll_merge_agg_double(expr[, k]) - Merges binary KllDoublesSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_double). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_double(kll_merge_agg_double(sketch)) FROM (SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)) tab(col) UNION ALL SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)), (CAST(6.0 AS DOUBLE)) tab(col)) t;
 6

Since: 4.1.0

kll_merge_agg_float ¶

kll_merge_agg_float(expr[, k]) - Merges binary KllFloatsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_float). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_float(kll_merge_agg_float(sketch)) FROM (SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)) tab(col) UNION ALL SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)), (CAST(6.0 AS FLOAT)) tab(col)) t;
 6

Since: 4.1.0

kll_sketch_agg_bigint ¶

kll_sketch_agg_bigint(expr[, k]) - Returns the KllLongsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col, 400))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col);
 true

Since: 4.1.0

kll_sketch_agg_double ¶

kll_sketch_agg_double(expr[, k]) - Returns the KllDoublesSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col, 400))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col);
 true

Since: 4.1.0

kll_sketch_agg_float ¶

kll_sketch_agg_float(expr[, k]) - Returns the KllFloatsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col, 400))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col);
 true

Since: 4.1.0

kurtosis ¶

kurtosis(expr) - Returns the kurtosis value calculated from values of a group.

Examples:

> SELECT kurtosis(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
 -0.7014368047529627
> SELECT kurtosis(col) FROM VALUES (1), (10), (100), (10), (1) as tab(col);
 0.19432323191699075

Since: 1.6.0

last ¶

last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values

Examples:

> SELECT last(col) FROM VALUES (10), (5), (20) AS tab(col);
 20
> SELECT last(col) FROM VALUES (10), (5), (NULL) AS tab(col);
 NULL
> SELECT last(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

last_value ¶

last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values

Examples:

> SELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 20
> SELECT last_value(col) FROM VALUES (10), (5), (NULL) AS tab(col);
 NULL
> SELECT last_value(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0

listagg ¶

listagg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.

Arguments:

expr - a string or binary expression to be concatenated.
delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.

Examples:

> SELECT listagg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 abc
> SELECT listagg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 cba
> SELECT listagg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
 ab
> SELECT listagg(col) FROM VALUES ('a'), ('a') AS tab(col);
 aa
> SELECT listagg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
 ab
> SELECT listagg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 a, b, c
> SELECT listagg(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Note:

If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
If DISTINCT is specified, then expr and key must be the same expression.

Since: 4.0.0

max ¶

max(expr) - Returns the maximum value of expr.

Examples:

> SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col);
 50

Since: 1.0.0

max_by ¶

max_by(x, y) - Returns the value of x associated with the maximum value of y. max_by(x, y, k) - Returns an array of the k values of x associated with the maximum values of y, sorted in descending order by y. Returns NULL if there are no non-NULL ordering values.

Examples:

> SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 b
> SELECT max_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 ["b","c"]

Note:

The function is non-deterministic so the output order can be different for those associated the same values of y.

The maximum value of k is 100000.

Since: 4.2.0

mean ¶

mean(expr) - Returns the mean calculated from values of a group.

Examples:

> SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5

Since: 1.0.0

measure ¶

measure(expr) - this function is used and can only be used to calculate a measure defined in a metric view.

Examples:

> SELECT dimension_col, measure(measure_col)
  FROM test_metric_view
  GROUP BY dimension_col;
dim_1, 100
dim_2, 200

Since: 4.2.0

median ¶

median(col) - Returns the median of numeric or ANSI interval column col.

Examples:

> SELECT median(col) FROM VALUES (0), (10) AS tab(col);
 5.0
> SELECT median(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-5

Since: 3.4.0

min ¶

min(expr) - Returns the minimum value of expr.

Examples:

> SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col);
 -1

Since: 1.0.0

min_by ¶

min_by(x, y) - Returns the value of x associated with the minimum value of y. min_by(x, y, k) - Returns an array of the k values of x associated with the minimum values of y, sorted in ascending order by y. Returns NULL if there are no non-NULL ordering values.

Examples:

> SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 a
> SELECT min_by(x, y, 2) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 ["a","c"]

Note:

The function is non-deterministic so the output order can be different for those associated the same values of y.

The maximum value of k is 100000.

Since: 4.2.0

mode ¶

mode(col[, deterministic]) - Returns the most frequent value for the values within col. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true. mode() WITHIN GROUP (ORDER BY col) - Returns the most frequent value for the values within col (specified in ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency only one value will be returned. The value will be chosen based on sort direction. Return the smallest value if sort direction is asc or the largest value if sort direction is desc from multiple values with the same frequency.

Examples:

> SELECT mode(col) FROM VALUES (0), (10), (10) AS tab(col);
 10
> SELECT mode(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-10
> SELECT mode(col) FROM VALUES (0), (10), (10), (null), (null), (null) AS tab(col);
 10
> SELECT mode(col, false) FROM VALUES (-10), (0), (10) AS tab(col);
 0
> SELECT mode(col, true) FROM VALUES (-10), (0), (10) AS tab(col);
 -10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10) AS tab(col);
 10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
 10
> SELECT mode() WITHIN GROUP (ORDER BY col DESC) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
 20

Since: 3.4.0

percentile ¶

percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric or ANSI interval column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be positive integral

percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be positive integral

Examples:

> SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col);
 3.0
> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
 [2.5,7.5]
> SELECT percentile(col, 0.5) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-5
> SELECT percentile(col, array(0.2, 0.5)) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:02.000000000,0 00:00:05.000000000]

Since: 2.1.0

percentile_approx ¶

percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [1,1,0]
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
 7
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-1
> SELECT percentile_approx(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:01.000000000,0 00:00:02.000000000]

Since: 2.1.0

percentile_cont ¶

percentile_cont(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a continuous distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).

Examples:

> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
 2.5
> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-2

Since: 4.0.0

percentile_disc ¶

percentile_disc(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a discrete distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).

Examples:

> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
 0.0
> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-0

Since: 4.0.0

regr_avgx ¶

regr_avgx(y, x) - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 2.75
> SELECT regr_avgx(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_avgx(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 3.0
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 3.0

Since: 3.3.0

regr_avgy ¶

regr_avgy(y, x) - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 1.75
> SELECT regr_avgy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_avgy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 1.6666666666666667
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.5

Since: 3.3.0

regr_count ¶

regr_count(y, x) - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 4
> SELECT regr_count(y, x) FROM VALUES (1, null) AS tab(y, x);
 0
> SELECT regr_count(y, x) FROM VALUES (null, 1) AS tab(y, x);
 0
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 3
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 2

Since: 3.3.0

regr_intercept ¶

regr_intercept(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
 0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_intercept(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
 0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
 0.0

Since: 3.4.0

regr_r2 ¶

regr_r2(y, x) - Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.2727272727272727
> SELECT regr_r2(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_r2(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 0.7500000000000001
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.0

Since: 3.3.0

regr_slope ¶

regr_slope(y, x) - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
 1.0
> SELECT regr_slope(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_slope(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
 1.0
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
 1.0

Since: 3.4.0

regr_sxx ¶

regr_sxx(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 2.75
> SELECT regr_sxx(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_sxx(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 2.0
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 2.0

Since: 3.4.0

regr_sxy ¶

regr_sxy(y, x) - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.75
> SELECT regr_sxy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_sxy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 1.0
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.0

Since: 3.4.0

regr_syy ¶

regr_syy(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.75
> SELECT regr_syy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_syy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 0.6666666666666666
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 0.5

Since: 3.4.0

skewness ¶

skewness(expr) - Returns the skewness value calculated from values of a group.

Examples:

> SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
 1.1135657469022011
> SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
 -1.1135657469022011

Since: 1.6.0

some ¶

some(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0

std ¶

std(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0

stddev ¶

stddev(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0

stddev_pop ¶

stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.

Examples:

> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
 0.816496580927726

Since: 1.6.0

stddev_samp ¶

stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0

string_agg ¶

string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.

Arguments:

expr - a string or binary expression to be concatenated.
delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.

Examples:

> SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 abc
> SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 cba
> SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
 ab
> SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col);
 aa
> SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
 ab
> SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 a, b, c
> SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Note:

If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
If DISTINCT is specified, then expr and key must be the same expression.

Since: 4.0.0

sum ¶

sum(expr) - Returns the sum calculated from values of a group.

Examples:

> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
 30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
 25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Since: 1.0.0

theta_intersection_agg ¶

theta_intersection_agg(expr) - Returns the ThetaSketch's Compact binary representation by intersecting all the Theta sketches in the input column.

Examples:

> SELECT theta_sketch_estimate(theta_intersection_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 4.1.0

theta_sketch_agg ¶

theta_sketch_agg(expr, lgNomEntries) - Returns the ThetaSketch compact binary representation. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the ThetaSketch.

Examples:

> SELECT theta_sketch_estimate(theta_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col);
 3

Since: 4.1.0

theta_union_agg ¶

theta_union_agg(expr, lgNomEntries) - Returns the ThetaSketch's Compact binary representation. lgNomEntries (optional) the log-base-2 of Nominal Entries, with Nominal Entries deciding the number buckets or slots for the ThetaSketch.

Examples:

> SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 4.1.0

try_avg ¶

try_avg(expr) - Returns the mean calculated from values of a group and the result is null on overflow.

Examples:

> SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5
> SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col);
 NULL

Since: 3.3.0

try_sum ¶

try_sum(expr) - Returns the sum calculated from values of a group and the result is null on overflow.

Examples:

> SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col);
 30
> SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
 25
> SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL
> SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col);
 NULL

Since: 3.3.0

tuple_intersection_agg_double ¶

tuple_intersection_agg_double(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D), (3, 15.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D), (4, 12.0D) tab(key, summary));
 2.0

Since: 4.2.0

tuple_intersection_agg_integer ¶

tuple_intersection_agg_integer(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_intersection_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 2), (3, 3), (4, 4) tab(key, summary));
 2.0

Since: 4.2.0

tuple_sketch_agg_double ¶

tuple_sketch_agg_double(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the double value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary);
 3.0

Since: 4.2.0

tuple_sketch_agg_integer ¶

tuple_sketch_agg_integer(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the integer value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary, 12, 'sum')) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary);
 3.0

Since: 4.2.0

tuple_union_agg_double ¶

tuple_union_agg_double(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_union_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D) tab(key, summary));
 3.0

Since: 4.2.0

tuple_union_agg_integer ¶

tuple_union_agg_integer(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_union_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 5), (2, 10) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 3), (3, 7) tab(key, summary));
 3.0

Since: 4.2.0

var_pop ¶

var_pop(expr) - Returns the population variance calculated from values of a group.

Examples:

> SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
 0.6666666666666666

Since: 1.6.0

var_samp ¶

var_samp(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0

variance ¶

variance(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0