Agg Functions

This page lists all agg functions available in Spark SQL.


any

any(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT any(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT any(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT any(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0


any_value

any_value(expr[, isIgnoreNull]) - Returns some value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT any_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT any_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT any_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic.

Since: 3.4.0


approx_count_distinct

approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++. relativeSD defines the maximum relative standard deviation allowed.

Examples:

> SELECT approx_count_distinct(col1) FROM VALUES (1), (1), (2), (2), (3) tab(col1);
 3

Since: 1.6.0


approx_percentile

approx_percentile(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [1,1,0]
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
 7
> SELECT approx_percentile(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-1
> SELECT approx_percentile(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:01.000000000,0 00:00:02.000000000]

Since: 2.1.0


approx_top_k

approx_top_k(expr, k, maxItemsTracked) - Returns top k items with their frequency. k An optional INTEGER literal greater than 0. If k is not specified, it defaults to 5. maxItemsTracked An optional INTEGER literal greater than or equal to k and has upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000.

Examples:

> SELECT approx_top_k(expr) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

> SELECT approx_top_k(expr, 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr);
 [{"item":"c","count":4},{"item":"d","count":2}]

> SELECT approx_top_k(expr, 10, 100) FROM VALUES (0), (1), (1), (2), (2), (2) AS tab(expr);
 [{"item":2,"count":3},{"item":1,"count":2},{"item":0,"count":1}]

Since: 4.1.0


approx_top_k_accumulate

approx_top_k_accumulate(expr, maxItemsTracked) - Accumulates items into a sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is not specified, it defaults to 10000.

Examples:

> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr)) FROM VALUES (0), (0), (1), (1), (2), (3), (4), (4) AS tab(expr);
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

> SELECT approx_top_k_estimate(approx_top_k_accumulate(expr, 100), 2) FROM VALUES 'a', 'b', 'c', 'c', 'c', 'c', 'd', 'd' AS tab(expr);
 [{"item":"c","count":4},{"item":"d","count":2}]

Since: 4.1.0


approx_top_k_combine

approx_top_k_combine(state, maxItemsTracked) - Combines multiple sketches into a single sketch. maxItemsTracked An optional positive INTEGER literal with upper limit of 1000000. If maxItemsTracked is specified, it will be set for the combined sketch. If maxItemsTracked is not specified, the input sketches must have the same maxItemsTracked value, otherwise an error will be thrown. The output sketch will use the same value from the input sketches.

Examples:

> SELECT approx_top_k_estimate(approx_top_k_combine(sketch, 10000), 5) FROM (SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (0), (0), (1), (1) AS tab(expr) UNION ALL SELECT approx_top_k_accumulate(expr) AS sketch FROM VALUES (2), (3), (4), (4) AS tab(expr));
 [{"item":0,"count":2},{"item":4,"count":2},{"item":1,"count":2},{"item":2,"count":1},{"item":3,"count":1}]

Since: 4.1.0


array_agg

array_agg(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT array_agg(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 3.3.0


avg

avg(expr) - Returns the mean calculated from values of a group.

Examples:

> SELECT avg(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5

Since: 1.0.0


bit_and

bit_and(expr) - Returns the bitwise AND of all non-null input values, or null if none.

Examples:

> SELECT bit_and(col) FROM VALUES (3), (5) AS tab(col);
 1

Since: 3.0.0


bit_or

bit_or(expr) - Returns the bitwise OR of all non-null input values, or null if none.

Examples:

> SELECT bit_or(col) FROM VALUES (3), (5) AS tab(col);
 7

Since: 3.0.0


bit_xor

bit_xor(expr) - Returns the bitwise XOR of all non-null input values, or null if none.

Examples:

> SELECT bit_xor(col) FROM VALUES (3), (5) AS tab(col);
 6

Since: 3.0.0


bitmap_and_agg

bitmap_and_agg(child) - Returns a bitmap that is the bitwise AND of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().

Examples:

> SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'F0'), (X '70'), (X '30') AS tab(col);
 300000
> SELECT substring(hex(bitmap_and_agg(col)), 0, 6) FROM VALUES (X 'FF'), (X 'FF'), (X 'FF') AS tab(col);
 FF0000

Since: 4.1.0


bitmap_construct_agg

bitmap_construct_agg(child) - Returns a bitmap with the positions of the bits set from all the values from the child expression. The child expression will most likely be bitmap_bit_position().

Examples:

> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (2), (3) AS tab(col);
 070000
> SELECT substring(hex(bitmap_construct_agg(bitmap_bit_position(col))), 0, 6) FROM VALUES (1), (1), (1) AS tab(col);
 010000

Since: 3.5.0


bitmap_or_agg

bitmap_or_agg(child) - Returns a bitmap that is the bitwise OR of all of the bitmaps from the child expression. The input should be bitmaps created from bitmap_construct_agg().

Examples:

> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '20'), (X '40') AS tab(col);
 700000
> SELECT substring(hex(bitmap_or_agg(col)), 0, 6) FROM VALUES (X '10'), (X '10'), (X '10') AS tab(col);
 100000

Since: 3.5.0


bool_and

bool_and(expr) - Returns true if all values of expr are true.

Examples:

> SELECT bool_and(col) FROM VALUES (true), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (NULL), (true), (true) AS tab(col);
 true
> SELECT bool_and(col) FROM VALUES (true), (false), (true) AS tab(col);
 false

Since: 3.0.0


bool_or

bool_or(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT bool_or(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT bool_or(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0


collect_list

collect_list(expr) - Collects and returns a list of non-unique elements.

Examples:

> SELECT collect_list(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2,1]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


collect_set

collect_set(expr) - Collects and returns a set of unique elements.

Examples:

> SELECT collect_set(col) FROM VALUES (1), (2), (1) AS tab(col);
 [1,2]

Note:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


corr

corr(expr1, expr2) - Returns Pearson coefficient of correlation between a set of number pairs.

Examples:

> SELECT corr(c1, c2) FROM VALUES (3, 2), (3, 3), (6, 4) as tab(c1, c2);
 0.8660254037844387

Since: 1.6.0


count

count(*) - Returns the total number of retrieved rows, including rows containing null.

count(expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are all non-null.

count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-null.

Examples:

> SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 4
> SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col);
 3
> SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col);
 2

Since: 1.0.0


count_if

count_if(expr) - Returns the number of TRUE values for the expression.

Examples:

> SELECT count_if(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 2
> SELECT count_if(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col);
 1

Since: 3.0.0


count_min_sketch

count_min_sketch(col, eps, confidence, seed) - Returns a count-min sketch of a column with the given esp, confidence and seed. The result is an array of bytes, which can be deserialized to a CountMinSketch before usage. Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.

Examples:

> SELECT hex(count_min_sketch(col, 0.5d, 0.5d, 1)) FROM VALUES (1), (2), (1) AS tab(col);
 0000000100000000000000030000000100000004000000005D8D6AB90000000000000000000000000000000200000000000000010000000000000000

Since: 2.2.0


covar_pop

covar_pop(expr1, expr2) - Returns the population covariance of a set of number pairs.

Examples:

> SELECT covar_pop(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 0.6666666666666666

Since: 2.0.0


covar_samp

covar_samp(expr1, expr2) - Returns the sample covariance of a set of number pairs.

Examples:

> SELECT covar_samp(c1, c2) FROM VALUES (1,1), (2,2), (3,3) AS tab(c1, c2);
 1.0

Since: 2.0.0


every

every(expr) - Returns true if all values of expr are true.

Examples:

> SELECT every(col) FROM VALUES (true), (true), (true) AS tab(col);
 true
> SELECT every(col) FROM VALUES (NULL), (true), (true) AS tab(col);
 true
> SELECT every(col) FROM VALUES (true), (false), (true) AS tab(col);
 false

Since: 3.0.0


first

first(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT first(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT first(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT first(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


first_value

first_value(expr[, isIgnoreNull]) - Returns the first value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Examples:

> SELECT first_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 10
> SELECT first_value(col) FROM VALUES (NULL), (5), (20) AS tab(col);
 NULL
> SELECT first_value(col, true) FROM VALUES (NULL), (5), (20) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


grouping

grouping(col) - indicates whether a specified column in a GROUP BY is aggregated or not, returns 1 for aggregated or 0 for not aggregated in the result set.",

Examples:

> SELECT name, grouping(name), sum(age) FROM VALUES (2, 'Alice'), (5, 'Bob') people(age, name) GROUP BY cube(name);
  Alice 0   2
  Bob   0   5
  NULL  1   7

Since: 2.0.0


grouping_id

grouping_id([col1[, col2 ..]]) - returns the level of grouping, equals to (grouping(c1) << (n-1)) + (grouping(c2) << (n-2)) + ... + grouping(cn)

Examples:

> SELECT name, grouping_id(), sum(age), avg(height) FROM VALUES (2, 'Alice', 165), (5, 'Bob', 180) people(age, name, height) GROUP BY cube(name, height);
  Alice 0   2   165.0
  Alice 1   2   165.0
  NULL  3   7   172.5
  Bob   0   5   180.0
  Bob   1   5   180.0
  NULL  2   2   165.0
  NULL  2   5   180.0

Note:

Input columns should match with grouping columns exactly, or empty (means all the grouping columns).

Since: 2.0.0


histogram_numeric

histogram_numeric(expr, nb) - Computes a histogram on numeric 'expr' using nb bins. The return value is an array of (x,y) pairs representing the centers of the histogram's bins. As the value of 'nb' is increased, the histogram approximation gets finer-grained, but may yield artifacts around outliers. In practice, 20-40 histogram bins appear to work well, with more bins being required for skewed or smaller datasets. Note that this function creates a histogram with non-uniform bin widths. It offers no guarantees in terms of the mean-squared-error of the histogram, but in practice is comparable to the histograms produced by the R/S-Plus statistical computing packages. Note: the output type of the 'x' field in the return value is propagated from the input value consumed in the aggregate function.

Examples:

> SELECT histogram_numeric(col, 5) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [{"x":0,"y":1.0},{"x":1,"y":1.0},{"x":2,"y":1.0},{"x":10,"y":1.0}]

Since: 3.3.0


hll_sketch_agg

hll_sketch_agg(expr, lgConfigK) - Returns the HllSketch's updatable binary representation. lgConfigK (optional) the log-base-2 of K, with K is the number of buckets or slots for the HllSketch.

Examples:

> SELECT hll_sketch_estimate(hll_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col);
 3

Since: 3.5.0


hll_union_agg

hll_union_agg(expr, allowDifferentLgConfigK) - Returns the estimated number of unique values. allowDifferentLgConfigK (optional) Allow sketches with different lgConfigK values to be unioned (defaults to false).

Examples:

> SELECT hll_sketch_estimate(hll_union_agg(sketch, true)) FROM (SELECT hll_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT hll_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 3.5.0


kll_merge_agg_bigint

kll_merge_agg_bigint(expr[, k]) - Merges binary KllLongsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_bigint). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_bigint(kll_merge_agg_bigint(sketch)) FROM (SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (1), (2), (3) tab(col) UNION ALL SELECT kll_sketch_agg_bigint(col) as sketch FROM VALUES (4), (5), (6) tab(col)) t;
 6

Since: 4.1.0


kll_merge_agg_double

kll_merge_agg_double(expr[, k]) - Merges binary KllDoublesSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_double). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_double(kll_merge_agg_double(sketch)) FROM (SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)) tab(col) UNION ALL SELECT kll_sketch_agg_double(col) as sketch FROM VALUES (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)), (CAST(6.0 AS DOUBLE)) tab(col)) t;
 6

Since: 4.1.0


kll_merge_agg_float

kll_merge_agg_float(expr[, k]) - Merges binary KllFloatsSketch representations and returns the merged sketch. The input expression should contain binary sketch representations (e.g., from kll_sketch_agg_float). The optional k parameter controls the size and accuracy of the merged sketch (range 8-65535). If k is not specified, the merged sketch adopts the k value from the first input sketch.

Examples:

> SELECT kll_sketch_get_n_float(kll_merge_agg_float(sketch)) FROM (SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)) tab(col) UNION ALL SELECT kll_sketch_agg_float(col) as sketch FROM VALUES (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)), (CAST(6.0 AS FLOAT)) tab(col)) t;
 6

Since: 4.1.0


kll_sketch_agg_bigint

kll_sketch_agg_bigint(expr[, k]) - Returns the KllLongsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_bigint(kll_sketch_agg_bigint(col, 400))) > 0 FROM VALUES (1), (2), (3), (4), (5) tab(col);
 true

Since: 4.1.0


kll_sketch_agg_double

kll_sketch_agg_double(expr[, k]) - Returns the KllDoublesSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_double(kll_sketch_agg_double(col, 400))) > 0 FROM VALUES (CAST(1.0 AS DOUBLE)), (CAST(2.0 AS DOUBLE)), (CAST(3.0 AS DOUBLE)), (CAST(4.0 AS DOUBLE)), (CAST(5.0 AS DOUBLE)) tab(col);
 true

Since: 4.1.0


kll_sketch_agg_float

kll_sketch_agg_float(expr[, k]) - Returns the KllFloatsSketch compact binary representation. The optional k parameter controls the size and accuracy of the sketch (default 200, range 8-65535). Larger k values provide more accurate quantile estimates but result in larger, slower sketches.

Examples:

> SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col);
 true
> SELECT LENGTH(kll_sketch_to_string_float(kll_sketch_agg_float(col, 400))) > 0 FROM VALUES (CAST(1.0 AS FLOAT)), (CAST(2.0 AS FLOAT)), (CAST(3.0 AS FLOAT)), (CAST(4.0 AS FLOAT)), (CAST(5.0 AS FLOAT)) tab(col);
 true

Since: 4.1.0


kurtosis

kurtosis(expr) - Returns the kurtosis value calculated from values of a group.

Examples:

> SELECT kurtosis(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
 -0.7014368047529627
> SELECT kurtosis(col) FROM VALUES (1), (10), (100), (10), (1) as tab(col);
 0.19432323191699075

Since: 1.6.0


last

last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values

Examples:

> SELECT last(col) FROM VALUES (10), (5), (20) AS tab(col);
 20
> SELECT last(col) FROM VALUES (10), (5), (NULL) AS tab(col);
 NULL
> SELECT last(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


last_value

last_value(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values

Examples:

> SELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col);
 20
> SELECT last_value(col) FROM VALUES (10), (5), (NULL) AS tab(col);
 NULL
> SELECT last_value(col, true) FROM VALUES (10), (5), (NULL) AS tab(col);
 5

Note:

The function is non-deterministic because its results depends on the order of the rows which may be non-deterministic after a shuffle.

Since: 2.0.0


listagg

listagg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.

Arguments:

  • expr - a string or binary expression to be concatenated.
  • delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
  • key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.

Examples:

> SELECT listagg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 abc
> SELECT listagg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 cba
> SELECT listagg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
 ab
> SELECT listagg(col) FROM VALUES ('a'), ('a') AS tab(col);
 aa
> SELECT listagg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
 ab
> SELECT listagg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 a, b, c
> SELECT listagg(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Note:

  • If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
  • If DISTINCT is specified, then expr and key must be the same expression.

Since: 4.0.0


max

max(expr) - Returns the maximum value of expr.

Examples:

> SELECT max(col) FROM VALUES (10), (50), (20) AS tab(col);
 50

Since: 1.0.0


max_by

max_by(x, y) - Returns the value of x associated with the maximum value of y.

Examples:

> SELECT max_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 b

Note:

The function is non-deterministic so the output order can be different for those associated the same values of x.

Since: 3.0.0


mean

mean(expr) - Returns the mean calculated from values of a group.

Examples:

> SELECT mean(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT mean(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5

Since: 1.0.0


measure

measure(expr) - this function is used and can only be used to calculate a measure defined in a metric view.

Examples:

> SELECT dimension_col, measure(measure_col)
  FROM test_metric_view
  GROUP BY dimension_col;
dim_1, 100
dim_2, 200

Since: 4.2.0


median

median(col) - Returns the median of numeric or ANSI interval column col.

Examples:

> SELECT median(col) FROM VALUES (0), (10) AS tab(col);
 5.0
> SELECT median(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-5

Since: 3.4.0


min

min(expr) - Returns the minimum value of expr.

Examples:

> SELECT min(col) FROM VALUES (10), (-1), (20) AS tab(col);
 -1

Since: 1.0.0


min_by

min_by(x, y) - Returns the value of x associated with the minimum value of y.

Examples:

> SELECT min_by(x, y) FROM VALUES ('a', 10), ('b', 50), ('c', 20) AS tab(x, y);
 a

Note:

The function is non-deterministic so the output order can be different for those associated the same values of x.

Since: 3.0.0


mode

mode(col[, deterministic]) - Returns the most frequent value for the values within col. NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency then either any of values is returned if deterministic is false or is not defined, or the lowest value is returned if deterministic is true. mode() WITHIN GROUP (ORDER BY col) - Returns the most frequent value for the values within col (specified in ORDER BY clause). NULL values are ignored. If all the values are NULL, or there are 0 rows, returns NULL. When multiple values have the same greatest frequency only one value will be returned. The value will be chosen based on sort direction. Return the smallest value if sort direction is asc or the largest value if sort direction is desc from multiple values with the same frequency.

Examples:

> SELECT mode(col) FROM VALUES (0), (10), (10) AS tab(col);
 10
> SELECT mode(col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-10
> SELECT mode(col) FROM VALUES (0), (10), (10), (null), (null), (null) AS tab(col);
 10
> SELECT mode(col, false) FROM VALUES (-10), (0), (10) AS tab(col);
 0
> SELECT mode(col, true) FROM VALUES (-10), (0), (10) AS tab(col);
 -10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10) AS tab(col);
 10
> SELECT mode() WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
 10
> SELECT mode() WITHIN GROUP (ORDER BY col DESC) FROM VALUES (0), (10), (10), (20), (20) AS tab(col);
 20

Since: 3.4.0


percentile

percentile(col, percentage [, frequency]) - Returns the exact percentile value of numeric or ANSI interval column col at the given percentage. The value of percentage must be between 0.0 and 1.0. The value of frequency should be positive integral

percentile(col, array(percentage1 [, percentage2]...) [, frequency]) - Returns the exact percentile value array of numeric column col at the given percentage(s). Each value of the percentage array must be between 0.0 and 1.0. The value of frequency should be positive integral

Examples:

> SELECT percentile(col, 0.3) FROM VALUES (0), (10) AS tab(col);
 3.0
> SELECT percentile(col, array(0.25, 0.75)) FROM VALUES (0), (10) AS tab(col);
 [2.5,7.5]
> SELECT percentile(col, 0.5) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-5
> SELECT percentile(col, array(0.2, 0.5)) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:02.000000000,0 00:00:05.000000000]

Since: 2.1.0


percentile_approx

percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile of the numeric or ansi interval column col which is the smallest value in the ordered col values (sorted from least to greatest) such that no more than percentage of col values is less than the value or equal to that value. The value of percentage must be between 0.0 and 1.0. The accuracy parameter (default: 10000) is a positive numeric literal which controls approximation accuracy at the cost of memory. Higher value of accuracy yields better accuracy, 1.0/accuracy is the relative error of the approximation. When percentage is an array, each value of the percentage array must be between 0.0 and 1.0. In this case, returns the approximate percentile array of column col at the given percentage array.

Examples:

> SELECT percentile_approx(col, array(0.5, 0.4, 0.1), 100) FROM VALUES (0), (1), (2), (10) AS tab(col);
 [1,1,0]
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (0), (6), (7), (9), (10) AS tab(col);
 7
> SELECT percentile_approx(col, 0.5, 100) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '1' MONTH), (INTERVAL '2' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-1
> SELECT percentile_approx(col, array(0.5, 0.7), 100) FROM VALUES (INTERVAL '0' SECOND), (INTERVAL '1' SECOND), (INTERVAL '2' SECOND), (INTERVAL '10' SECOND) AS tab(col);
 [0 00:00:01.000000000,0 00:00:02.000000000]

Since: 2.1.0


percentile_cont

percentile_cont(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a continuous distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).

Examples:

> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
 2.5
> SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-2

Since: 4.0.0


percentile_disc

percentile_disc(percentage) WITHIN GROUP (ORDER BY col) - Return a percentile value based on a discrete distribution of numeric or ANSI interval column col at the given percentage (specified in ORDER BY clause).

Examples:

> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (0), (10) AS tab(col);
 0.0
> SELECT percentile_disc(0.25) WITHIN GROUP (ORDER BY col) FROM VALUES (INTERVAL '0' MONTH), (INTERVAL '10' MONTH) AS tab(col);
 0-0

Since: 4.0.0


regr_avgx

regr_avgx(y, x) - Returns the average of the independent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 2.75
> SELECT regr_avgx(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_avgx(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 3.0
> SELECT regr_avgx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 3.0

Since: 3.3.0


regr_avgy

regr_avgy(y, x) - Returns the average of the dependent variable for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 1.75
> SELECT regr_avgy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_avgy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 1.6666666666666667
> SELECT regr_avgy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.5

Since: 3.3.0


regr_count

regr_count(y, x) - Returns the number of non-null number pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 4
> SELECT regr_count(y, x) FROM VALUES (1, null) AS tab(y, x);
 0
> SELECT regr_count(y, x) FROM VALUES (null, 1) AS tab(y, x);
 0
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 3
> SELECT regr_count(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 2

Since: 3.3.0


regr_intercept

regr_intercept(y, x) - Returns the intercept of the univariate linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
 0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_intercept(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
 0.0
> SELECT regr_intercept(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
 0.0

Since: 3.4.0


regr_r2

regr_r2(y, x) - Returns the coefficient of determination for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.2727272727272727
> SELECT regr_r2(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_r2(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 0.7500000000000001
> SELECT regr_r2(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.0

Since: 3.3.0


regr_slope

regr_slope(y, x) - Returns the slope of the linear regression line for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, 2), (3, 3), (4, 4) AS tab(y, x);
 1.0
> SELECT regr_slope(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_slope(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (3, 3), (4, 4) AS tab(y, x);
 1.0
> SELECT regr_slope(y, x) FROM VALUES (1, 1), (2, null), (null, 3), (4, 4) AS tab(y, x);
 1.0

Since: 3.4.0


regr_sxx

regr_sxx(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 2.75
> SELECT regr_sxx(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_sxx(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 2.0
> SELECT regr_sxx(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 2.0

Since: 3.4.0


regr_sxy

regr_sxy(y, x) - Returns REGR_COUNT(y, x) * COVAR_POP(y, x) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.75
> SELECT regr_sxy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_sxy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 1.0
> SELECT regr_sxy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 1.0

Since: 3.4.0


regr_syy

regr_syy(y, x) - Returns REGR_COUNT(y, x) * VAR_POP(y) for non-null pairs in a group, where y is the dependent variable and x is the independent variable.

Examples:

> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, 2), (2, 3), (2, 4) AS tab(y, x);
 0.75
> SELECT regr_syy(y, x) FROM VALUES (1, null) AS tab(y, x);
 NULL
> SELECT regr_syy(y, x) FROM VALUES (null, 1) AS tab(y, x);
 NULL
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (2, 3), (2, 4) AS tab(y, x);
 0.6666666666666666
> SELECT regr_syy(y, x) FROM VALUES (1, 2), (2, null), (null, 3), (2, 4) AS tab(y, x);
 0.5

Since: 3.4.0


skewness

skewness(expr) - Returns the skewness value calculated from values of a group.

Examples:

> SELECT skewness(col) FROM VALUES (-10), (-20), (100), (1000) AS tab(col);
 1.1135657469022011
> SELECT skewness(col) FROM VALUES (-1000), (-100), (10), (20) AS tab(col);
 -1.1135657469022011

Since: 1.6.0


some

some(expr) - Returns true if at least one value of expr is true.

Examples:

> SELECT some(col) FROM VALUES (true), (false), (false) AS tab(col);
 true
> SELECT some(col) FROM VALUES (NULL), (true), (false) AS tab(col);
 true
> SELECT some(col) FROM VALUES (false), (false), (NULL) AS tab(col);
 false

Since: 3.0.0


std

std(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT std(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0


stddev

stddev(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT stddev(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0


stddev_pop

stddev_pop(expr) - Returns the population standard deviation calculated from values of a group.

Examples:

> SELECT stddev_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
 0.816496580927726

Since: 1.6.0


stddev_samp

stddev_samp(expr) - Returns the sample standard deviation calculated from values of a group.

Examples:

> SELECT stddev_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0


string_agg

string_agg(expr[, delimiter])[ WITHIN GROUP (ORDER BY key [ASC | DESC] [,...])] - Returns the concatenation of non-NULL input values, separated by the delimiter ordered by key. If all values are NULL, NULL is returned.

Arguments:

  • expr - a string or binary expression to be concatenated.
  • delimiter - an optional string or binary foldable expression used to separate the input values. If NULL, the concatenation will be performed without a delimiter. Default is NULL.
  • key - an optional expression for ordering the input values. Multiple keys can be specified. If none are specified, the order of the rows in the result is non-deterministic.

Examples:

> SELECT string_agg(col) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 abc
> SELECT string_agg(col) WITHIN GROUP (ORDER BY col DESC) FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 cba
> SELECT string_agg(col) FROM VALUES ('a'), (NULL), ('b') AS tab(col);
 ab
> SELECT string_agg(col) FROM VALUES ('a'), ('a') AS tab(col);
 aa
> SELECT string_agg(DISTINCT col) FROM VALUES ('a'), ('a'), ('b') AS tab(col);
 ab
> SELECT string_agg(col, ', ') FROM VALUES ('a'), ('b'), ('c') AS tab(col);
 a, b, c
> SELECT string_agg(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Note:

  • If the order is not specified, the function is non-deterministic because the order of the rows may be non-deterministic after a shuffle.
  • If DISTINCT is specified, then expr and key must be the same expression.

Since: 4.0.0


sum

sum(expr) - Returns the sum calculated from values of a group.

Examples:

> SELECT sum(col) FROM VALUES (5), (10), (15) AS tab(col);
 30
> SELECT sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
 25
> SELECT sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL

Since: 1.0.0


theta_intersection_agg

theta_intersection_agg(expr) - Returns the ThetaSketch's Compact binary representation by intersecting all the Theta sketches in the input column.

Examples:

> SELECT theta_sketch_estimate(theta_intersection_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 4.1.0


theta_sketch_agg

theta_sketch_agg(expr, lgNomEntries) - Returns the ThetaSketch compact binary representation. lgNomEntries (optional) is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the ThetaSketch.

Examples:

> SELECT theta_sketch_estimate(theta_sketch_agg(col, 12)) FROM VALUES (1), (1), (2), (2), (3) tab(col);
 3

Since: 4.1.0


theta_union_agg

theta_union_agg(expr, lgNomEntries) - Returns the ThetaSketch's Compact binary representation. lgNomEntries (optional) the log-base-2 of Nominal Entries, with Nominal Entries deciding the number buckets or slots for the ThetaSketch.

Examples:

> SELECT theta_sketch_estimate(theta_union_agg(sketch)) FROM (SELECT theta_sketch_agg(col) as sketch FROM VALUES (1) tab(col) UNION ALL SELECT theta_sketch_agg(col, 20) as sketch FROM VALUES (1) tab(col));
 1

Since: 4.1.0


try_avg

try_avg(expr) - Returns the mean calculated from values of a group and the result is null on overflow.

Examples:

> SELECT try_avg(col) FROM VALUES (1), (2), (3) AS tab(col);
 2.0
> SELECT try_avg(col) FROM VALUES (1), (2), (NULL) AS tab(col);
 1.5
> SELECT try_avg(col) FROM VALUES (interval '2147483647 months'), (interval '1 months') AS tab(col);
 NULL

Since: 3.3.0


try_sum

try_sum(expr) - Returns the sum calculated from values of a group and the result is null on overflow.

Examples:

> SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col);
 30
> SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);
 25
> SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col);
 NULL
> SELECT try_sum(col) FROM VALUES (9223372036854775807L), (1L) AS tab(col);
 NULL

Since: 3.3.0


tuple_intersection_agg_double

tuple_intersection_agg_double(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_intersection_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D), (3, 15.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D), (4, 12.0D) tab(key, summary));
 2.0

Since: 4.2.0


tuple_intersection_agg_integer

tuple_intersection_agg_integer(child, mode) - Returns the intersected TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. mode is the aggregation mode for numeric summaries during intersection (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_intersection_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 1), (2, 2), (3, 3) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 2), (3, 3), (4, 4) tab(key, summary));
 2.0

Since: 4.2.0


tuple_sketch_agg_double

tuple_sketch_agg_double(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the double value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_sketch_agg_double(key, summary, 12, 'sum')) FROM VALUES (1, 5.0D), (1, 1.0D), (2, 2.0D), (2, 3.0D), (3, 2.2D) tab(key, summary);
 3.0

Since: 4.2.0


tuple_sketch_agg_integer

tuple_sketch_agg_integer(key, summary, lgNomEntries, mode) - Returns the TupleSketch compact binary representation. key is the expression for unique value counting. summary is the integer value to be aggregated. lgNomEntries is the log-base-2 of nominal entries, with nominal entries deciding the number buckets or slots for the TupleSketch. Default is 12. mode is the aggregation mode for numeric summaries (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_sketch_agg_integer(key, summary, 12, 'sum')) FROM VALUES (1, 5), (1, 1), (2, 2), (2, 3), (3, 2) tab(key, summary);
 3.0

Since: 4.2.0


tuple_union_agg_double

tuple_union_agg_double(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with a double type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_double(tuple_union_agg_double(sketch)) FROM (SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (1, 5.0D), (2, 10.0D) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_double(key, summary) as sketch FROM VALUES (2, 3.0D), (3, 7.0D) tab(key, summary));
 3.0

Since: 4.2.0


tuple_union_agg_integer

tuple_union_agg_integer(child, lgNomEntries, mode) - Returns the unioned TupleSketch compact binary representation. child should be a binary TupleSketch representation created with an integer type summary. lgNomEntries is the log-base-2 of nominal entries for the union operation. Default is 12. mode is the aggregation mode for numeric summaries during union (sum, min, max, alwaysone). Default is sum.

Examples:

> SELECT tuple_sketch_estimate_integer(tuple_union_agg_integer(sketch)) FROM (SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (1, 5), (2, 10) tab(key, summary) UNION ALL SELECT tuple_sketch_agg_integer(key, summary) as sketch FROM VALUES (2, 3), (3, 7) tab(key, summary));
 3.0

Since: 4.2.0


var_pop

var_pop(expr) - Returns the population variance calculated from values of a group.

Examples:

> SELECT var_pop(col) FROM VALUES (1), (2), (3) AS tab(col);
 0.6666666666666666

Since: 1.6.0


var_samp

var_samp(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT var_samp(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0


variance

variance(expr) - Returns the sample variance calculated from values of a group.

Examples:

> SELECT variance(col) FROM VALUES (1), (2), (3) AS tab(col);
 1.0

Since: 1.6.0