Window Functions¶
This page lists all window functions available in Spark SQL.
counter_diff¶
counter_diff(value [, start_time]) - Computes the differences between consecutive cumulative counter values in a time series, thereby converting the counter from the cumulative to the delta format.
Arguments:
- value - A cumulative counter. Must be a numeric data type. Must be non-negative.
- start_time - An optional timestamp parameter which indicates when the counter was last set to zero. Used to signal counter resets.
Examples:
> SELECT m, t, c, counter_diff(c) OVER (PARTITION BY m ORDER BY t) AS diff FROM VALUES ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:00:00', 100), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:01:00', 200), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:02:00', 400), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:03:00', 50), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:04:00', 100) AS tab(m, t, c) ORDER BY t;
http_requests 2026-01-01 00:00:00 100 NULL
http_requests 2026-01-01 00:01:00 200 100
http_requests 2026-01-01 00:02:00 400 200
http_requests 2026-01-01 00:03:00 50 NULL
http_requests 2026-01-01 00:04:00 100 50
> SELECT m, t, s, c, counter_diff(c, s) OVER (PARTITION BY m ORDER BY t) AS diff FROM VALUES ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:00:00', 100, TIMESTAMP_NTZ '2026-01-01 00:00:00'), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:01:00', 200, TIMESTAMP_NTZ '2026-01-01 00:00:00'), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:02:00', 400, TIMESTAMP_NTZ '2026-01-01 00:00:00'), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:03:00', 500, TIMESTAMP_NTZ '2026-01-01 00:02:15'), ('http_requests', TIMESTAMP_NTZ '2026-01-01 00:04:00', 600, TIMESTAMP_NTZ '2026-01-01 00:02:15') AS tab(m, t, c, s) ORDER BY t;
http_requests 2026-01-01 00:00:00 2026-01-01 00:00:00 100 NULL
http_requests 2026-01-01 00:01:00 2026-01-01 00:00:00 200 100
http_requests 2026-01-01 00:02:00 2026-01-01 00:00:00 400 200
http_requests 2026-01-01 00:03:00 2026-01-01 00:02:15 500 NULL
http_requests 2026-01-01 00:04:00 2026-01-01 00:02:15 600 100
Note:
counter_diff calculates the difference between the current and the previous counter value within the partition, according to the order defined by the ORDER BY clause.
Use the PARTITION BY clause of the window to separate independent counters. This is done by specifying all columns which uniquely identify a time series. These are typically the counter name and any attributes tied to the counter.
Use the ORDER BY clause of the window to order the observations by the associated timestamp in ascending order.
The following special cases are handled: 1. If the counter value is NULL, NULL is returned for that row, and the row is excluded from difference calculations and comparisons for subsequent rows. 2. If the counter value is negative, or the start time decreases, an error is raised. 3. In the case of a counter reset, NULL is returned. 4. NULL is returned for the first row. 5. If either the current or the previous start time is NULL, the start time reset check is skipped.
Counter resets are detected when: 1. The current counter value is less than the previous counter value. 2. The current start time is greater than the previous start time, if start_time was provided.
Since: 4.3.0
cume_dist¶
cume_dist() - Computes the position of a value relative to all values in the partition.
Examples:
> SELECT a, b, cume_dist() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 0.6666666666666666
A1 1 0.6666666666666666
A1 2 1.0
A2 3 1.0
Since: 2.0.0
dense_rank¶
dense_rank() - Computes the rank of a value in a group of values. The result is one plus the previously assigned rank value. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.
Arguments:
- children - this is to base the rank on; a change in the value of one the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser.
Examples:
> SELECT a, b, dense_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 2
A2 3 1
Since: 2.0.0
lag¶
lag(input[, offset[, default]]) - Returns the value of input at the offsetth row
before the current row in the window. The default value of offset is 1 and the default
value of default is null. If the value of input at the offsetth row is null,
null is returned. If there is no such offset row (e.g., when the offset is 1, the first
row of the window does not have any previous row), default is returned.
Arguments:
- input - a string expression to evaluate
offsetrows before the current row. - offset - an int expression which is rows to jump back in the partition.
- default - a string expression which is to use when the offset row does not exist.
Examples:
> SELECT a, b, lag(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 NULL
A1 1 1
A1 2 1
A2 3 NULL
Since: 2.0.0
lead¶
lead(input[, offset[, default]]) - Returns the value of input at the offsetth row
after the current row in the window. The default value of offset is 1 and the default
value of default is null. If the value of input at the offsetth row is null,
null is returned. If there is no such an offset row (e.g., when the offset is 1, the last
row of the window does not have any subsequent row), default is returned.
Arguments:
- input - a string expression to evaluate
offsetrows after the current row. - offset - an int expression which is rows to jump ahead in the partition.
- default - a string expression which is to use when the offset is larger than the window. The default value is null.
Examples:
> SELECT a, b, lead(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 NULL
A2 3 NULL
Since: 2.0.0
nth_value¶
nth_value(input[, offset]) - Returns the value of input at the row that is the offsetth row
from beginning of the window frame. Offset starts at 1. If ignoreNulls=true, we will skip
nulls when finding the offsetth row. Otherwise, every row counts for the offset. If
there is no such an offsetth row (e.g., when the offset is 10, size of the window frame
is less than 10), null is returned.
Arguments:
- input - the target column or expression that the function operates on.
- offset - a positive int literal to indicate the offset in the window frame. It starts with 1.
- ignoreNulls - an optional specification that indicates the NthValue should skip null values in the determination of which row to use.
Examples:
> SELECT a, b, nth_value(b, 2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 1
A2 3 NULL
Since: 3.1.0
ntile¶
ntile(n) - Divides the rows for each window partition into n buckets ranging
from 1 to at most n.
Arguments:
- buckets - an int expression which is number of buckets to divide the rows in. Default value is 1.
Examples:
> SELECT a, b, ntile(2) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 2
A2 3 1
Since: 2.0.0
percent_rank¶
percent_rank() - Computes the percentage ranking of a value in a group of values.
Arguments:
- children - this is to base the rank on; a change in the value of one the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser.
Examples:
> SELECT a, b, percent_rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 0.0
A1 1 0.0
A1 2 1.0
A2 3 0.0
Since: 2.0.0
rank¶
rank() - Computes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Arguments:
- children - this is to base the rank on; a change in the value of one the children will trigger a change in rank. This is an internal parameter and will be assigned by the Analyser.
Examples:
> SELECT a, b, rank(b) OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 1
A1 2 3
A2 3 1
Since: 2.0.0
row_number¶
row_number() - Assigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
Examples:
> SELECT a, b, row_number() OVER (PARTITION BY a ORDER BY b) FROM VALUES ('A1', 2), ('A1', 1), ('A2', 3), ('A1', 1) tab(a, b);
A1 1 1
A1 1 2
A1 2 3
A2 3 1
Since: 2.0.0