Package org.apache.spark.sql.expressions
Class Aggregator<IN,BUF,OUT>
Object
org.apache.spark.sql.expressions.Aggregator<IN,BUF,OUT>
- All Implemented Interfaces:
Serializable
,org.apache.spark.sql.internal.UserDefinedFunctionLike
public abstract class Aggregator<IN,BUF,OUT>
extends Object
implements Serializable, org.apache.spark.sql.internal.UserDefinedFunctionLike
A base class for user-defined aggregations, which can be used in
Dataset
operations to take
all of the elements of a group and reduce them to a single value.
For example, the following aggregator extracts an int
from a specific class and adds them up:
case class Data(i: Int)
val customSummer = new Aggregator[Data, Int, Int] {
def zero: Int = 0
def reduce(b: Int, a: Data): Int = b + a.i
def merge(b1: Int, b2: Int): Int = b1 + b2
def finish(r: Int): Int = r
def bufferEncoder: Encoder[Int] = Encoders.scalaInt
def outputEncoder: Encoder[Int] = Encoders.scalaInt
}.toColumn()
val ds: Dataset[Data] = ...
val aggregated = ds.select(customSummer)
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
- Since:
- 1.6.0
- See Also:
-
Constructor Summary
-
Method Summary
Modifier and TypeMethodDescriptionSpecifies theEncoder
for the intermediate value type.abstract OUT
Transform the output of the reduction.abstract BUF
Merge two intermediate values.Specifies theEncoder
for the final output value type.abstract BUF
Combine two values to produce a new value.toColumn()
Returns thisAggregator
as aTypedColumn
that can be used inDataset
operations.abstract BUF
zero()
A zero value for this aggregation.Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.spark.sql.internal.UserDefinedFunctionLike
name
-
Constructor Details
-
Aggregator
public Aggregator()
-
-
Method Details
-
bufferEncoder
Specifies theEncoder
for the intermediate value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
finish
Transform the output of the reduction.- Parameters:
reduction
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
merge
Merge two intermediate values.- Parameters:
b1
- (undocumented)b2
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
outputEncoder
Specifies theEncoder
for the final output value type.- Returns:
- (undocumented)
- Since:
- 2.0.0
-
reduce
Combine two values to produce a new value. For performance, the function may modifyb
and return it instead of constructing new object for b.- Parameters:
b
- (undocumented)a
- (undocumented)- Returns:
- (undocumented)
- Since:
- 1.6.0
-
toColumn
Returns thisAggregator
as aTypedColumn
that can be used inDataset
operations.- Returns:
- (undocumented)
- Since:
- 1.6.0
-
zero
A zero value for this aggregation. Should satisfy the property that any b + zero = b.- Returns:
- (undocumented)
- Since:
- 1.6.0
-