pyspark.RDD.reduce

RDD.reduce(f)

Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently, each partition is reduced locally, and the per-partition results are then combined into a single value.
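To see why the operator needs to be commutative and associative, here is a minimal pure-Python sketch of a partition-wise reduce. It is a simplified model and not PySpark code: `reduce_partitions` is a hypothetical name, and the nested lists stand in for RDD partitions. The point is only that a well-behaved operator gives the same answer regardless of how the data is split, while an operator such as subtraction does not.

```python
from functools import reduce
from operator import add, sub


def reduce_partitions(partitions, f):
    """Hypothetical model of a partition-wise reduce (not PySpark code)."""
    # Reduce each non-empty partition on its own; empty partitions
    # contribute nothing to the result.
    partials = [reduce(f, part) for part in partitions if part]
    if not partials:
        raise ValueError("Can not reduce() empty RDD")
    # Combine the per-partition results into a single value.
    return reduce(f, partials)


# The same five elements, split two different ways.
one_partition = [[1, 2, 3, 4, 5]]
two_partitions = [[1, 2], [3, 4, 5]]

# Addition is commutative and associative: the split does not matter.
sum_one = reduce_partitions(one_partition, add)
sum_two = reduce_partitions(two_partitions, add)

# Subtraction is neither: the result depends on the partitioning.
diff_one = reduce_partitions(one_partition, sub)
diff_two = reduce_partitions(two_partitions, sub)

print(sum_one, sum_two)    # 15 15
print(diff_one, diff_two)  # -13 5
```

Because the number and boundaries of partitions are generally not under the caller's control, an operator like `sub` can produce different results from one run or cluster configuration to the next.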

Added in version 0.7.0.

Parameters:

f : function
    the reduce function

Returns:

T
    the aggregated result

See also

RDD.treeReduce()
RDD.aggregate()
RDD.treeAggregate()

Examples

>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
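When an RDD may be empty, one way to avoid the `ValueError` shown in the last example is to check for emptiness first. The helper below is a sketch, not part of PySpark: `reduce_or_default` and its `default` parameter are hypothetical names, and it assumes only that the object passed in provides the `isEmpty()` and `reduce()` methods of `RDD`.

```python
def reduce_or_default(rdd, f, default=None):
    """Hypothetical helper: reduce `rdd` with `f`, or return `default` if it is empty."""
    # isEmpty() is a separate action, so this costs one extra job. In return,
    # there is no need to catch ValueError, which `f` itself might also raise.
    if rdd.isEmpty():
        return default
    return rdd.reduce(f)


# Intended usage, assuming a SparkContext named `sc` as in the examples above:
#   from operator import add
#   reduce_or_default(sc.parallelize([]), add, default=0)
```

If the operator has a natural identity element, `RDD.fold()` may be a simpler alternative: `sc.parallelize([]).fold(0, add)` returns `0` rather than raising.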