pyspark.RDD.reduce
- RDD.reduce(f)
Reduces the elements of this RDD using the specified commutative and associative binary operator. Currently, each partition is reduced locally before the partial results are combined.
New in version 0.7.0.
- Parameters
- f : function
the reduce function
- Returns
- T
the aggregated result
Examples
>>> from operator import add
>>> sc.parallelize([1, 2, 3, 4, 5]).reduce(add)
15
>>> sc.parallelize((2 for _ in range(10))).map(lambda x: 1).cache().reduce(add)
10
>>> sc.parallelize([]).reduce(add)
Traceback (most recent call last):
    ...
ValueError: Can not reduce() empty RDD
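The requirement that the operator be commutative and associative can be illustrated without a Spark cluster. The following is a minimal pure-Python sketch, not Spark's implementation: `partitioned_reduce` is a hypothetical helper that splits a list into slices, reduces each slice locally, and then reduces the partial results. Under that assumption, addition gives the same answer for any partitioning, while subtraction (neither commutative nor associative) does not.

```python
from functools import reduce
from operator import add, sub


def partitioned_reduce(data, f, num_partitions):
    """Hypothetical model of a two-stage reduce; not part of the PySpark API."""
    if not data:
        # Mirrors the error shown in the doctest above for an empty RDD.
        raise ValueError("Can not reduce() empty RDD")
    size = -(-len(data) // num_partitions)  # ceiling division
    # Split the data into contiguous, non-empty slices ("partitions").
    partitions = [data[i:i + size] for i in range(0, len(data), size)]
    # Stage 1: reduce each partition locally.
    partials = [reduce(f, part) for part in partitions]
    # Stage 2: reduce the per-partition results.
    return reduce(f, partials)


values = [1, 2, 3, 4, 5]

# Addition is commutative and associative: the partitioning does not matter.
sum_two = partitioned_reduce(values, add, 2)
sum_three = partitioned_reduce(values, add, 3)

# Subtraction is neither: the result changes with the partitioning.
diff_one = partitioned_reduce(values, sub, 1)
diff_two = partitioned_reduce(values, sub, 2)

print(sum_two, sum_three)  # 15 15
print(diff_one, diff_two)  # -13 -3
```

Because the number and boundaries of partitions are not generally something the caller should rely on, an operator such as subtraction would yield results that depend on how the data happens to be distributed.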