pyspark.RDD.coalesce

RDD.coalesce(numPartitions, shuffle=False)

Return a new RDD that is reduced into numPartitions partitions.

Added in version 1.0.0.

Parameters:

numPartitions : int
    the number of partitions in the new RDD
shuffle : bool, optional, default False
    whether to add a shuffle step

Returns:

RDD: an RDD that is reduced into numPartitions partitions

See also

RDD.repartition()

Examples

>>> sc.parallelize([1, 2, 3, 4, 5], 3).glom().collect()
[[1], [2, 3], [4, 5]]
>>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
[[1, 2, 3, 4, 5]]