Class Partitioner<T,S extends QuantilesGenericAPI<T> & PartitioningFeature<T>>
java.lang.Object
org.apache.datasketches.partitions.Partitioner<T,S>
- Type Parameters:
T - the data type
S - the quantiles sketch that implements both QuantilesGenericAPI and PartitioningFeature.
public class Partitioner<T,S extends QuantilesGenericAPI<T> & PartitioningFeature<T>>
extends Object
A partitioning process that can partition very large data sets into thousands
of partitions of approximately the same size.
The code included here works well for moderately sized partitioning tasks. As an example, the test code in the test branch split a data set of 1 billion items into 324 partitions of about 3M items each in under 3 minutes on a single CPU. For much larger partitioning tasks, it is recommended that this code be adapted to a parallelized systems environment.
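The figures quoted above (1 billion items, 324 partitions of roughly 3M items) can be reproduced with a back-of-the-envelope calculation. The following is a minimal sketch for illustration only, not the library's code: the class PartitionPlanSketch and its methods are hypothetical, the value 100 for maxPartsPerPass is an assumed input, and the rule used to choose the number of levels and the per-pass fan-out is an assumption about how a multi-pass partitioner might plan its work.

```java
// Hypothetical planning helper for illustration; not part of the DataSketches API.
final class PartitionPlanSketch {

  /** Estimated partition count: ceil(totalItems / tgtPartitionSize), at least 1. */
  public static long estimatedPartitions(long totalItems, long tgtPartitionSize) {
    if (totalItems < 0 || tgtPartitionSize < 1) {
      throw new IllegalArgumentException("need totalItems >= 0 and tgtPartitionSize >= 1");
    }
    long parts = totalItems / tgtPartitionSize + (totalItems % tgtPartitionSize == 0 ? 0 : 1);
    return Math.max(1L, parts);
  }

  /** Smallest number of levels L such that maxPartsPerPass^L >= estParts (assumed rule). */
  public static int levels(long estParts, int maxPartsPerPass) {
    if (maxPartsPerPass < 2) {
      throw new IllegalArgumentException("need maxPartsPerPass >= 2");
    }
    int levels = 1;
    long remaining = estParts;
    // Repeated ceiling division avoids overflow from computing powers directly.
    while (remaining > maxPartsPerPass) {
      remaining = remaining / maxPartsPerPass + (remaining % maxPartsPerPass == 0 ? 0 : 1);
      levels++;
    }
    return levels;
  }

  /** Per-pass fan-out: the levels-th root of estParts, rounded and capped (assumed rule). */
  public static int partsPerPass(long estParts, int levels, int maxPartsPerPass) {
    long rounded = Math.round(Math.pow((double) estParts, 1.0 / levels));
    return (int) Math.min(Math.max(1L, rounded), (long) maxPartsPerPass);
  }

  public static void main(String[] args) {
    long totalItems = 1_000_000_000L;   // from the example in the class description
    long tgtPartitionSize = 3_000_000L; // from the example in the class description
    int maxPartsPerPass = 100;          // assumed value, not stated in the documentation

    long est = estimatedPartitions(totalItems, tgtPartitionSize);
    int lv = levels(est, maxPartsPerPass);
    int perPass = partsPerPass(est, lv, maxPartsPerPass);

    long total = 1;
    for (int i = 0; i < lv; i++) {
      total *= perPass;
    }
    System.out.println("estimated=" + est + " levels=" + lv
        + " perPass=" + perPass + " total=" + total);
  }
}
```

With these inputs the sketch estimates 334 partitions, plans 2 levels of 18 partitions each, and so arrives at 18 x 18 = 324 partitions. Raising the assumed maxPartsPerPass to 400 gives a single level of 334 partitions, which illustrates the trade-off between the per-pass fan-out and the number of passes over the source data.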
Nested Class Summary
static class
- Defines a row for List of PartitionBounds.
static class
- Holds data for a Stack element.
Constructor Summary
Partitioner(long tgtPartitionSize, int maxPartsPerPass, SketchFillRequest<T, S> fillReq)
- This constructor assumes a QuantileSearchCriteria of INCLUSIVE.
Partitioner(long tgtPartitionSize, int maxPartsPerSk, SketchFillRequest<T, S> fillReq, QuantileSearchCriteria criteria)
- This constructor includes the QuantileSearchCriteria criteria as a parameter.
Method Summary
partition
- This initiates the partitioning process.
Constructor Details
Partitioner
public Partitioner(long tgtPartitionSize, int maxPartsPerPass, SketchFillRequest<T, S> fillReq)
This constructor assumes a QuantileSearchCriteria of INCLUSIVE.
- Parameters:
tgtPartitionSize - the target size of the resulting partitions in number of items.
maxPartsPerPass - the maximum number of partitions to request from the sketch. The smaller this number, the smaller the variance in the sizes of the resulting partitions, but the more passes over the source data set are required.
fillReq - an implementation of the SketchFillRequest callback interface, supplied by the user.
Partitioner
public Partitioner(long tgtPartitionSize, int maxPartsPerSk, SketchFillRequest<T, S> fillReq, QuantileSearchCriteria criteria)
This constructor includes the QuantileSearchCriteria criteria as a parameter.
- Parameters:
tgtPartitionSize - the target size of the resulting partitions in number of items.
maxPartsPerSk - the maximum number of partitions to request from the sketch. The smaller this number, the smaller the variance in the sizes of the resulting partitions, but the more passes over the source data set are required.
fillReq - an implementation of the SketchFillRequest callback, supplied by the user.
criteria - the desired QuantileSearchCriteria to be used.
Method Details
partition
This initiates the partitioning process.
- Parameters:
sk - a sketch of the entire data set.
- Returns:
the final partitioning list