Class KllSketch
java.lang.Object
org.apache.datasketches.kll.KllSketch
- All Implemented Interfaces:
MemorySegmentStatus, QuantilesAPI
- Direct Known Subclasses:
KllDoublesSketch, KllFloatsSketch, KllItemsSketch, KllLongsSketch
This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent
of either sketch type (e.g., float, double, long or generic item) and independent of whether the sketch is targeted
for use on the Java heap or off-heap.
KLL is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.
Reference Optimal Quantile Approximation in Streams.
The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.
- Author:
- Lee Rhodes, Kevin Lang, Alexander Saydakov
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionstatic enumUsed primarily to define the structure of the serialized sketch.static enumUsed to define the variable type of the current instance of this class. -
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final intThe default Kstatic final intThe maximum KFields inherited from interface QuantilesAPI
EMPTY_MSG, MEM_REQ_SVR_NULL_MSG, NOT_SINGLE_ITEM_MSG, SELF_MERGE_MSG, TGT_IS_READ_ONLY_MSG, UNSUPPORTED_MSG -
Method Summary
Modifier and TypeMethodDescriptionstatic intgetKFromEpsilon(double epsilon, boolean pmf) Gets the approximate k to use given epsilon, the normalized rank error.static intgetMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableFormat) Returns upper bound on the serialized size of a KllSketch given the following parameters.final doublegetNormalizedRankError(boolean pmf) Gets the approximate rank error of this sketch normalized as a fraction between zero and one.static doublegetNormalizedRankError(int k, boolean pmf) Gets the normalized rank error given k and pmf.final intGets the number of quantiles retained by the sketch.intReturns the current number of bytes this Sketch would require if serialized in compact form.abstract booleanReturns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.booleanReturns true if this sketch is in a Compact MemorySegment Format.final booleanisEmpty()Returns true if this sketch is empty.final booleanReturns true if this sketch is in estimation mode.final booleanReturns true if the backing MemorySegment is in updatable format.abstract booleanReturns true if this object's internal data is backed by an off-heap (direct or native)) MemorySegment.final booleanReturns true if this sketch is read only.abstract booleanisSameResource(MemorySegment that) Returns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap.abstract voidMerges another sketch into this one.final StringtoString()Returns a summary of the key parameters of the sketch.abstract StringtoString(boolean withLevels, boolean withLevelsAndItems) Returns human readable summary information about this sketch.Methods inherited from interface QuantilesAPI
getK, getN, getRankLowerBound, getRankUpperBound, reset
-
Field Details
-
DEFAULT_K
public static final int DEFAULT_KThe default K- See Also:
-
MAX_K
public static final int MAX_KThe maximum K- See Also:
-
-
Method Details
-
getKFromEpsilon
public static int getKFromEpsilon(double epsilon, boolean pmf) Gets the approximate k to use given epsilon, the normalized rank error.- Parameters:
epsilon- the normalized rank error between zero and one.pmf- if true, this function returns the k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.- Returns:
- k given epsilon.
-
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableFormat) Returns upper bound on the serialized size of a KllSketch given the following parameters.- Parameters:
k- parameter that controls size of the sketch and accuracy of estimatesn- stream lengthsketchType- Only DOUBLES_SKETCH, LONGS_SKETCH and FLOATS_SKETCH are supported for this operation.updatableFormat- true if updatable MemorySegment format, otherwise the standard compact format.- Returns:
- upper bound on the serialized size of a KllSketch.
-
getNormalizedRankError
public static double getNormalizedRankError(int k, boolean pmf) Gets the normalized rank error given k and pmf. Static method version of the getNormalizedRankError(boolean). The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Parameters:
k- the configuration parameterpmf- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNormalizedRankError
public final double getNormalizedRankError(boolean pmf) Description copied from interface:QuantilesAPIGets the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Specified by:
getNormalizedRankErrorin interfaceQuantilesAPI- Parameters:
pmf- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNumRetained
public final int getNumRetained()Description copied from interface:QuantilesAPIGets the number of quantiles retained by the sketch.- Specified by:
getNumRetainedin interfaceQuantilesAPI- Returns:
- the number of quantiles retained by the sketch
-
getSerializedSizeBytes
public int getSerializedSizeBytes()Returns the current number of bytes this Sketch would require if serialized in compact form.- Returns:
- the number of bytes this sketch would require if serialized.
-
hasMemorySegment
public abstract boolean hasMemorySegment()Description copied from interface:MemorySegmentStatusReturns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.- Specified by:
hasMemorySegmentin interfaceMemorySegmentStatus- Returns:
- true if this object's internal data is backed by a MemorySegment.
-
isCompactMemorySegmentFormat
public boolean isCompactMemorySegmentFormat()Returns true if this sketch is in a Compact MemorySegment Format.- Returns:
- true if this sketch is in a Compact MemorySegment Format.
-
isOffHeap
public abstract boolean isOffHeap()Description copied from interface:MemorySegmentStatusReturns true if this object's internal data is backed by an off-heap (direct or native)) MemorySegment.- Specified by:
isOffHeapin interfaceMemorySegmentStatus- Returns:
- true if this object's internal data is backed by an off-heap (direct or native)) MemorySegment.
-
isEmpty
public final boolean isEmpty()Description copied from interface:QuantilesAPIReturns true if this sketch is empty.- Specified by:
isEmptyin interfaceQuantilesAPI- Returns:
- true if this sketch is empty.
-
isEstimationMode
public final boolean isEstimationMode()Description copied from interface:QuantilesAPIReturns true if this sketch is in estimation mode.- Specified by:
isEstimationModein interfaceQuantilesAPI- Returns:
- true if this sketch is in estimation mode.
-
isMemorySegmentUpdatableFormat
public final boolean isMemorySegmentUpdatableFormat()Returns true if the backing MemorySegment is in updatable format.- Returns:
- true if the backing MemorySegment is in updatable format.
-
isReadOnly
public final boolean isReadOnly()Description copied from interface:QuantilesAPIReturns true if this sketch is read only.- Specified by:
isReadOnlyin interfaceQuantilesAPI- Returns:
- true if this sketch is read only.
-
isSameResource
Description copied from interface:MemorySegmentStatusReturns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap. This applies to both on-heap and off-heap MemorySegments.Note: If both segments are on-heap and not read-only, it can be determined if they were derived from the same backing memory (array). However, this is not always possible off-heap. Because of this asymmetry, this definition of "isSameResource" is confined to the existence of an overlap.
- Specified by:
isSameResourcein interfaceMemorySegmentStatus- Parameters:
that- The given MemorySegment.- Returns:
- true if an internally referenced MemorySegment has the same backing resource as that.
-
merge
Merges another sketch into this one. Attempting to merge a sketch of the wrong type will throw an exception.- Parameters:
other- sketch to merge into this one
-
toString
Description copied from interface:QuantilesAPIReturns a summary of the key parameters of the sketch.- Specified by:
toStringin interfaceQuantilesAPI- Overrides:
toStringin classObject- Returns:
- a summary of the key parameters of the sketch.
-
toString
Returns human readable summary information about this sketch. Used for debugging.- Parameters:
withLevels- if true includes sketch levels array summary informationwithLevelsAndItems- if true include detail of levels array and items array together- Returns:
- human readable summary information about this sketch.
-