Package org.apache.datasketches.kll
Class KllSketch
java.lang.Object
org.apache.datasketches.kll.KllSketch
- All Implemented Interfaces:
QuantilesAPI
- Direct Known Subclasses:
KllDoublesSketch
,KllFloatsSketch
,KllItemsSketch
,KllLongsSketch
This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent
of either sketch type (e.g., float, double or generic item) and independent of whether the sketch is targeted
for use on the Java heap or off-heap.
KLL is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.
Reference Optimal Quantile Approximation in Streams.
The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.
- Author:
- Lee Rhodes, Kevin Lang, Alexander Saydakov
- See Also:
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic enum
Used primarily to define the structure of the serialized sketch.static enum
Used to define the variable type of the current instance of this class. -
Field Summary
Modifier and TypeFieldDescriptionstatic final int
The default Kstatic final int
The maximum KFields inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
EMPTY_MSG, MEM_REQ_SVR_NULL_MSG, NOT_SINGLE_ITEM_MSG, SELF_MERGE_MSG, TGT_IS_READ_ONLY_MSG, UNSUPPORTED_MSG
-
Method Summary
Modifier and TypeMethodDescriptionstatic int
getKFromEpsilon
(double epsilon, boolean pmf) Gets the approximate k to use given epsilon, the normalized rank error.static int
getMaxSerializedSizeBytes
(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat) Returns upper bound on the serialized size of a KllSketch given the following parameters.final double
getNormalizedRankError
(boolean pmf) Gets the approximate rank error of this sketch normalized as a fraction between zero and one.static double
getNormalizedRankError
(int k, boolean pmf) Gets the normalized rank error given k and pmf.final int
Gets the number of quantiles retained by the sketch.int
Returns the current number of bytes this Sketch would require if serialized in compact form.boolean
Returns true if this sketch's data structure is backed by Memory or WritableMemory.boolean
Returns true if this sketch is in a Compact Memory Format.boolean
isDirect()
Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).final boolean
isEmpty()
Returns true if this sketch is empty.final boolean
Returns true if this sketch is in estimation mode.final boolean
Returns true if the backing WritableMemory is in updatable format.final boolean
Returns true if this sketch is read only.final boolean
isSameResource
(org.apache.datasketches.memory.Memory that) Returns true if the backing resource of this is identical with the backing resource of that.abstract void
Merges another sketch into this one.final String
toString()
Returns a summary of the key parameters of the sketch.abstract String
toString
(boolean withLevels, boolean withLevelsAndItems) Returns human readable summary information about this sketch.Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
getK, getN, getRankLowerBound, getRankUpperBound, reset
-
Field Details
-
DEFAULT_K
public static final int DEFAULT_KThe default K- See Also:
-
MAX_K
public static final int MAX_KThe maximum K- See Also:
-
-
Method Details
-
getKFromEpsilon
public static int getKFromEpsilon(double epsilon, boolean pmf) Gets the approximate k to use given epsilon, the normalized rank error.- Parameters:
epsilon
- the normalized rank error between zero and one.pmf
- if true, this function returns the k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.- Returns:
- k given epsilon.
-
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat) Returns upper bound on the serialized size of a KllSketch given the following parameters.- Parameters:
k
- parameter that controls size of the sketch and accuracy of estimatesn
- stream lengthsketchType
- Only DOUBLES_SKETCH and FLOATS_SKETCH is supported for this operation.updatableMemFormat
- true if updatable Memory format, otherwise the standard compact format.- Returns:
- upper bound on the serialized size of a KllSketch.
-
getNormalizedRankError
public static double getNormalizedRankError(int k, boolean pmf) Gets the normalized rank error given k and pmf. Static method version of the getNormalizedRankError(boolean). The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Parameters:
k
- the configuration parameterpmf
- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNormalizedRankError
public final double getNormalizedRankError(boolean pmf) Description copied from interface:QuantilesAPI
Gets the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Specified by:
getNormalizedRankError
in interfaceQuantilesAPI
- Parameters:
pmf
- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNumRetained
public final int getNumRetained()Description copied from interface:QuantilesAPI
Gets the number of quantiles retained by the sketch.- Specified by:
getNumRetained
in interfaceQuantilesAPI
- Returns:
- the number of quantiles retained by the sketch
-
getSerializedSizeBytes
public int getSerializedSizeBytes()Returns the current number of bytes this Sketch would require if serialized in compact form.- Returns:
- the number of bytes this sketch would require if serialized.
-
hasMemory
public boolean hasMemory()Description copied from interface:QuantilesAPI
Returns true if this sketch's data structure is backed by Memory or WritableMemory.- Specified by:
hasMemory
in interfaceQuantilesAPI
- Returns:
- true if this sketch's data structure is backed by Memory or WritableMemory.
-
isCompactMemoryFormat
public boolean isCompactMemoryFormat()Returns true if this sketch is in a Compact Memory Format.- Returns:
- true if this sketch is in a Compact Memory Format.
-
isDirect
public boolean isDirect()Description copied from interface:QuantilesAPI
Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).- Specified by:
isDirect
in interfaceQuantilesAPI
- Returns:
- true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).
-
isEmpty
public final boolean isEmpty()Description copied from interface:QuantilesAPI
Returns true if this sketch is empty.- Specified by:
isEmpty
in interfaceQuantilesAPI
- Returns:
- true if this sketch is empty.
-
isEstimationMode
public final boolean isEstimationMode()Description copied from interface:QuantilesAPI
Returns true if this sketch is in estimation mode.- Specified by:
isEstimationMode
in interfaceQuantilesAPI
- Returns:
- true if this sketch is in estimation mode.
-
isMemoryUpdatableFormat
public final boolean isMemoryUpdatableFormat()Returns true if the backing WritableMemory is in updatable format.- Returns:
- true if the backing WritableMemory is in updatable format.
-
isReadOnly
public final boolean isReadOnly()Description copied from interface:QuantilesAPI
Returns true if this sketch is read only.- Specified by:
isReadOnly
in interfaceQuantilesAPI
- Returns:
- true if this sketch is read only.
-
isSameResource
public final boolean isSameResource(org.apache.datasketches.memory.Memory that) Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.- Parameters:
that
- A different non-null object- Returns:
- true if the backing resource of this is the same as the backing resource of that.
-
merge
Merges another sketch into this one. Attempting to merge a sketch of the wrong type will throw an exception.- Parameters:
other
- sketch to merge into this one
-
toString
Description copied from interface:QuantilesAPI
Returns a summary of the key parameters of the sketch.- Specified by:
toString
in interfaceQuantilesAPI
- Overrides:
toString
in classObject
- Returns:
- a summary of the key parameters of the sketch.
-
toString
Returns human readable summary information about this sketch. Used for debugging.- Parameters:
withLevels
- if true includes sketch levels array summary informationwithLevelsAndItems
- if true include detail of levels array and items array together- Returns:
- human readable summary information about this sketch.
-