Package org.apache.datasketches.kll
Class KllSketch
- java.lang.Object
-
- org.apache.datasketches.kll.KllSketch
-
- All Implemented Interfaces:
QuantilesAPI
- Direct Known Subclasses:
KllDoublesSketch,KllFloatsSketch,KllItemsSketch,KllLongsSketch
public abstract class KllSketch extends Object implements QuantilesAPI
This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent of either sketch type (e.g., float, double or generic item) and independent of whether the sketch is targeted for use on the Java heap or off-heap.KLL is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.
Reference Optimal Quantile Approximation in Streams.
The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.
- Author:
- Lee Rhodes, Kevin Lang, Alexander Saydakov
- See Also:
- KLL Sketch,
QuantilesAPI
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classKllSketch.SketchStructureUsed primarily to define the structure of the serialized sketch.static classKllSketch.SketchTypeUsed to define the variable type of the current instance of this class.
-
Field Summary
Fields Modifier and Type Field Description static intDEFAULT_KThe default Kstatic intMAX_KThe maximum K-
Fields inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
EMPTY_MSG, MEM_REQ_SVR_NULL_MSG, NOT_SINGLE_ITEM_MSG, SELF_MERGE_MSG, TGT_IS_READ_ONLY_MSG, UNSUPPORTED_MSG
-
-
Method Summary
All Methods Static Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description static intgetKFromEpsilon(double epsilon, boolean pmf)Gets the approximate k to use given epsilon, the normalized rank error.static intgetMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat)Returns upper bound on the serialized size of a KllSketch given the following parameters.doublegetNormalizedRankError(boolean pmf)Gets the approximate rank error of this sketch normalized as a fraction between zero and one.static doublegetNormalizedRankError(int k, boolean pmf)Gets the normalized rank error given k and pmf.intgetNumRetained()Gets the number of quantiles retained by the sketch.intgetSerializedSizeBytes()Returns the current number of bytes this Sketch would require if serialized in compact form.booleanhasMemory()Returns true if this sketch's data structure is backed by Memory or WritableMemory.booleanisCompactMemoryFormat()Returns true if this sketch is in a Compact Memory Format.booleanisDirect()Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).booleanisEmpty()Returns true if this sketch is empty.booleanisEstimationMode()Returns true if this sketch is in estimation mode.booleanisMemoryUpdatableFormat()Returns true if the backing WritableMemory is in updatable format.booleanisReadOnly()Returns true if this sketch is read only.booleanisSameResource(org.apache.datasketches.memory.Memory that)Returns true if the backing resource of this is identical with the backing resource of that.abstract voidmerge(KllSketch other)Merges another sketch into this one.StringtoString()Returns a summary of the key parameters of the sketch.abstract StringtoString(boolean withLevels, boolean withLevelsAndItems)Returns human readable summary information about this sketch.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
getK, getN, getRankLowerBound, getRankUpperBound, reset
-
-
-
-
Field Detail
-
DEFAULT_K
public static final int DEFAULT_K
The default K- See Also:
- Constant Field Values
-
MAX_K
public static final int MAX_K
The maximum K- See Also:
- Constant Field Values
-
-
Method Detail
-
getKFromEpsilon
public static int getKFromEpsilon(double epsilon, boolean pmf)Gets the approximate k to use given epsilon, the normalized rank error.- Parameters:
epsilon- the normalized rank error between zero and one.pmf- if true, this function returns the k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.- Returns:
- k given epsilon.
-
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableMemFormat)Returns upper bound on the serialized size of a KllSketch given the following parameters.- Parameters:
k- parameter that controls size of the sketch and accuracy of estimatesn- stream lengthsketchType- Only DOUBLES_SKETCH and FLOATS_SKETCH is supported for this operation.updatableMemFormat- true if updatable Memory format, otherwise the standard compact format.- Returns:
- upper bound on the serialized size of a KllSketch.
-
getNormalizedRankError
public static double getNormalizedRankError(int k, boolean pmf)Gets the normalized rank error given k and pmf. Static method version of the getNormalizedRankError(boolean). The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Parameters:
k- the configuration parameterpmf- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNormalizedRankError
public final double getNormalizedRankError(boolean pmf)
Description copied from interface:QuantilesAPIGets the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.- Specified by:
getNormalizedRankErrorin interfaceQuantilesAPI- Parameters:
pmf- if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.- Returns:
- if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
-
getNumRetained
public final int getNumRetained()
Description copied from interface:QuantilesAPIGets the number of quantiles retained by the sketch.- Specified by:
getNumRetainedin interfaceQuantilesAPI- Returns:
- the number of quantiles retained by the sketch
-
getSerializedSizeBytes
public int getSerializedSizeBytes()
Returns the current number of bytes this Sketch would require if serialized in compact form.- Returns:
- the number of bytes this sketch would require if serialized.
-
hasMemory
public boolean hasMemory()
Description copied from interface:QuantilesAPIReturns true if this sketch's data structure is backed by Memory or WritableMemory.- Specified by:
hasMemoryin interfaceQuantilesAPI- Returns:
- true if this sketch's data structure is backed by Memory or WritableMemory.
-
isCompactMemoryFormat
public boolean isCompactMemoryFormat()
Returns true if this sketch is in a Compact Memory Format.- Returns:
- true if this sketch is in a Compact Memory Format.
-
isDirect
public boolean isDirect()
Description copied from interface:QuantilesAPIReturns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).- Specified by:
isDirectin interfaceQuantilesAPI- Returns:
- true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).
-
isEmpty
public final boolean isEmpty()
Description copied from interface:QuantilesAPIReturns true if this sketch is empty.- Specified by:
isEmptyin interfaceQuantilesAPI- Returns:
- true if this sketch is empty.
-
isEstimationMode
public final boolean isEstimationMode()
Description copied from interface:QuantilesAPIReturns true if this sketch is in estimation mode.- Specified by:
isEstimationModein interfaceQuantilesAPI- Returns:
- true if this sketch is in estimation mode.
-
isMemoryUpdatableFormat
public final boolean isMemoryUpdatableFormat()
Returns true if the backing WritableMemory is in updatable format.- Returns:
- true if the backing WritableMemory is in updatable format.
-
isReadOnly
public final boolean isReadOnly()
Description copied from interface:QuantilesAPIReturns true if this sketch is read only.- Specified by:
isReadOnlyin interfaceQuantilesAPI- Returns:
- true if this sketch is read only.
-
isSameResource
public final boolean isSameResource(org.apache.datasketches.memory.Memory that)
Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.- Parameters:
that- A different non-null object- Returns:
- true if the backing resource of this is the same as the backing resource of that.
-
merge
public abstract void merge(KllSketch other)
Merges another sketch into this one. Attempting to merge a sketch of the wrong type will throw an exception.- Parameters:
other- sketch to merge into this one
-
toString
public final String toString()
Description copied from interface:QuantilesAPIReturns a summary of the key parameters of the sketch.- Specified by:
toStringin interfaceQuantilesAPI- Overrides:
toStringin classObject- Returns:
- a summary of the key parameters of the sketch.
-
toString
public abstract String toString(boolean withLevels, boolean withLevelsAndItems)
Returns human readable summary information about this sketch. Used for debugging.- Parameters:
withLevels- if true includes sketch levels array summary informationwithLevelsAndItems- if true include detail of levels array and items array together- Returns:
- human readable summary information about this sketch.
-
-