Class KllSketch

java.lang.Object
org.apache.datasketches.kll.KllSketch
All Implemented Interfaces:
MemorySegmentStatus, QuantilesAPI
Direct Known Subclasses:
KllDoublesSketch, KllFloatsSketch, KllItemsSketch, KllLongsSketch

public abstract class KllSketch extends Object implements QuantilesAPI, MemorySegmentStatus
This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent of either sketch type (e.g., float, double, long or generic item) and independent of whether the sketch is targeted for use on the Java heap or off-heap.

KLL is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.

Reference Optimal Quantile Approximation in Streams.

The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.

Author:
Lee Rhodes, Kevin Lang, Alexander Saydakov
See Also:
  • Field Details

  • Method Details

    • getKFromEpsilon

      public static int getKFromEpsilon(double epsilon, boolean pmf)
      Gets the approximate k to use given epsilon, the normalized rank error.
      Parameters:
      epsilon - the normalized rank error between zero and one.
      pmf - if true, this function returns the k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.
      Returns:
      k given epsilon.
    • getMaxSerializedSizeBytes

      public static int getMaxSerializedSizeBytes(int k, long n, KllSketch.SketchType sketchType, boolean updatableFormat)
      Returns upper bound on the serialized size of a KllSketch given the following parameters.
      Parameters:
      k - parameter that controls size of the sketch and accuracy of estimates
      n - stream length
      sketchType - Only DOUBLES_SKETCH, LONGS_SKETCH and FLOATS_SKETCH are supported for this operation.
      updatableFormat - true if updatable MemorySegment format, otherwise the standard compact format.
      Returns:
      upper bound on the serialized size of a KllSketch.
    • getNormalizedRankError

      public static double getNormalizedRankError(int k, boolean pmf)
      Gets the normalized rank error given k and pmf. Static method version of the getNormalizedRankError(boolean). The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.
      Parameters:
      k - the configuration parameter
      pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
      Returns:
      if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
    • getNormalizedRankError

      public final double getNormalizedRankError(boolean pmf)
      Description copied from interface: QuantilesAPI
      Gets the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.
      Specified by:
      getNormalizedRankError in interface QuantilesAPI
      Parameters:
      pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
      Returns:
      if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
    • getNumRetained

      public final int getNumRetained()
      Description copied from interface: QuantilesAPI
      Gets the number of quantiles retained by the sketch.
      Specified by:
      getNumRetained in interface QuantilesAPI
      Returns:
      the number of quantiles retained by the sketch
    • getSerializedSizeBytes

      public int getSerializedSizeBytes()
      Returns the current number of bytes this Sketch would require if serialized in compact form.
      Returns:
      the number of bytes this sketch would require if serialized.
    • hasMemorySegment

      public abstract boolean hasMemorySegment()
      Description copied from interface: MemorySegmentStatus
      Returns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.
      Specified by:
      hasMemorySegment in interface MemorySegmentStatus
      Returns:
      true if this object's internal data is backed by a MemorySegment.
    • isCompactMemorySegmentFormat

      public boolean isCompactMemorySegmentFormat()
      Returns true if this sketch is in a Compact MemorySegment Format.
      Returns:
      true if this sketch is in a Compact MemorySegment Format.
    • isOffHeap

      public abstract boolean isOffHeap()
      Description copied from interface: MemorySegmentStatus
      Returns true if this object's internal data is backed by an off-heap (direct or native)) MemorySegment.
      Specified by:
      isOffHeap in interface MemorySegmentStatus
      Returns:
      true if this object's internal data is backed by an off-heap (direct or native)) MemorySegment.
    • isEmpty

      public final boolean isEmpty()
      Description copied from interface: QuantilesAPI
      Returns true if this sketch is empty.
      Specified by:
      isEmpty in interface QuantilesAPI
      Returns:
      true if this sketch is empty.
    • isEstimationMode

      public final boolean isEstimationMode()
      Description copied from interface: QuantilesAPI
      Returns true if this sketch is in estimation mode.
      Specified by:
      isEstimationMode in interface QuantilesAPI
      Returns:
      true if this sketch is in estimation mode.
    • isMemorySegmentUpdatableFormat

      public final boolean isMemorySegmentUpdatableFormat()
      Returns true if the backing MemorySegment is in updatable format.
      Returns:
      true if the backing MemorySegment is in updatable format.
    • isReadOnly

      public final boolean isReadOnly()
      Description copied from interface: QuantilesAPI
      Returns true if this sketch is read only.
      Specified by:
      isReadOnly in interface QuantilesAPI
      Returns:
      true if this sketch is read only.
    • isSameResource

      public abstract boolean isSameResource(MemorySegment that)
      Description copied from interface: MemorySegmentStatus
      Returns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap. This applies to both on-heap and off-heap MemorySegments.

      Note: If both segments are on-heap and not read-only, it can be determined if they were derived from the same backing memory (array). However, this is not always possible off-heap. Because of this asymmetry, this definition of "isSameResource" is confined to the existence of an overlap.

      Specified by:
      isSameResource in interface MemorySegmentStatus
      Parameters:
      that - The given MemorySegment.
      Returns:
      true if an internally referenced MemorySegment has the same backing resource as that.
    • merge

      public abstract void merge(KllSketch other)
      Merges another sketch into this one. Attempting to merge a sketch of the wrong type will throw an exception.
      Parameters:
      other - sketch to merge into this one
    • toString

      public final String toString()
      Description copied from interface: QuantilesAPI
      Returns a summary of the key parameters of the sketch.
      Specified by:
      toString in interface QuantilesAPI
      Overrides:
      toString in class Object
      Returns:
      a summary of the key parameters of the sketch.
    • toString

      public abstract String toString(boolean withLevels, boolean withLevelsAndItems)
      Returns human readable summary information about this sketch. Used for debugging.
      Parameters:
      withLevels - if true includes sketch levels array summary information
      withLevelsAndItems - if true include detail of levels array and items array together
      Returns:
      human readable summary information about this sketch.