Class KllSketch

  • All Implemented Interfaces:
    QuantilesAPI
    Direct Known Subclasses:
    KllDoublesSketch, KllFloatsSketch

    public abstract class KllSketch
    extends Object
    implements QuantilesAPI
    This class is the root of the KLL sketch class hierarchy. It includes the public API that is independent of either sketch type (float or double) and independent of whether the sketch is targeted for use on the heap or Direct (off-heap).

    KLL is an implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained quantile.

    Reference Optimal Quantile Approximation in Streams.

    The default k of 200 yields a "single-sided" epsilon of about 1.33% and a "double-sided" (PMF) epsilon of about 1.65%, with a confidence of 99%.

    Author:
    Lee Rhodes, Kevin Lang, Alexander Saydakov
    See Also:
    KLL Sketch, QuantilesAPI
    • Method Detail

      • getKFromEpsilon

        public static int getKFromEpsilon​(double epsilon,
                                          boolean pmf)
        Gets the approximate k to use given epsilon, the normalized rank error.
        Parameters:
        epsilon - the normalized rank error between zero and one.
        pmf - if true, this function returns the k assuming the input epsilon is the desired "double-sided" epsilon for the getPMF() function. Otherwise, this function returns k assuming the input epsilon is the desired "single-sided" epsilon for all the other queries.
        Returns:
        k given epsilon.
      • getMaxSerializedSizeBytes

        public static int getMaxSerializedSizeBytes​(int k,
                                                    long n,
                                                    KllSketch.SketchType sketchType,
                                                    boolean updatableMemFormat)
        Returns upper bound on the serialized size of a KllSketch given the following parameters.
        Parameters:
        k - parameter that controls size of the sketch and accuracy of estimates
        n - stream length
        sketchType - either DOUBLES_SKETCH or FLOATS_SKETCH
        updatableMemFormat - true if updatable Memory format, otherwise the standard compact format.
        Returns:
        upper bound on the serialized size of a KllSketch.
      • getNormalizedRankError

        public static double getNormalizedRankError​(int k,
                                                    boolean pmf)
        Gets the normalized rank error given k and pmf. Static method version of the getNormalizedRankError(boolean). The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.
        Parameters:
        k - the configuration parameter
        pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
        Returns:
        if pmf is true, the normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
      • getCurrentCompactSerializedSizeBytes

        @Deprecated
        public final int getCurrentCompactSerializedSizeBytes()
        Deprecated.
        version 4.0.0 use getSerializedSizeBytes().
        Returns the current number of bytes this sketch would require to store in the compact Memory Format.
        Returns:
        the current number of bytes this sketch would require to store in the compact Memory Format.
      • getCurrentUpdatableSerializedSizeBytes

        @Deprecated
        public final int getCurrentUpdatableSerializedSizeBytes()
        Deprecated.
        version 4.0.0 use getSerializedSizeBytes().
        Returns the current number of bytes this sketch would require to store in the updatable Memory Format.
        Returns:
        the current number of bytes this sketch would require to store in the updatable Memory Format.
      • getK

        public abstract int getK()
        Description copied from interface: QuantilesAPI
        Gets the user configured parameter k, which controls the accuracy of the sketch and its memory space usage.
        Specified by:
        getK in interface QuantilesAPI
        Returns:
        the user configured parameter k, which controls the accuracy of the sketch and its memory space usage.
      • getN

        public abstract long getN()
        Description copied from interface: QuantilesAPI
        Gets the length of the input stream.
        Specified by:
        getN in interface QuantilesAPI
        Returns:
        the length of the input stream.
      • getNormalizedRankError

        public final double getNormalizedRankError​(boolean pmf)
        Gets the approximate rank error of this sketch normalized as a fraction between zero and one. The epsilon returned is a best fit to 99 percent confidence empirically measured max error in thousands of trials.
        Parameters:
        pmf - if true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
        Returns:
        if pmf is true, returns the "double-sided" normalized rank error for the getPMF() function. Otherwise, it is the "single-sided" normalized rank error for all the other queries.
      • getNumRetained

        public final int getNumRetained()
        Description copied from interface: QuantilesAPI
        Gets the number of quantiles retained by the sketch.
        Specified by:
        getNumRetained in interface QuantilesAPI
        Returns:
        the number of quantiles retained by the sketch
      • getSerializedSizeBytes

        public int getSerializedSizeBytes()
        Returns the current number of bytes this Sketch would require if serialized.
        Returns:
        the number of bytes this sketch would require if serialized.
      • hasMemory

        public boolean hasMemory()
        Description copied from interface: QuantilesAPI
        Returns true if this sketch's data structure is backed by Memory or WritableMemory.
        Specified by:
        hasMemory in interface QuantilesAPI
        Returns:
        true if this sketch's data structure is backed by Memory or WritableMemory.
      • isDirect

        public boolean isDirect()
        Description copied from interface: QuantilesAPI
        Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).
        Specified by:
        isDirect in interface QuantilesAPI
        Returns:
        true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).
      • isEmpty

        public final boolean isEmpty()
        Description copied from interface: QuantilesAPI
        Returns true if this sketch is empty.
        Specified by:
        isEmpty in interface QuantilesAPI
        Returns:
        true if this sketch is empty.
      • isEstimationMode

        public final boolean isEstimationMode()
        Description copied from interface: QuantilesAPI
        Returns true if this sketch is in estimation mode.
        Specified by:
        isEstimationMode in interface QuantilesAPI
        Returns:
        true if this sketch is in estimation mode.
      • isMemoryUpdatableFormat

        public final boolean isMemoryUpdatableFormat()
        Returns true if the backing WritableMemory is in updatable format.
        Returns:
        true if the backing WritableMemory is in updatable format.
      • isReadOnly

        public final boolean isReadOnly()
        Description copied from interface: QuantilesAPI
        Returns true if this sketch is read only.
        Specified by:
        isReadOnly in interface QuantilesAPI
        Returns:
        true if this sketch is read only.
      • isSameResource

        public final boolean isSameResource​(org.apache.datasketches.memory.Memory that)
        Returns true if the backing resource of this is identical with the backing resource of that. The capacities must be the same. If this is a region, the region offset must also be the same.
        Parameters:
        that - A different non-null object
        Returns:
        true if the backing resource of this is the same as the backing resource of that.
      • merge

        public final void merge​(KllSketch other)
        Merges another sketch into this one. Attempting to merge a KllDoublesSketch with a KllFloatsSketch will throw an exception.
        Parameters:
        other - sketch to merge into this one
      • toString

        public final String toString()
        Description copied from interface: QuantilesAPI
        Returns a summary of the key parameters of the sketch.
        Specified by:
        toString in interface QuantilesAPI
        Overrides:
        toString in class Object
        Returns:
        a summary of the key parameters of the sketch.
      • toString

        public String toString​(boolean withLevels,
                               boolean withData)
        Returns a summary of the sketch as a string.
        Parameters:
        withLevels - if true include information about levels
        withData - if true include sketch data
        Returns:
        string representation of sketch summary