Class TDigestDouble


  • public final class TDigestDouble
    extends Object
    t-Digest for estimating quantiles and ranks. This implementation is based on the paper "Computing Extremely Accurate Quantiles Using t-Digests" by Ted Dunning and Otmar Ertl, and on the reference implementation at https://github.com/tdunning/t-digest. It is similar to MergingDigest in that reference implementation.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static short DEFAULT_K
      The default value of K, used when none is specified
    • Constructor Summary

      Constructors 
      Constructor Description
      TDigestDouble()
      Constructor with the default K
      TDigestDouble​(short k)
      Constructor with the given K
    • Method Summary

      Modifier and Type Method Description
      void compress()
      Process buffered values and merge centroids if needed
      double[] getCDF​(double[] splitPoints)
      Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
      short getK()  
      double getMaxValue()  
      double getMinValue()  
      double[] getPMF​(double[] splitPoints)
      Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
      double getQuantile​(double rank)
      Compute approximate quantile value corresponding to the given normalized rank
      double getRank​(double value)
      Compute approximate normalized rank of the given value.
      long getTotalWeight()  
      static TDigestDouble heapify​(org.apache.datasketches.memory.Memory mem)
      Deserialize TDigest from a given memory.
      static TDigestDouble heapify​(org.apache.datasketches.memory.Memory mem, boolean isFloat)
      Deserialize TDigest from a given memory.
      boolean isEmpty()  
      void merge​(TDigestDouble other)
      Merge the given TDigest into this one
      byte[] toByteArray()
      Serialize this TDigest to a byte array form.
      String toString()
      Human-readable summary of this TDigest as a string
      String toString​(boolean printCentroids)
      Human-readable summary of this TDigest as a string, optionally including the list of centroids
      void update​(double value)
      Update this TDigest with the given value
    • Field Detail

      • DEFAULT_K

        public static final short DEFAULT_K
        The default value of K, used when none is specified
        See Also:
        Constant Field Values
    • Constructor Detail

      • TDigestDouble

        public TDigestDouble()
        Constructor with the default K
      • TDigestDouble

        public TDigestDouble​(short k)
        Constructor with the given K
        Parameters:
        k - compression parameter that affects the size of the TDigest and its estimation error
    • Method Detail

      • getK

        public short getK()
        Returns:
        the compression parameter k that was used to configure this TDigest
      • update

        public void update​(double value)
        Update this TDigest with the given value
        Parameters:
        value - the value to update the TDigest with
      • merge

        public void merge​(TDigestDouble other)
        Merge the given TDigest into this one
        Parameters:
        other - the TDigest to merge into this one
      • compress

        public void compress()
        Process buffered values and merge centroids if needed
      • isEmpty

        public boolean isEmpty()
        Returns:
        true if this TDigest has not seen any data
      • getMinValue

        public double getMinValue()
        Returns:
        minimum value seen by TDigest
      • getMaxValue

        public double getMaxValue()
        Returns:
        maximum value seen by TDigest
      • getTotalWeight

        public long getTotalWeight()
        Returns:
        total weight of the values seen by this TDigest
      • getRank

        public double getRank​(double value)
        Compute approximate normalized rank of the given value.
        Parameters:
        value - the value to be ranked
        Returns:
        normalized rank (from 0 to 1 inclusive)
      • getQuantile

        public double getQuantile​(double rank)
        Compute approximate quantile value corresponding to the given normalized rank
        Parameters:
        rank - normalized rank (from 0 to 1 inclusive)
        Returns:
        quantile value corresponding to the given rank
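        To make the terms "normalized rank" and "quantile" concrete, here is a minimal sketch that computes both exactly over a small sorted array. It does not use TDigestDouble, and the class ExactRankDemo and its methods are hypothetical names introduced for illustration. It also assumes an inclusive rank convention (fraction of values less than or equal to the given value). A t-digest interpolates between centroids, so its estimates will generally differ slightly from these exact answers.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the DataSketches library.
final class ExactRankDemo {

  // Normalized rank in [0, 1]: fraction of values <= the given value.
  // The inclusive convention is an assumption made for this sketch.
  static double rank(double[] sorted, double value) {
    int count = 0;
    for (double v : sorted) {
      if (v <= value) { count++; }
    }
    return (double) count / sorted.length;
  }

  // Quantile: the smallest value whose inclusive rank is >= the given rank.
  static double quantile(double[] sorted, double rank) {
    if (rank < 0.0 || rank > 1.0) {
      throw new IllegalArgumentException("rank must be in [0, 1]");
    }
    int index = (int) Math.ceil(rank * sorted.length) - 1;
    return sorted[Math.max(index, 0)];
  }

  public static void main(String[] args) {
    double[] data = {5, 1, 4, 2, 3};
    Arrays.sort(data);
    System.out.println(rank(data, 3.0));     // 3 of 5 values are <= 3.0
    System.out.println(quantile(data, 0.5)); // the median of 1..5
  }
}
```

        In this exact setting, getRank and getQuantile are approximate inverses of each other: the quantile at rank 0.5 is the median, and the rank of the maximum value is 1.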
      • getPMF

        public double[] getPMF​(double[] splitPoints)
        Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
        Parameters:
        splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals (bins).
        Returns:
        an array of m+1 doubles each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals.
        Throws:
        SketchesStateException - if this TDigest is empty.
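        The relationship between m split points and m+1 bins can be sketched with an exact computation over a small array. This is an illustration of the output shape only, not the library's algorithm: the class ExactPmfDemo is a hypothetical name, and the choice to place a value equal to a split point in the lower bin is an assumption made for this sketch.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the DataSketches library.
final class ExactPmfDemo {

  // Returns m+1 masses for m split points. Bin i holds the fraction of
  // values v with splitPoints[i-1] < v <= splitPoints[i] (assumed convention).
  static double[] pmf(double[] values, double[] splitPoints) {
    if (values.length == 0) {
      throw new IllegalStateException("no data");
    }
    double[] masses = new double[splitPoints.length + 1];
    for (double v : values) {
      int bin = 0;
      while (bin < splitPoints.length && v > splitPoints[bin]) {
        bin++;
      }
      masses[bin] += 1.0;
    }
    for (int i = 0; i < masses.length; i++) {
      masses[i] /= values.length;
    }
    return masses;
  }

  public static void main(String[] args) {
    double[] values = {1, 2, 3, 4, 5, 6, 7, 8};
    double[] splitPoints = {2.5, 6.5};
    // Two split points give three bins: {1,2}, {3,4,5,6}, {7,8}.
    System.out.println(Arrays.toString(pmf(values, splitPoints)));
    // prints [0.25, 0.5, 0.25]
  }
}
```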
      • getCDF

        public double[] getCDF​(double[] splitPoints)
        Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
        Parameters:
        splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals.
        Returns:
        an array of m+1 doubles that approximate the CDF of the input stream at the given splitPoints. The value at array position j of the returned CDF array is the sum of the values in positions 0 through j of the returned PMF array. This can be viewed as an array of the ranks of the given split points plus one more value that is always 1.
        Throws:
        SketchesStateException - if this TDigest is empty.
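        The two views of the CDF described above (ranks of the split points plus a final 1, and the running sum of the PMF) can be checked against each other with an exact computation. The sketch below does not call TDigestDouble. ExactCdfDemo is a hypothetical class, and the inclusive rank convention (fraction of values less than or equal to the split point) is an assumption made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration class; not part of the DataSketches library.
final class ExactCdfDemo {

  // View 1: entry j is the rank of splitPoints[j]; the last entry is always 1.
  static double[] cdf(double[] values, double[] splitPoints) {
    double[] result = new double[splitPoints.length + 1];
    for (int j = 0; j < splitPoints.length; j++) {
      int count = 0;
      for (double v : values) {
        if (v <= splitPoints[j]) { count++; }
      }
      result[j] = (double) count / values.length;
    }
    result[splitPoints.length] = 1.0;
    return result;
  }

  // View 2: entry j is the sum of PMF entries 0 through j.
  static double[] runningSum(double[] pmf) {
    double[] result = new double[pmf.length];
    double sum = 0.0;
    for (int j = 0; j < pmf.length; j++) {
      sum += pmf[j];
      result[j] = sum;
    }
    return result;
  }

  public static void main(String[] args) {
    double[] values = {1, 2, 3, 4, 5, 6, 7, 8};
    double[] splitPoints = {2.5, 6.5};
    double[] pmf = {0.25, 0.5, 0.25}; // exact PMF of values for these split points
    System.out.println(Arrays.toString(cdf(values, splitPoints))); // prints [0.25, 0.75, 1.0]
    System.out.println(Arrays.toString(runningSum(pmf)));          // prints [0.25, 0.75, 1.0]
  }
}
```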
      • toByteArray

        public byte[] toByteArray()
        Serialize this TDigest to a byte array form.
        Returns:
        the serialized byte array
      • heapify

        public static TDigestDouble heapify​(org.apache.datasketches.memory.Memory mem)
        Deserialize TDigest from a given memory. Also supports reading the format of the reference implementation (autodetected).
        Parameters:
        mem - instance of Memory
        Returns:
        an instance of TDigest
      • heapify

        public static TDigestDouble heapify​(org.apache.datasketches.memory.Memory mem,
                                            boolean isFloat)
        Deserialize TDigest from a given memory. Supports reading the compact format in which each centroid (mean, weight) is stored as (float, int) rather than (double, long). Also supports reading the format of the reference implementation (autodetected).
        Parameters:
        mem - instance of Memory
        isFloat - if true, the input is in the (float, int) format
        Returns:
        an instance of TDigest
      • toString

        public String toString()
        Human-readable summary of this TDigest as a string
        Overrides:
        toString in class Object
        Returns:
        summary of this TDigest
      • toString

        public String toString​(boolean printCentroids)
        Human-readable summary of this TDigest as a string, optionally including the list of centroids
        Parameters:
        printCentroids - if true, append the list of centroids with their weights
        Returns:
        summary of this TDigest