Class TDigestDouble

java.lang.Object
org.apache.datasketches.tdigest.TDigestDouble

public final class TDigestDouble extends Object
t-Digest for estimating quantiles and ranks. This implementation is based on the paper by Ted Dunning and Otmar Ertl, "Computing Extremely Accurate Quantiles Using t-Digests", and on the reference implementation at https://github.com/tdunning/t-digest. It is similar to MergingDigest in that reference implementation.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final short
    DEFAULT_K
    the default value of K if one is not specified
  • Constructor Summary

    Constructors
    Constructor
    Description
    TDigestDouble()
    Constructor with the default K
    TDigestDouble(short k)
    Constructor with the given K
  • Method Summary

    Modifier and Type
    Method
    Description
    double[]
    getCDF(double[] splitPoints)
    Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
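    short
    getK()
    Returns the parameter k (compression) that was used to configure this TDigest
    double
    getMaxValue()
    Returns the maximum value seen by this TDigest
    double
    getMinValue()
    Returns the minimum value seen by this TDigest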
    double[]
    getPMF(double[] splitPoints)
    Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
    double
    getQuantile(double rank)
    Compute approximate quantile value corresponding to the given normalized rank
    double
    getRank(double value)
    Compute approximate normalized rank of the given value.
    long
    getTotalWeight()
    Returns the total weight
    static TDigestDouble
    heapify(org.apache.datasketches.memory.Memory mem)
    Deserialize TDigest from a given memory.
    static TDigestDouble
    heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
    Deserialize TDigest from a given memory.
    boolean
    isEmpty()
    Returns true if this TDigest has not seen any data
    void
    merge(TDigestDouble other)
    Merge the given TDigest into this one
    byte[]
    toByteArray()
    Serialize this TDigest to a byte array form.
    String
    toString()
    Human-readable summary of this TDigest as a string
    String
    toString(boolean printCentroids)
    Human-readable summary of this TDigest as a string
    void
    update(double value)
    Update this TDigest with the given value

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • DEFAULT_K

      public static final short DEFAULT_K
      the default value of K if one is not specified
  • Constructor Details

    • TDigestDouble

      public TDigestDouble()
      Constructor with the default K
    • TDigestDouble

      public TDigestDouble(short k)
      Constructor with the given K
      Parameters:
      k - affects the size of TDigest and its estimation error
  • Method Details

    • getK

      public short getK()
      Returns:
      parameter k (compression) that was used to configure this TDigest
    • update

      public void update(double value)
      Update this TDigest with the given value
      Parameters:
      value - to update the TDigest with
    • merge

      public void merge(TDigestDouble other)
      Merge the given TDigest into this one
      Parameters:
      other - TDigest to merge
    • isEmpty

      public boolean isEmpty()
      Returns:
      true if TDigest has not seen any data
    • getMinValue

      public double getMinValue()
      Returns:
      minimum value seen by TDigest
    • getMaxValue

      public double getMaxValue()
      Returns:
      maximum value seen by TDigest
    • getTotalWeight

      public long getTotalWeight()
      Returns:
      total weight
    • getRank

      public double getRank(double value)
      Compute approximate normalized rank of the given value.
      Parameters:
      value - to be ranked
      Returns:
      normalized rank (from 0 to 1 inclusive)
    • getQuantile

      public double getQuantile(double rank)
      Compute approximate quantile value corresponding to the given normalized rank
      Parameters:
      rank - normalized rank (from 0 to 1 inclusive)
      Returns:
      quantile value corresponding to the given rank
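
      To make the meaning of "normalized rank" concrete, the following minimal sketch computes exact ranks and quantiles over a small sorted array in plain Java. It is a reference illustration only, not the t-digest algorithm: the class and method names (ExactRankQuantile, rank, quantile) are hypothetical, and the inclusive convention (fraction of values less than or equal to the query) is an assumption made for simplicity. TDigestDouble returns approximations and may treat ties and values between data points differently.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the library: exact rank and quantile
// over a fully stored, sorted array, for comparison with the approximations.
public class ExactRankQuantile {

    /** Normalized rank in [0, 1]: fraction of values <= v (assumed convention). */
    static double rank(double[] sorted, double v) {
        int count = 0;
        for (double x : sorted) {
            if (x <= v) {
                count++;
            }
        }
        return (double) count / sorted.length;
    }

    /** Smallest stored value whose inclusive rank is >= the given rank. */
    static double quantile(double[] sorted, double rank) {
        int idx = (int) Math.ceil(rank * sorted.length) - 1;
        idx = Math.max(0, Math.min(sorted.length - 1, idx)); // clamp to valid range
        return sorted[idx];
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        Arrays.sort(data);
        System.out.println(rank(data, 3.0));     // 0.6
        System.out.println(quantile(data, 0.5)); // 3.0
    }
}
```

      In this exact setting, rank and quantile are approximate inverses of each other, which is also the relationship between getRank and getQuantile.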
    • getPMF

      public double[] getPMF(double[] splitPoints)
      Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
      Parameters:
      splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals (bins).
      Returns:
      an array of m+1 doubles each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals.
      Throws:
      SketchesStateException - if the sketch is empty.
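
      The way m split points produce m+1 bins can be illustrated with an exact computation over a small data set. This is a minimal sketch in plain Java, not the library's implementation: the class and method names (ExactPmf, pmf) are hypothetical, and the choice that each bin includes its lower bound and excludes its upper bound is an assumption. getPMF returns approximations of these fractions.

```java
// Hypothetical helper, not part of the library: exact PMF over stored values.
public class ExactPmf {

    /**
     * Returns splitPoints.length + 1 fractions that sum to 1.
     * Bin 0 holds values below the first split point, bin i holds values in
     * [splitPoints[i-1], splitPoints[i]), and the last bin holds the rest
     * (bin boundaries are an assumed convention).
     */
    static double[] pmf(double[] values, double[] splitPoints) {
        long[] counts = new long[splitPoints.length + 1];
        for (double v : values) {
            int bin = 0;
            // The bin index is the number of split points that are <= v.
            while (bin < splitPoints.length && v >= splitPoints[bin]) {
                bin++;
            }
            counts[bin]++;
        }
        double[] mass = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            mass[i] = (double) counts[i] / values.length;
        }
        return mass;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] splitPoints = {3, 7}; // m = 2 split points, so 3 bins
        for (double m : pmf(values, splitPoints)) {
            System.out.println(m);
        }
    }
}
```

      With the ten values and two split points above, the bins contain 2, 4, and 4 values respectively.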
    • getCDF

      public double[] getCDF(double[] splitPoints)
      Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
      Parameters:
      splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals.
      Returns:
      an array of m+1 doubles that approximate the CDF of the input stream at the given splitPoints. The value at array position j of the returned CDF array is the sum of the returned values in positions 0 through j of the returned PMF array. This can be viewed as an array of ranks of the given split points plus one more value that is always 1.
      Throws:
      SketchesStateException - if the sketch is empty.
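
      The stated relationship between the CDF and PMF arrays is a running sum. The following minimal sketch shows it in plain Java; the class and method names (CdfFromPmf, cdf) are hypothetical and the input array is made-up example data, not output of the library.

```java
// Hypothetical helper, not part of the library: cumulative sum of a PMF array.
public class CdfFromPmf {

    /** Position j of the result is the sum of pmf[0] through pmf[j]. */
    static double[] cdf(double[] pmf) {
        double[] out = new double[pmf.length];
        double sum = 0.0;
        for (int i = 0; i < pmf.length; i++) {
            sum += pmf[i];
            out[i] = sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pmf = {0.25, 0.5, 0.25}; // example masses for 2 split points
        for (double c : cdf(pmf)) {
            System.out.println(c); // 0.25, 0.75, 1.0
        }
    }
}
```

      The first m entries correspond to the ranks of the m split points, and the final entry is 1 because the PMF entries sum to 1.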
    • toByteArray

      public byte[] toByteArray()
      Serialize this TDigest to a byte array form.
      Returns:
      byte array
    • heapify

      public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem)
      Deserialize TDigest from a given memory. Supports reading the format of the reference implementation (autodetected).
      Parameters:
      mem - instance of Memory
      Returns:
      an instance of TDigest
    • heapify

      public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
      Deserialize TDigest from a given memory. Supports reading the compact format, which represents (mean, weight) centroids as (float, int) instead of (double, long). Supports reading the format of the reference implementation (autodetected).
      Parameters:
      mem - instance of Memory
      isFloat - if true the input represents (float, int) format
      Returns:
      an instance of TDigest
    • toString

      public String toString()
      Human-readable summary of this TDigest as a string
      Overrides:
      toString in class Object
      Returns:
      summary of this TDigest
    • toString

      public String toString(boolean printCentroids)
      Human-readable summary of this TDigest as a string, optionally followed by the list of centroids
      Parameters:
      printCentroids - if true append the list of centroids with weights
      Returns:
      summary of this TDigest