Package org.apache.datasketches.tdigest
Class TDigestDouble
java.lang.Object
    org.apache.datasketches.tdigest.TDigestDouble
public final class TDigestDouble extends Object
t-Digest for estimating quantiles and ranks. This implementation is based on the paper by Ted Dunning and Otmar Ertl, "Computing Extremely Accurate Quantiles Using t-Digests", and on the reference implementation at https://github.com/tdunning/t-digest. It is similar to MergingDigest in that reference implementation.
-
-
Field Summary
Modifier and Type, Field, Description
static short DEFAULT_K
    The default value of K if one is not specified.
-
Constructor Summary
Constructor, Description
TDigestDouble()
    Constructor with the default K.
TDigestDouble(short k)
    Constructor.
-
Method Summary
Modifier and Type, Method, Description
void compress()
    Process buffered values and merge centroids if needed.
double[] getCDF(double[] splitPoints)
    Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
short getK()
double getMaxValue()
double getMinValue()
double[] getPMF(double[] splitPoints)
    Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
double getQuantile(double rank)
    Compute approximate quantile value corresponding to the given normalized rank.
double getRank(double value)
    Compute approximate normalized rank of the given value.
long getTotalWeight()
static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem)
    Deserialize TDigest from a given memory.
static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
    Deserialize TDigest from a given memory.
boolean isEmpty()
void merge(TDigestDouble other)
    Merge the given TDigest into this one.
byte[] toByteArray()
    Serialize this TDigest to a byte array form.
String toString()
    Human-readable summary of this TDigest as a string.
String toString(boolean printCentroids)
    Human-readable summary of this TDigest as a string.
void update(double value)
    Update this TDigest with the given value.
-
-
-
Field Detail
-
DEFAULT_K
public static final short DEFAULT_K
The default value of K if one is not specified.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getK
public short getK()
- Returns:
- parameter k (compression) that was used to configure this TDigest
-
update
public void update(double value)
Update this TDigest with the given value.
- Parameters:
value - value to update the TDigest with
-
merge
public void merge(TDigestDouble other)
Merge the given TDigest into this one.
- Parameters:
other - TDigest to merge
-
compress
public void compress()
Process buffered values and merge centroids if needed
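To illustrate what merging centroids means, here is a minimal standalone sketch. It does not use this library and is not its internal code: the class name CentroidSketch and its combine method are hypothetical. It assumes only the general t-digest idea that a centroid is a (mean, weight) pair and that two centroids combine into one with the summed weight and the weight-averaged mean.

```java
/** Hypothetical illustration class; not part of org.apache.datasketches. */
final class CentroidSketch {
    final double mean;   // average of the values summarized by this centroid
    final long weight;   // number of values summarized by this centroid

    CentroidSketch(double mean, long weight) {
        this.mean = mean;
        this.weight = weight;
    }

    /** Combine two centroids: weights add, the mean is the weighted average. */
    static CentroidSketch combine(CentroidSketch a, CentroidSketch b) {
        long w = a.weight + b.weight;
        double m = (a.mean * a.weight + b.mean * b.weight) / w;
        return new CentroidSketch(m, w);
    }

    public static void main(String[] args) {
        CentroidSketch c = combine(new CentroidSketch(1.0, 3), new CentroidSketch(5.0, 1));
        System.out.println(c.mean + " " + c.weight); // prints "2.0 4"
    }
}
```

Which centroids get combined, and when, is decided by the digest itself and depends on k; this sketch shows only the arithmetic of a single combination.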
-
isEmpty
public boolean isEmpty()
- Returns:
- true if TDigest has not seen any data
-
getMinValue
public double getMinValue()
- Returns:
- minimum value seen by TDigest
-
getMaxValue
public double getMaxValue()
- Returns:
- maximum value seen by TDigest
-
getTotalWeight
public long getTotalWeight()
- Returns:
- total weight
-
getRank
public double getRank(double value)
Compute approximate normalized rank of the given value.
- Parameters:
value - value to be ranked
- Returns:
- normalized rank (from 0 to 1 inclusive)
-
getQuantile
public double getQuantile(double rank)
Compute approximate quantile value corresponding to the given normalized rank.
- Parameters:
rank - normalized rank (from 0 to 1 inclusive)
- Returns:
- quantile value corresponding to the given rank
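The relationship between getRank and getQuantile can be pictured with an exact computation on a small sorted array. The following standalone sketch does not call TDigestDouble. The class ExactRankQuantile is hypothetical, and the tie-breaking conventions it uses (mid-rank for equal values, nearest-rank for quantiles) are assumptions chosen for illustration; the digest returns approximations and may interpolate differently.

```java
import java.util.Arrays;

/** Hypothetical illustration class; exact computation, not the t-digest algorithm. */
final class ExactRankQuantile {

    /** Normalized rank in [0, 1]: values below v count fully, values equal to v count half. */
    static double rank(double[] sorted, double v) {
        int below = 0;
        int equal = 0;
        for (double x : sorted) {
            if (x < v) {
                below++;
            } else if (x == v) {
                equal++;
            }
        }
        return (below + 0.5 * equal) / sorted.length;
    }

    /** Nearest-rank quantile: the element at index ceil(rank * n) - 1, clamped to the array. */
    static double quantile(double[] sorted, double rank) {
        if (rank < 0.0 || rank > 1.0) {
            throw new IllegalArgumentException("rank must be in [0, 1]");
        }
        int idx = (int) Math.ceil(rank * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(sorted.length - 1, idx))];
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        Arrays.sort(data);
        System.out.println(rank(data, 3.0));     // prints 0.5
        System.out.println(quantile(data, 0.5)); // prints 3.0
    }
}
```

In this sketch rank 0 maps to the minimum and rank 1 to the maximum of the data.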
-
getPMF
public double[] getPMF(double[] splitPoints)
Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
- Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals (bins).
- Returns:
- an array of m+1 doubles, each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals.
- Throws:
SketchesStateException - if sketch is empty.
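The way m split points produce m+1 bins can be shown with an exact count over a small array. This standalone sketch does not call getPMF. The class ExactPmf is hypothetical, and placing a value that equals a split point into the bin on its right is an assumed convention for illustration only.

```java
import java.util.Arrays;

/** Hypothetical illustration class; exact binning, not the t-digest approximation. */
final class ExactPmf {

    /** Returns splitPoints.length + 1 fractions that sum to 1. */
    static double[] pmf(double[] values, double[] splitPoints) {
        double[] bins = new double[splitPoints.length + 1];
        for (double v : values) {
            int bin = 0;
            // Assumed convention: a value equal to a split point goes to the bin on its right.
            while (bin < splitPoints.length && v >= splitPoints[bin]) {
                bin++;
            }
            bins[bin] += 1.0;
        }
        for (int i = 0; i < bins.length; i++) {
            bins[i] /= values.length;
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] splitPoints = {3, 6}; // m = 2 split points, so m + 1 = 3 bins
        System.out.println(Arrays.toString(pmf(values, splitPoints))); // prints [0.25, 0.375, 0.375]
    }
}
```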
-
getCDF
public double[] getCDF(double[] splitPoints)
Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
- Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals.
- Returns:
- an array of m+1 doubles that approximate the CDF of the input stream at the given split points. The value at position j of the returned CDF array is the sum of the values in positions 0 through j of the returned PMF array. This can be viewed as an array of the ranks of the given split points plus one more value that is always 1.
- Throws:
SketchesStateException - if sketch is empty.
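The statement that position j of the CDF array is the sum of positions 0 through j of the PMF array is a running sum. A minimal standalone sketch, with a hypothetical class CdfFromPmf that does not call the library:

```java
import java.util.Arrays;

/** Hypothetical illustration class showing the PMF-to-CDF relationship. */
final class CdfFromPmf {

    /** out[j] = pmf[0] + ... + pmf[j]; the last entry is 1 when the PMF sums to 1. */
    static double[] cdf(double[] pmf) {
        double[] out = new double[pmf.length];
        double running = 0.0;
        for (int j = 0; j < pmf.length; j++) {
            running += pmf[j];
            out[j] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pmf = {0.25, 0.375, 0.375};
        System.out.println(Arrays.toString(cdf(pmf))); // prints [0.25, 0.625, 1.0]
    }
}
```

The first m entries correspond to the ranks of the m split points, and the final entry accounts for everything above the last split point.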
-
toByteArray
public byte[] toByteArray()
Serialize this TDigest to a byte array form.
- Returns:
- byte array
-
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem)
Deserialize TDigest from a given memory. Supports reading format of the reference implementation (autodetected).
- Parameters:
mem - instance of Memory
- Returns:
- an instance of TDigest
-
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
Deserialize TDigest from a given memory. Supports reading compact format with (float, int) centroids as opposed to (double, long) to represent (mean, weight). Supports reading format of the reference implementation (autodetected).
- Parameters:
mem - instance of Memory
isFloat - if true the input represents (float, int) format
- Returns:
- an instance of TDigest
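The difference between (float, int) and (double, long) centroids comes down to size and precision per centroid. The standalone sketch below encodes a single (mean, weight) pair both ways with java.nio.ByteBuffer. It is not the serialized layout used by this library or by the reference implementation: the class CentroidEncodingSketch, its method names, and the byte layout are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration class; NOT the actual TDigest serialization format. */
final class CentroidEncodingSketch {

    /** Compact form: 4-byte float mean followed by 4-byte int weight (8 bytes). */
    static byte[] encodeCompact(double mean, long weight) {
        ByteBuffer buf = ByteBuffer.allocate(Float.BYTES + Integer.BYTES);
        buf.putFloat((float) mean);           // narrows the mean, losing precision
        buf.putInt(Math.toIntExact(weight));  // throws ArithmeticException if weight overflows int
        return buf.array();
    }

    /** Full form: 8-byte double mean followed by 8-byte long weight (16 bytes). */
    static byte[] encodeFull(double mean, long weight) {
        ByteBuffer buf = ByteBuffer.allocate(Double.BYTES + Long.BYTES);
        buf.putDouble(mean);
        buf.putLong(weight);
        return buf.array();
    }

    static double decodeCompactMean(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    static double decodeFullMean(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    public static void main(String[] args) {
        byte[] compact = encodeCompact(0.1, 7);
        byte[] full = encodeFull(0.1, 7);
        System.out.println(compact.length + " vs " + full.length); // prints "8 vs 16"
        System.out.println(decodeCompactMean(compact) == 0.1);     // prints false
        System.out.println(decodeFullMean(full) == 0.1);           // prints true
    }
}
```

The compact form halves the space per centroid in this sketch, at the cost of rounding the mean to float precision and limiting the weight to the int range.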
-
toString
public String toString()
Human-readable summary of this TDigest as a string
-
toString
public String toString(boolean printCentroids)
Human-readable summary of this TDigest as a string.
- Parameters:
printCentroids - if true append the list of centroids with weights
- Returns:
- summary of this TDigest
-
-