Package org.apache.datasketches.tdigest
Class TDigestDouble
java.lang.Object
org.apache.datasketches.tdigest.TDigestDouble
A t-Digest for estimating quantiles and ranks.
This implementation is based on the following paper:
Ted Dunning, Otmar Ertl. Extremely Accurate Quantiles Using t-Digests
and on the following reference implementation:
https://github.com/tdunning/t-digest
It is similar to MergingDigest in that reference implementation.
Field Summary
Modifier and Type | Field | Description
static final short | DEFAULT_K | The default value of K if one is not specified.
Constructor Summary
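Constructor | Description
TDigestDouble() | Constructor with the default K.
TDigestDouble(short k) | Constructor with the given K.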
Method Summary
Modifier and Type | Method | Description
double[] | getCDF(double[] splitPoints) | Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream, given a set of split points. The CDF is the cumulative analog of the PMF.
short | getK() | Returns the parameter k (compression) that was used to configure this TDigest.
double | getMaxValue() | Returns the maximum value seen by this TDigest.
double | getMinValue() | Returns the minimum value seen by this TDigest.
double[] | getPMF(double[] splitPoints) | Returns an approximation to the Probability Mass Function (PMF) of the input stream, given a set of split points.
double | getQuantile(double rank) | Compute the approximate quantile value corresponding to the given normalized rank.
double | getRank(double value) | Compute the approximate normalized rank of the given value.
long | getTotalWeight() | Returns the total weight of the values seen by this TDigest.
static TDigestDouble | heapify(org.apache.datasketches.memory.Memory mem) | Deserialize a TDigest from the given memory.
static TDigestDouble | heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat) | Deserialize a TDigest from the given memory, optionally in the compact (float, int) format.
boolean | isEmpty() | Returns true if this TDigest has not seen any data.
void | merge(TDigestDouble other) | Merge the given TDigest into this one.
byte[] | toByteArray() | Serialize this TDigest to a byte array form.
String | toString() | Human-readable summary of this TDigest as a string.
String | toString(boolean printCentroids) | Human-readable summary of this TDigest as a string, optionally with its centroids.
void | update(double value) | Update this TDigest with the given value.
Field Details
DEFAULT_K
public static final short DEFAULT_K
The default value of K if one is not specified.
Constructor Details
TDigestDouble
public TDigestDouble()
Constructor with the default K.
TDigestDouble
public TDigestDouble(short k)
Constructor with the given K.
Parameters:
k - affects the size of the TDigest and its estimation error (a larger k retains more centroids, giving a larger TDigest with lower error)
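The sketch below gives some intuition for how the compression parameter trades size for accuracy. It evaluates the k_1 scale function from the Dunning and Ertl paper, k_1(q) = (delta / 2 pi) * asin(2q - 1), with delta playing the role of k. In that paper each centroid may span at most one unit of k-space. A larger delta therefore permits more and smaller centroids. The steep slope near q = 0 and q = 1 keeps tail centroids small. It is an assumption here that TDigestDouble uses exactly this scale function, and the class and method names are hypothetical.

```java
/**
 * Hypothetical helper illustrating the k_1 scale function from the t-digest paper.
 * Not part of the DataSketches API.
 */
final class ScaleFunctionK1 {
    private ScaleFunctionK1() {}

    /** k_1(q) = delta / (2 * pi) * asin(2q - 1), defined for q in [0, 1]. */
    static double k1(double q, double delta) {
        if (!(q >= 0.0 && q <= 1.0)) { // also rejects NaN
            throw new IllegalArgumentException("q must be in [0, 1]: " + q);
        }
        return delta / (2.0 * Math.PI) * Math.asin(2.0 * q - 1.0);
    }

    /** Width in k-space of a centroid covering quantiles [qLeft, qRight]; the paper bounds this by 1. */
    static double kSize(double qLeft, double qRight, double delta) {
        return k1(qRight, delta) - k1(qLeft, delta);
    }
}
```

Consider delta = 100. A centroid holding the middle 1% of the data (q from 0.495 to 0.505) has a k-size of about 0.32. A centroid holding the first 1% (q from 0 to 0.01) has a k-size of about 3.2, which exceeds the bound. Tail centroids must therefore hold less weight than central ones.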
Method Details
getK
public short getK()
Returns:
the parameter k (compression) that was used to configure this TDigest
update
public void update(double value)
Update this TDigest with the given value.
Parameters:
value - the value to update the TDigest with
merge
public void merge(TDigestDouble other)
Merge the given TDigest into this one.
Parameters:
other - the TDigest to merge
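Conceptually, a merging t-digest combines two digests in three steps. It pools their centroids and sorts them by mean. It then greedily absorbs each neighbor into the current centroid as long as the result spans at most one unit of k-space. The sketch below illustrates this on plain lists, using the k_1 scale function from the paper. It is a simplified toy under that assumption, not TDigestDouble's merge code, and the class, record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of merging the centroids of two t-digests. */
final class CentroidMergeSketch {
    /** Hypothetical centroid: a mean together with the weight (count) it represents. */
    static final class Centroid {
        final double mean;
        final double weight;

        Centroid(double mean, double weight) {
            this.mean = mean;
            this.weight = weight;
        }
    }

    private CentroidMergeSketch() {}

    /** k_1 scale function from the t-digest paper (an assumption of this sketch). */
    static double k1(double q, double delta) {
        return delta / (2.0 * Math.PI) * Math.asin(2.0 * q - 1.0);
    }

    /** Pools both centroid lists, sorts by mean, and greedily merges neighbours within one k-unit. */
    static List<Centroid> merge(List<Centroid> a, List<Centroid> b, double delta) {
        List<Centroid> pooled = new ArrayList<>(a);
        pooled.addAll(b);
        pooled.sort(Comparator.comparingDouble((Centroid c) -> c.mean));
        List<Centroid> out = new ArrayList<>();
        if (pooled.isEmpty()) {
            return out;
        }
        double total = 0.0;
        for (Centroid c : pooled) {
            total += c.weight;
        }
        double emitted = 0.0; // weight of the centroids already written to out
        Centroid current = pooled.get(0);
        for (int i = 1; i < pooled.size(); i++) {
            Centroid next = pooled.get(i);
            double qLeft = emitted / total;
            // Clamp to 1 so rounding never pushes asin outside its domain.
            double qRight = Math.min(1.0, (emitted + current.weight + next.weight) / total);
            if (k1(qRight, delta) - k1(qLeft, delta) <= 1.0) {
                // Absorb next into current: summed weight, weighted mean.
                double w = current.weight + next.weight;
                double m = (current.mean * current.weight + next.mean * next.weight) / w;
                current = new Centroid(m, w);
            } else {
                out.add(current);
                emitted += current.weight;
                current = next;
            }
        }
        out.add(current);
        return out;
    }
}
```

Merging preserves the total weight and the weighted sum of means. A smaller delta produces fewer, heavier centroids. With a very large delta no neighbors qualify for merging, so every input centroid survives.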
isEmpty
public boolean isEmpty()
Returns:
true if this TDigest has not seen any data
getMinValue
public double getMinValue()
Returns:
the minimum value seen by this TDigest
getMaxValue
public double getMaxValue()
Returns:
the maximum value seen by this TDigest
getTotalWeight
public long getTotalWeight()
Returns:
the total weight of the values seen by this TDigest
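The toy class below pictures the bookkeeping behind update, isEmpty, getMinValue, getMaxValue and getTotalWeight. It is a hypothetical sketch that tracks only the exact minimum, maximum and total weight, and it counts each update as weight 1. Skipping NaN inputs and throwing IllegalStateException on an empty instance are choices made for this sketch, not documented behavior of TDigestDouble. The class stores no centroids, so it cannot answer rank or quantile queries.

```java
/** Hypothetical toy tracking the exact summary statistics a t-digest keeps alongside its centroids. */
final class DigestStatsSketch {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private long totalWeight = 0;

    /** Records one value with weight 1; NaN is ignored by this sketch. */
    void update(double value) {
        if (Double.isNaN(value)) {
            return;
        }
        min = Math.min(min, value);
        max = Math.max(max, value);
        totalWeight++;
    }

    boolean isEmpty() {
        return totalWeight == 0;
    }

    double getMinValue() {
        if (isEmpty()) {
            throw new IllegalStateException("no data seen yet");
        }
        return min;
    }

    double getMaxValue() {
        if (isEmpty()) {
            throw new IllegalStateException("no data seen yet");
        }
        return max;
    }

    long getTotalWeight() {
        return totalWeight;
    }
}
```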
getRank
public double getRank(double value)
Compute the approximate normalized rank of the given value.
Parameters:
value - the value to be ranked
Returns:
normalized rank (from 0 to 1 inclusive)
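The sketch below shows one way to read a normalized rank from centroids, using a simplified scheme. Each centroid's weight is assumed to be centered on its mean. Cumulative weight is linearly interpolated between three kinds of points: (min, 0); (mean_i, weight before centroid i plus half its weight) for each centroid; and (max, total weight). The result is then divided by the total weight. TDigestDouble's actual interpolation, for example around singleton centroids, may differ. The class and method names are hypothetical.

```java
/**
 * Hypothetical illustration of estimating a normalized rank from centroids by
 * piecewise-linear interpolation of cumulative weight. Not the TDigestDouble algorithm.
 */
final class RankSketch {
    private RankSketch() {}

    /**
     * means must be sorted ascending and lie within [min, max]; weights must be positive.
     * Returns an approximate normalized rank in [0, 1].
     */
    static double rank(double[] means, double[] weights, double min, double max, double value) {
        if (means.length == 0 || means.length != weights.length) {
            throw new IllegalArgumentException("means and weights must be non-empty and equal length");
        }
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("value must not be NaN");
        }
        if (value < min) {
            return 0.0;
        }
        if (value >= max) {
            return 1.0;
        }
        int n = means.length;
        // Knots: (min, 0), (mean_i, weight before i + half of weight_i), (max, total weight).
        double[] xs = new double[n + 2];
        double[] ys = new double[n + 2];
        xs[0] = min;
        double before = 0.0;
        for (int i = 0; i < n; i++) {
            xs[i + 1] = means[i];
            ys[i + 1] = before + weights[i] / 2.0;
            before += weights[i];
        }
        xs[n + 1] = max;
        ys[n + 1] = before;
        // Find the segment [xs[i-1], xs[i]) that contains value and interpolate linearly.
        for (int i = 1; i < xs.length; i++) {
            if (value < xs[i]) {
                double t = (value - xs[i - 1]) / (xs[i] - xs[i - 1]);
                return (ys[i - 1] + t * (ys[i] - ys[i - 1])) / before;
            }
        }
        return 1.0;
    }
}
```

Take centroids with means {1, 3, 5}, each of weight 2, with min 0 and max 6. The value 3 then gets rank 0.5 and the value 2 gets rank 1/3.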
getQuantile
public double getQuantile(double rank)
Compute the approximate quantile value corresponding to the given normalized rank.
Parameters:
rank - normalized rank (from 0 to 1 inclusive)
Returns:
quantile value corresponding to the given rank
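A quantile query runs the rank computation in reverse: given a rank, find the value whose interpolated cumulative weight equals rank times the total weight. The sketch below inverts a simplified piecewise-linear model with knots at (min, 0), at (mean_i, weight before centroid i plus half its weight), and at (max, total weight). This model is an assumption for illustration, not TDigestDouble's exact interpolation. The class name is hypothetical, and rejecting ranks outside [0, 1] with IllegalArgumentException is a choice of this sketch.

```java
/**
 * Hypothetical illustration of estimating a quantile from centroids by inverting a
 * piecewise-linear model of cumulative weight. Not the TDigestDouble algorithm.
 */
final class QuantileSketch {
    private QuantileSketch() {}

    /** means sorted ascending within [min, max], weights positive, rank in [0, 1]. */
    static double quantile(double[] means, double[] weights, double min, double max, double rank) {
        if (means.length == 0 || means.length != weights.length) {
            throw new IllegalArgumentException("means and weights must be non-empty and equal length");
        }
        if (!(rank >= 0.0 && rank <= 1.0)) { // also rejects NaN
            throw new IllegalArgumentException("rank must be in [0, 1]: " + rank);
        }
        int n = means.length;
        // Same knots as the forward model: x = value, y = cumulative weight.
        double[] xs = new double[n + 2];
        double[] ys = new double[n + 2];
        xs[0] = min;
        double before = 0.0;
        for (int i = 0; i < n; i++) {
            xs[i + 1] = means[i];
            ys[i + 1] = before + weights[i] / 2.0;
            before += weights[i];
        }
        xs[n + 1] = max;
        ys[n + 1] = before;
        double target = rank * before;
        // Find the first knot whose cumulative weight reaches the target, then interpolate.
        for (int i = 1; i < xs.length; i++) {
            if (target <= ys[i]) {
                double rise = ys[i] - ys[i - 1];
                double t = rise > 0.0 ? (target - ys[i - 1]) / rise : 0.0;
                return xs[i - 1] + t * (xs[i] - xs[i - 1]);
            }
        }
        return max;
    }
}
```

With means {1, 3, 5}, weights {2, 2, 2}, min 0 and max 6, rank 0.5 maps to 3. Rank 0 maps to the minimum and rank 1 to the maximum.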
getPMF
public double[] getPMF(double[] splitPoints)
Returns an approximation to the Probability Mass Function (PMF) of the input stream, given a set of split points.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive, disjoint intervals (bins)
Returns:
an array of m+1 doubles, each of which approximates the fraction of the input stream values (the mass) that fall into one of those intervals
Throws:
SketchesStateException - if the sketch is empty
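The CDF entry at position j is the sum of PMF entries 0 through j. A PMF can therefore be recovered from a CDF by taking successive differences. The sketch below shows that relationship, plus a check that split points are unique and monotonically increasing as the parameter description requires. It is a hypothetical helper, not part of the TDigestDouble API. Rejecting NaN split points is an assumption of this sketch.

```java
/** Hypothetical helpers relating split points, PMF and CDF. Not part of the DataSketches API. */
final class PmfSketch {
    private PmfSketch() {}

    /** Throws if the split points are not unique and strictly increasing (NaN is rejected too). */
    static void checkSplitPoints(double[] splitPoints) {
        for (int i = 0; i < splitPoints.length; i++) {
            if (Double.isNaN(splitPoints[i])) {
                throw new IllegalArgumentException("split point " + i + " is NaN");
            }
            if (i > 0 && splitPoints[i] <= splitPoints[i - 1]) {
                throw new IllegalArgumentException("split points must be unique and increasing at index " + i);
            }
        }
    }

    /** pmf[0] = cdf[0] and pmf[j] = cdf[j] - cdf[j - 1] for j > 0. */
    static double[] pmfFromCdf(double[] cdf) {
        double[] pmf = new double[cdf.length];
        double previous = 0.0;
        for (int j = 0; j < cdf.length; j++) {
            pmf[j] = cdf[j] - previous;
            previous = cdf[j];
        }
        return pmf;
    }
}
```

For example, the CDF {0.25, 0.5, 1.0} yields the PMF {0.25, 0.25, 0.5}, whose entries sum to 1.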
getCDF
public double[] getCDF(double[] splitPoints)
Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream, given a set of split points. The CDF is the cumulative analog of the PMF.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive, disjoint intervals
Returns:
an array of m+1 doubles that together approximate the CDF of the input stream at the given split points. The value at position j of the returned CDF array is the sum of the values at positions 0 through j of the corresponding PMF array. The result can be viewed as the array of ranks of the given split points, followed by one more value that is always 1.
Throws:
SketchesStateException - if the sketch is empty
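The description notes that the CDF can be viewed as the ranks of the split points followed by a final 1. The sketch below builds it that way from any rank function passed in as a DoubleUnaryOperator, standing in for something like getRank. This is a hypothetical helper. Whether a split point counts as inclusive or exclusive is left to the supplied rank function and is not specified here.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper: assemble a CDF array from a rank function. Not part of the DataSketches API. */
final class CdfSketch {
    private CdfSketch() {}

    /** cdf[j] = rank(splitPoints[j]) for j < m, and cdf[m] = 1. */
    static double[] cdfFromRanks(double[] splitPoints, DoubleUnaryOperator rank) {
        double[] cdf = new double[splitPoints.length + 1];
        for (int j = 0; j < splitPoints.length; j++) {
            cdf[j] = rank.applyAsDouble(splitPoints[j]);
        }
        cdf[splitPoints.length] = 1.0; // all mass lies at or below +infinity
        return cdf;
    }
}
```

With a uniform distribution on [0, 10] (rank x maps to x / 10, clamped to [0, 1]), the split points {2.5, 5.0} give the CDF {0.25, 0.5, 1.0}.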
toByteArray
public byte[] toByteArray()
Serialize this TDigest to a byte array form.
Returns:
the serialized byte array
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem)
Deserialize a TDigest from the given memory. The format of the reference implementation is also supported and is autodetected.
Parameters:
mem - an instance of Memory
Returns:
an instance of TDigest
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
Deserialize a TDigest from the given memory. Supports reading the compact format, which uses (float, int) centroids rather than (double, long) to represent (mean, weight). The format of the reference implementation is also supported and is autodetected.
Parameters:
mem - an instance of Memory
isFloat - if true, the input uses the (float, int) format
Returns:
an instance of TDigest
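The isFloat flag distinguishes two centroid encodings. The (double, long) form takes 16 bytes per centroid. The compact (float, int) form takes 8 bytes per centroid, at the cost of float precision for the mean and int range for the weight. The sketch below writes centroids both ways with java.nio.ByteBuffer to show this trade-off. Its layout (no header, little-endian, mean followed by weight) is purely an assumption for illustration and is not the actual TDigestDouble or reference serialization format. To read real data, use heapify.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of (double, long) versus (float, int) centroid encodings. */
final class CentroidEncodingSketch {
    private CentroidEncodingSketch() {}

    private static void checkLengths(double[] means, long[] weights) {
        if (means.length != weights.length) {
            throw new IllegalArgumentException("means and weights must have equal length");
        }
    }

    /** Encodes each centroid as (double mean, long weight): 16 bytes per centroid. */
    static byte[] encodeWide(double[] means, long[] weights) {
        checkLengths(means, weights);
        ByteBuffer buf = ByteBuffer.allocate(means.length * (Double.BYTES + Long.BYTES))
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < means.length; i++) {
            buf.putDouble(means[i]);
            buf.putLong(weights[i]);
        }
        return buf.array();
    }

    /** Encodes each centroid as (float mean, int weight): 8 bytes per centroid. */
    static byte[] encodeCompact(double[] means, long[] weights) {
        checkLengths(means, weights);
        ByteBuffer buf = ByteBuffer.allocate(means.length * (Float.BYTES + Integer.BYTES))
                .order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < means.length; i++) {
            buf.putFloat((float) means[i]);              // may lose precision
            buf.putInt(Math.toIntExact(weights[i]));     // throws if the weight exceeds int range
        }
        return buf.array();
    }

    /** Reads back the means; isFloat selects the layout, mirroring the role of the heapify flag. */
    static double[] decodeMeans(byte[] bytes, boolean isFloat) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int stride = isFloat ? Float.BYTES + Integer.BYTES : Double.BYTES + Long.BYTES;
        double[] means = new double[bytes.length / stride];
        for (int i = 0; i < means.length; i++) {
            if (isFloat) {
                means[i] = buf.getFloat();
                buf.getInt(); // skip the weight
            } else {
                means[i] = buf.getDouble();
                buf.getLong(); // skip the weight
            }
        }
        return means;
    }
}
```

For two centroids, the wide form takes 32 bytes and the compact form 16 bytes. A mean of 0.1 survives the wide form exactly but comes back from the compact form only approximately.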
toString
public String toString()
Human-readable summary of this TDigest as a string.
toString
public String toString(boolean printCentroids)
Human-readable summary of this TDigest as a string.
Parameters:
printCentroids - if true, append the list of centroids with their weights
Returns:
summary of this TDigest