Package org.apache.datasketches.tdigest
Class TDigestDouble
java.lang.Object
org.apache.datasketches.tdigest.TDigestDouble
t-Digest for estimating quantiles and ranks.
This implementation is based on the following paper:
Ted Dunning, Otmar Ertl. Extremely Accurate Quantiles Using t-Digests
and the following implementation:
https://github.com/tdunning/t-digest
This implementation is similar to MergingDigest in the reference implementation above.
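The class description says this implementation is similar to MergingDigest. As a rough illustration of that design (a sketch, not the TDigestDouble source), the Java below groups sorted values into centroids in a single greedy pass, using the k1 scale function k(q) = delta / (2 pi) * asin(2q - 1) from the t-digest paper. Whether TDigestDouble uses k1 or a different scale function is an assumption left open here, and MergingPassSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative MergingDigest-style merge pass; not the TDigestDouble source. */
final class MergingPassSketch {

  /** A centroid summarizes a run of adjacent sorted values by their mean and count. */
  static final class Centroid {
    final double mean;
    final long weight;

    Centroid(double mean, long weight) {
      this.mean = mean;
      this.weight = weight;
    }
  }

  /** Scale function k1 from the t-digest paper: k(q) = delta / (2 pi) * asin(2q - 1). */
  static double k1(double q, double delta) {
    return delta / (2 * Math.PI) * Math.asin(2 * q - 1);
  }

  /** Inverse of k1; k is clamped to k1(1) = delta / 4 so the result never exceeds 1. */
  static double k1Inverse(double k, double delta) {
    double clamped = Math.min(k, delta / 4);
    return (Math.sin(clamped * 2 * Math.PI / delta) + 1) / 2;
  }

  /**
   * Greedily groups sorted unit-weight values into centroids so that each centroid
   * spans at most one unit of k1, measured from the rank q0 at its left edge.
   */
  static List<Centroid> mergePass(double[] sorted, double delta) {
    List<Centroid> out = new ArrayList<>();
    int n = sorted.length;
    if (n == 0) {
      return out;
    }
    double q0 = 0.0; // normalized rank at the left edge of the open centroid
    double qLimit = k1Inverse(k1(q0, delta) + 1, delta);
    double sum = sorted[0];
    long weight = 1;
    for (int i = 1; i < n; i++) {
      double q = q0 + (double) (weight + 1) / n; // right edge if sorted[i] were absorbed
      if (q <= qLimit) {
        sum += sorted[i];
        weight++;
      } else {
        out.add(new Centroid(sum / weight, weight)); // close the current centroid
        q0 += (double) weight / n;
        qLimit = k1Inverse(k1(q0, delta) + 1, delta);
        sum = sorted[i];
        weight = 1;
      }
    }
    out.add(new Centroid(sum / weight, weight));
    return out;
  }

  public static void main(String[] args) {
    double[] values = new double[10_000];
    for (int i = 0; i < values.length; i++) {
      values[i] = i; // already sorted
    }
    List<Centroid> centroids = mergePass(values, 100);
    System.out.println(centroids.size() + " centroids for " + values.length + " values");
  }
}
```

Because k1 is steep near q = 0 and q = 1, centroids at the tails stay small (often single values) while centroids near the median absorb many values, which is why t-digests are most accurate for extreme quantiles.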
Field Summary
Fields
Modifier and Type | Field | Description
static final short | DEFAULT_K | the default value of K if one is not specified
Constructor Summary
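Constructors
Constructor | Description
TDigestDouble() | Constructor with the default K
TDigestDouble(short k) | Constructor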
Method Summary
Modifier and Type | Method | Description
double[] | getCDF(double[] splitPoints) | Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
short | getK() |
double | getMaxValue() |
double | getMinValue() |
double[] | getPMF(double[] splitPoints) | Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
double | getQuantile(double rank) | Compute approximate quantile value corresponding to the given normalized rank.
double | getRank(double value) | Compute approximate normalized rank of the given value.
long | getTotalWeight() |
static TDigestDouble | heapify(org.apache.datasketches.memory.Memory mem) | Deserialize TDigest from a given memory.
static TDigestDouble | heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat) | Deserialize TDigest from a given memory.
boolean | isEmpty() |
void | merge(TDigestDouble other) | Merge the given TDigest into this one.
byte[] | toByteArray() | Serialize this TDigest to a byte array form.
String | toString() | Human-readable summary of this TDigest as a string.
String | toString(boolean printCentroids) | Human-readable summary of this TDigest as a string.
void | update(double value) | Update this TDigest with the given value.
Field Details
DEFAULT_K
public static final short DEFAULT_K
The default value of K if one is not specified.
Constructor Details
TDigestDouble
public TDigestDouble()
Constructor with the default K (DEFAULT_K).
TDigestDouble
public TDigestDouble(short k)
Constructor.
Parameters:
k - the compression parameter; larger values let the TDigest keep more centroids, which increases its size and reduces its estimation error
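To build intuition for how k trades size against accuracy, the sketch below evaluates the centroid size bound from the original t-digest formulation, w <= 4 * n * q * (1 - q) / delta, where delta plays the role of k. This bound is an illustration only; the exact rule TDigestDouble applies is not stated on this page, and CompressionBoundSketch is a hypothetical name.

```java
/** Illustrates the classic t-digest centroid size bound; not necessarily the rule TDigestDouble uses. */
final class CompressionBoundSketch {

  /** Largest weight a centroid near normalized rank q may have: 4 * n * q * (1 - q) / delta. */
  static double maxCentroidWeight(long n, double q, double delta) {
    return 4.0 * n * q * (1.0 - q) / delta;
  }

  public static void main(String[] args) {
    long n = 1_000_000;
    for (double delta : new double[] {100, 200, 400}) {
      // Larger delta means smaller caps, hence more centroids and lower error.
      System.out.printf("delta=%.0f  cap at median=%.1f  cap at q=0.001=%.2f%n",
          delta, maxCentroidWeight(n, 0.5, delta), maxCentroidWeight(n, 0.001, delta));
    }
  }
}
```

Doubling delta halves every cap, so the digest needs roughly twice as many centroids; this is consistent with k affecting both the size of the TDigest and its estimation error.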
Method Details
getK
public short getK()
Returns:
the parameter k (compression) that was used to configure this TDigest
update
public void update(double value)
Update this TDigest with the given value.
Parameters:
value - the value to update the TDigest with
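The getters isEmpty, getMinValue, getMaxValue and getTotalWeight imply that update keeps some simple bookkeeping alongside the centroids. The following is a minimal sketch of that bookkeeping only, under assumptions not documented on this page: each value has weight 1, NaN inputs are ignored, and min/max read as NaN while empty. UpdateBookkeepingSketch is hypothetical.

```java
/** Hypothetical bookkeeping behind update, isEmpty, getMinValue, getMaxValue and getTotalWeight. */
final class UpdateBookkeepingSketch {
  private double min = Double.NaN; // assumption: NaN is reported while empty
  private double max = Double.NaN;
  private long totalWeight = 0;

  void update(double value) {
    if (Double.isNaN(value)) {
      return; // assumption: NaN inputs are ignored
    }
    // isEmpty() is evaluated before totalWeight is incremented below.
    min = isEmpty() ? value : Math.min(min, value);
    max = isEmpty() ? value : Math.max(max, value);
    totalWeight++; // assumption: each value counts with weight 1
    // A full digest would also record the value for its centroid summary here.
  }

  boolean isEmpty() {
    return totalWeight == 0;
  }

  double getMinValue() {
    return min;
  }

  double getMaxValue() {
    return max;
  }

  long getTotalWeight() {
    return totalWeight;
  }

  public static void main(String[] args) {
    UpdateBookkeepingSketch s = new UpdateBookkeepingSketch();
    for (double v : new double[] {3.0, -1.5, 7.25}) {
      s.update(v);
    }
    System.out.println(s.getMinValue() + " " + s.getMaxValue() + " " + s.getTotalWeight());
  }
}
```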
merge
public void merge(TDigestDouble other)
Merge the given TDigest into this one.
Parameters:
other - the TDigest to merge
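Merging two t-digests amounts to combining their centroid lists in mean order and re-compressing the result. The sketch below shows that shape with a deliberately simplified compression rule, a fixed maximum centroid weight standing in for the scale-function bound a real t-digest uses. CentroidMergeSketch and its maxWeight parameter are hypothetical and do not reproduce the library's merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Merge of two centroid lists with a simplified, fixed weight cap; not the library's algorithm. */
final class CentroidMergeSketch {

  static final class Centroid {
    final double mean;
    final long weight;

    Centroid(double mean, long weight) {
      this.mean = mean;
      this.weight = weight;
    }
  }

  /** Merges two mean-sorted centroid lists, then folds neighbours while their weight stays <= maxWeight. */
  static List<Centroid> merge(List<Centroid> a, List<Centroid> b, long maxWeight) {
    // Step 1: two-way merge by mean, as in the merge step of merge sort.
    List<Centroid> all = new ArrayList<>(a.size() + b.size());
    int i = 0;
    int j = 0;
    while (i < a.size() || j < b.size()) {
      boolean takeA = j == b.size() || (i < a.size() && a.get(i).mean <= b.get(j).mean);
      all.add(takeA ? a.get(i++) : b.get(j++));
    }
    // Step 2: greedy re-compression; the fixed cap stands in for a scale-function bound.
    List<Centroid> out = new ArrayList<>();
    for (Centroid c : all) {
      Centroid last = out.isEmpty() ? null : out.get(out.size() - 1);
      if (last != null && last.weight + c.weight <= maxWeight) {
        long w = last.weight + c.weight;
        double mean = (last.mean * last.weight + c.mean * c.weight) / w; // weighted mean
        out.set(out.size() - 1, new Centroid(mean, w));
      } else {
        out.add(c);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Centroid> a = Arrays.asList(new Centroid(1, 1), new Centroid(3, 1));
    List<Centroid> b = Arrays.asList(new Centroid(2, 1), new Centroid(4, 1));
    for (Centroid c : merge(a, b, 2)) {
      System.out.println(c.mean + " x" + c.weight);
    }
  }
}
```

Total weight is preserved by construction; a full merge would also combine the two minimum values and the two maximum values.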
isEmpty
public boolean isEmpty()
Returns:
true if this TDigest has not seen any data
getMinValue
public double getMinValue()
Returns:
the minimum value seen by this TDigest
getMaxValue
public double getMaxValue()
Returns:
the maximum value seen by this TDigest
getTotalWeight
public long getTotalWeight()
Returns:
the total weight of all values seen by this TDigest (each value added by update has a weight of 1)
getRank
public double getRank(double value)
Compute approximate normalized rank of the given value.
Parameters:
value - the value to be ranked
Returns:
normalized rank (from 0 to 1 inclusive)
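One common way to turn sorted centroids into a rank estimate is linear interpolation: each centroid's weight is treated as centered on its mean, and the curve is pinned to the observed min (rank 0) and max (rank 1). The sketch below assumes that scheme; TDigestDouble's exact interpolation and tie handling may differ, and CentroidRankSketch is hypothetical.

```java
/** Simplified rank estimate from sorted centroids by linear interpolation; not the library's exact rule. */
final class CentroidRankSketch {

  static double rank(double[] means, long[] weights, double min, double max, double value) {
    if (value < min) {
      return 0.0;
    }
    if (value > max) {
      return 1.0;
    }
    int n = means.length;
    double total = 0;
    for (long w : weights) {
      total += w;
    }
    // Knots (x, cumulative weight): min at 0, each mean at the midpoint of its own weight, max at total.
    double[] xs = new double[n + 2];
    double[] ys = new double[n + 2];
    xs[0] = min;
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
      xs[i + 1] = means[i];
      ys[i + 1] = cumulative + weights[i] / 2.0;
      cumulative += weights[i];
    }
    xs[n + 1] = max;
    ys[n + 1] = total;
    // Find the first segment whose right end is >= value and interpolate inside it.
    for (int i = 0; i <= n; i++) {
      if (value <= xs[i + 1]) {
        double dx = xs[i + 1] - xs[i];
        double t = dx == 0 ? 1.0 : (value - xs[i]) / dx;
        return (ys[i] + t * (ys[i + 1] - ys[i])) / total;
      }
    }
    return 1.0;
  }

  public static void main(String[] args) {
    double[] means = {1, 2, 3};
    long[] weights = {1, 1, 1};
    System.out.println(rank(means, weights, 1, 3, 2.0));
  }
}
```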
getQuantile
public double getQuantile(double rank)
Compute approximate quantile value corresponding to the given normalized rank.
Parameters:
rank - normalized rank (from 0 to 1 inclusive)
Returns:
quantile value corresponding to the given rank
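getQuantile answers the inverse question: given a normalized rank, find the value. Assuming the same piecewise-linear curve through (min, 0), each centroid mean at the midpoint of its own weight, and (max, total weight), the inverse can be sketched as follows. CentroidQuantileSketch is hypothetical and does not reproduce the library's exact rule.

```java
/** Simplified quantile estimate from sorted centroids by inverse linear interpolation. */
final class CentroidQuantileSketch {

  static double quantile(double[] means, long[] weights, double min, double max, double rank) {
    if (!(rank >= 0.0 && rank <= 1.0)) {
      throw new IllegalArgumentException("rank must be in [0, 1]");
    }
    int n = means.length;
    double total = 0;
    for (long w : weights) {
      total += w;
    }
    // Knots: (min, 0), (mean_i, weight before i + w_i / 2), (max, total).
    double[] xs = new double[n + 2];
    double[] ys = new double[n + 2];
    xs[0] = min;
    double cumulative = 0.0;
    for (int i = 0; i < n; i++) {
      xs[i + 1] = means[i];
      ys[i + 1] = cumulative + weights[i] / 2.0;
      cumulative += weights[i];
    }
    xs[n + 1] = max;
    ys[n + 1] = total;
    double target = rank * total; // rank expressed as cumulative weight
    for (int i = 0; i <= n; i++) {
      if (target <= ys[i + 1]) {
        double dy = ys[i + 1] - ys[i];
        double t = dy == 0 ? 0.0 : (target - ys[i]) / dy;
        return xs[i] + t * (xs[i + 1] - xs[i]);
      }
    }
    return max;
  }

  public static void main(String[] args) {
    double[] means = {1, 2, 3};
    long[] weights = {1, 1, 1};
    System.out.println(quantile(means, weights, 1, 3, 0.5));
  }
}
```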
getPMF
public double[] getPMF(double[] splitPoints)
Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals (bins).
Returns:
an array of m+1 doubles, each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals.
Throws:
SketchesStateException - if the sketch is empty.
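To make the bin definition concrete, the sketch below computes the exact PMF of an in-memory array, which is the quantity getPMF approximates. It assumes half-open bins [s[j-1], s[j]), with bin 0 below the first split point and bin m at or above the last; this page does not state how TDigestDouble assigns a value equal to a split point. IllegalStateException stands in for SketchesStateException, and ExactPmfSketch is hypothetical.

```java
import java.util.Arrays;

/** Exact PMF over in-memory data; illustrates what getPMF approximates. */
final class ExactPmfSketch {

  static double[] pmf(double[] data, double[] splitPoints) {
    if (data.length == 0) {
      // Stand-in for SketchesStateException, which the library throws on an empty sketch.
      throw new IllegalStateException("empty input");
    }
    for (int i = 0; i < splitPoints.length; i++) {
      boolean ok = !Double.isNaN(splitPoints[i]) && (i == 0 || splitPoints[i] > splitPoints[i - 1]);
      if (!ok) {
        throw new IllegalArgumentException("split points must be unique and increasing");
      }
    }
    double[] mass = new double[splitPoints.length + 1];
    for (double v : data) {
      // Bin index = number of split points <= v (assumed half-open bins).
      int bin = 0;
      while (bin < splitPoints.length && splitPoints[bin] <= v) {
        bin++;
      }
      mass[bin] += 1.0;
    }
    for (int j = 0; j < mass.length; j++) {
      mass[j] /= data.length; // counts to fractions
    }
    return mass;
  }

  public static void main(String[] args) {
    double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    System.out.println(Arrays.toString(pmf(data, new double[] {3.5, 7.5})));
  }
}
```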
getCDF
public double[] getCDF(double[] splitPoints)
Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals.
Returns:
an array of m+1 doubles, which are a consecutive approximation to the CDF of the input stream given the splitPoints. The value at array position j of the returned CDF array is the sum of the returned values in positions 0 through j of the returned PMF array. This can be viewed as the array of ranks of the given split points plus one more value that is always 1.
Throws:
SketchesStateException - if the sketch is empty.
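The relationship stated above, where CDF[j] is the sum of PMF[0] through PMF[j] and the last entry is always 1, can be written directly as running sums. CdfFromPmfSketch is a hypothetical helper, not part of the library.

```java
import java.util.Arrays;

/** Builds a CDF array from a PMF array by running sums; a hypothetical helper. */
final class CdfFromPmfSketch {

  static double[] cdfFromPmf(double[] pmf) {
    if (pmf.length == 0) {
      throw new IllegalArgumentException("PMF must have m + 1 >= 1 entries");
    }
    double[] cdf = new double[pmf.length];
    double running = 0.0;
    for (int j = 0; j < pmf.length; j++) {
      running += pmf[j];
      cdf[j] = running; // cdf[j] = pmf[0] + ... + pmf[j]
    }
    cdf[cdf.length - 1] = 1.0; // the last entry is always 1; pin it against rounding drift
    return cdf;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(cdfFromPmf(new double[] {0.3, 0.4, 0.3})));
  }
}
```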
toByteArray
public byte[] toByteArray()
Serialize this TDigest to a byte array form.
Returns:
byte array
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem)
Deserialize TDigest from a given memory. Supports reading the format of the reference implementation (autodetected).
Parameters:
mem - instance of Memory
Returns:
an instance of TDigest
heapify
public static TDigestDouble heapify(org.apache.datasketches.memory.Memory mem, boolean isFloat)
Deserialize TDigest from a given memory. Supports reading the compact format with (float, int) centroids, as opposed to (double, long), to represent (mean, weight). Supports reading the format of the reference implementation (autodetected).
Parameters:
mem - instance of Memory
isFloat - if true, the input represents the (float, int) format
Returns:
an instance of TDigest
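The isFloat flag distinguishes two centroid encodings: (double, long) costs 16 bytes per centroid, (float, int) costs 8 bytes and rounds each mean to float precision. The sketch below uses java.nio.ByteBuffer to show only that per-centroid trade-off; it is not the actual TDigestDouble serialization format (headers, byte order and field layout are not described on this page), and CentroidEncodingSketch is hypothetical.

```java
import java.nio.ByteBuffer;

/** Per-centroid cost of (double, long) vs (float, int) encodings; not the TDigestDouble wire format. */
final class CentroidEncodingSketch {

  static byte[] encodeDouble(double[] means, long[] weights) {
    ByteBuffer buf = ByteBuffer.allocate(means.length * (Double.BYTES + Long.BYTES)); // 16 bytes each
    for (int i = 0; i < means.length; i++) {
      buf.putDouble(means[i]).putLong(weights[i]);
    }
    return buf.array();
  }

  static byte[] encodeFloat(double[] means, long[] weights) {
    ByteBuffer buf = ByteBuffer.allocate(means.length * (Float.BYTES + Integer.BYTES)); // 8 bytes each
    for (int i = 0; i < means.length; i++) {
      // Means lose precision; weights above Integer.MAX_VALUE throw ArithmeticException.
      buf.putFloat((float) means[i]).putInt(Math.toIntExact(weights[i]));
    }
    return buf.array();
  }

  static double[] decodeFloatMeans(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    double[] means = new double[bytes.length / (Float.BYTES + Integer.BYTES)];
    for (int i = 0; i < means.length; i++) {
      means[i] = buf.getFloat(); // widened back to double
      buf.getInt(); // skip the weight
    }
    return means;
  }

  public static void main(String[] args) {
    double[] means = {0.1, 2.5};
    long[] weights = {5, 7};
    System.out.println(encodeDouble(means, weights).length + " vs " + encodeFloat(means, weights).length + " bytes");
    System.out.println(decodeFloatMeans(encodeFloat(means, weights))[0]);
  }
}
```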
toString
public String toString()
Human-readable summary of this TDigest as a string.
toString
public String toString(boolean printCentroids)
Human-readable summary of this TDigest as a string.
Parameters:
printCentroids - if true, append the list of centroids with their weights
Returns:
summary of this TDigest