Class TDigestDouble
java.lang.Object
org.apache.datasketches.tdigest.TDigestDouble
t-Digest for estimating quantiles and ranks.
This implementation is based on the following paper:
Ted Dunning, Otmar Ertl. Extremely Accurate Quantiles Using t-Digests
and the following implementation:
https://github.com/tdunning/t-digest
This implementation is similar to the MergingDigest class in the implementation above.
-
Field Summary
Fields
Modifier and Type | Field | Description
static final short | DEFAULT_K | The default value of K if one is not specified
-
Constructor Summary
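Constructors
Constructor | Description
TDigestDouble() | Constructor with the default K
TDigestDouble(short k) | Constructor
-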
Method Summary
Modifier and Type | Method | Description
double[] | getCDF(double[] splitPoints) | Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
short | getK() | Returns parameter k (compression) that was used to configure this TDigest
double | getMaxValue() | Returns maximum value seen by TDigest
double | getMinValue() | Returns minimum value seen by TDigest
double[] | getPMF(double[] splitPoints) | Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
double | getQuantile(double rank) | Compute approximate quantile value corresponding to the given normalized rank
double | getRank(double value) | Compute approximate normalized rank of the given value.
long | getTotalWeight() | Returns total weight
static TDigestDouble | heapify(MemorySegment seg) | Deserialize TDigest from a given MemorySegment.
static TDigestDouble | heapify(MemorySegment seg, boolean isFloat) | Deserialize TDigest from a given MemorySegment.
boolean | isEmpty() | Returns true if TDigest has not seen any data
void | merge(TDigestDouble other) | Merge the given TDigest into this one
byte[] | toByteArray() | Serialize this TDigest to a byte array form.
String | toString() | Human-readable summary of this TDigest as a string
String | toString(boolean printCentroids) | Human-readable summary of this TDigest as a string
void | update(double value) | Update this TDigest with the given value
-
Field Details
-
DEFAULT_K
public static final short DEFAULT_K
The default value of K if one is not specified.
-
Constructor Details
-
TDigestDouble
public TDigestDouble()
Constructor with the default K.
-
TDigestDouble
public TDigestDouble(short k)
Constructor.
Parameters:
k - affects the size of TDigest and its estimation error
-
-
Method Details
-
getK
public short getK()
Returns parameter k (compression) that was used to configure this TDigest.
Returns:
parameter k (compression) that was used to configure this TDigest
-
update
public void update(double value)
Update this TDigest with the given value.
Parameters:
value - value to update the TDigest with
-
merge
public void merge(TDigestDouble other)
Merge the given TDigest into this one.
Parameters:
other - TDigest to merge
-
isEmpty
public boolean isEmpty()
Returns true if TDigest has not seen any data.
Returns:
true if TDigest has not seen any data
-
getMinValue
public double getMinValue()
Returns minimum value seen by TDigest.
Returns:
minimum value seen by TDigest
-
getMaxValue
public double getMaxValue()
Returns maximum value seen by TDigest.
Returns:
maximum value seen by TDigest
-
getTotalWeight
public long getTotalWeight()
Returns total weight.
Returns:
total weight
-
getRank
public double getRank(double value)
Compute approximate normalized rank of the given value.
Parameters:
value - value to be ranked
Returns:
normalized rank (from 0 to 1 inclusive)
-
getQuantile
public double getQuantile(double rank)
Compute approximate quantile value corresponding to the given normalized rank.
Parameters:
rank - normalized rank (from 0 to 1 inclusive)
Returns:
quantile value corresponding to the given rank
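To make "normalized rank" and "quantile" concrete, the following is a minimal sketch of the exact quantities that getRank and getQuantile approximate, computed over a small raw array. It does not use TDigestDouble and is not how the sketch computes its answers. The class ExactRankQuantile and its methods are hypothetical, and the inclusive convention (a value counts toward its own rank) is an assumption made for illustration only.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the library: exact rank and quantile over raw data.
final class ExactRankQuantile {

    // Normalized rank: fraction of values <= value (inclusive convention, assumed here).
    static double rank(double[] values, double value) {
        if (values.length == 0) throw new IllegalStateException("no data");
        int count = 0;
        for (double v : values) {
            if (v <= value) count++;
        }
        return (double) count / values.length;
    }

    // Quantile: smallest value whose inclusive rank is at least the requested rank.
    static double quantile(double[] values, double rank) {
        if (values.length == 0) throw new IllegalStateException("no data");
        if (!(rank >= 0.0 && rank <= 1.0)) {
            throw new IllegalArgumentException("rank must be in [0, 1]");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(rank * sorted.length) - 1;
        index = Math.max(0, Math.min(index, sorted.length - 1));
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] data = {5, 1, 4, 2, 3};
        System.out.println(rank(data, 3.0));     // 3 of the 5 values are <= 3.0
        System.out.println(quantile(data, 0.5)); // middle value of the sorted data
    }
}
```

An exact baseline like this can be useful for checking the sketch's estimates on small inputs, where keeping all the data is still affordable.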
-
getPMF
public double[] getPMF(double[] splitPoints)
Returns an approximation to the Probability Mass Function (PMF) of the input stream given a set of split points.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals (bins).
Returns:
an array of m+1 doubles, each of which is an approximation to the fraction of the input stream values (the mass) that fall into one of those intervals.
Throws:
SketchesStateException - if the sketch is empty.
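The relationship between m split points and m+1 bins can be illustrated with an exact computation over raw data. This is only a sketch: the class ExactPmf is hypothetical, it does not call TDigestDouble, and it assumes that a value equal to a split point falls into the bin to its left. The boundary convention used by the sketch itself is not stated in this documentation.

```java
// Hypothetical helper, not part of the library: exact PMF over raw data.
final class ExactPmf {

    // Returns m+1 bin fractions for m unique, increasing split points.
    static double[] pmf(double[] values, double[] splitPoints) {
        if (values.length == 0) throw new IllegalStateException("no data");
        for (int i = 1; i < splitPoints.length; i++) {
            if (!(splitPoints[i - 1] < splitPoints[i])) {
                throw new IllegalArgumentException("split points must be unique and increasing");
            }
        }
        double[] bins = new double[splitPoints.length + 1];
        for (double v : values) {
            int bin = 0;
            // Skip every split point strictly below v; a value equal to a
            // split point stays in the left bin (assumed convention).
            while (bin < splitPoints.length && v > splitPoints[bin]) {
                bin++;
            }
            bins[bin] += 1.0;
        }
        for (int i = 0; i < bins.length; i++) {
            bins[i] /= values.length; // convert counts to fractions of the stream
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] splitPoints = {2.5, 6.5}; // m = 2 split points, so 3 bins
        for (double mass : pmf(values, splitPoints)) {
            System.out.println(mass);
        }
    }
}
```

With the two split points above, the eight values fall into three bins holding 2, 4, and 2 values, so the fractions sum to 1.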
-
getCDF
public double[] getCDF(double[] splitPoints)
Returns an approximation to the Cumulative Distribution Function (CDF), which is the cumulative analog of the PMF, of the input stream given a set of split points.
Parameters:
splitPoints - an array of m unique, monotonically increasing values that divide the input domain into m+1 consecutive disjoint intervals.
Returns:
an array of m+1 doubles that approximate the CDF of the input stream at the given splitPoints. The value at array position j of the returned CDF array is the sum of the returned values in positions 0 through j of the returned PMF array. This can be viewed as an array of the ranks of the given split points plus one more value that is always 1.
Throws:
SketchesStateException - if the sketch is empty.
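The stated relationship between the two arrays, where CDF position j is the sum of PMF positions 0 through j, amounts to a running sum. A minimal sketch follows, assuming a PMF array is already at hand. The class CdfFromPmf is hypothetical and is not part of the library.

```java
// Hypothetical helper, not part of the library: running sum of a PMF array.
final class CdfFromPmf {

    // cdf[j] = pmf[0] + pmf[1] + ... + pmf[j]
    static double[] cdf(double[] pmf) {
        double[] out = new double[pmf.length];
        double running = 0.0;
        for (int j = 0; j < pmf.length; j++) {
            running += pmf[j];
            out[j] = running;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pmf = {0.25, 0.5, 0.25}; // three bins from two split points
        for (double c : cdf(pmf)) {
            System.out.println(c);
        }
    }
}
```

The last entry is the total mass. For a PMF whose entries sum to 1 it equals 1, which matches the "one more value that is always 1" in the description above.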
-
toByteArray
public byte[] toByteArray()
Serialize this TDigest to a byte array form.
Returns:
byte array
-
heapify
public static TDigestDouble heapify(MemorySegment seg)
Deserialize TDigest from a given MemorySegment. Supports reading the format of the reference implementation (auto-detected).
Parameters:
seg - instance of MemorySegment
Returns:
an instance of TDigest
-
heapify
public static TDigestDouble heapify(MemorySegment seg, boolean isFloat)
Deserialize TDigest from a given MemorySegment. Supports reading the compact format, which uses (float, int) centroids instead of (double, long) to represent (mean, weight). Supports reading the format of the reference implementation (auto-detected).
Parameters:
seg - instance of MemorySegment
isFloat - if true, the input represents the (float, int) format
Returns:
an instance of TDigest
-
toString
public String toString()
Human-readable summary of this TDigest as a string.
-
toString
public String toString(boolean printCentroids)
Human-readable summary of this TDigest as a string.
Parameters:
printCentroids - if true, append the list of centroids with weights
Returns:
summary of this TDigest
-