Class ReqSketch
- java.lang.Object
-
- org.apache.datasketches.req.ReqSketch
-
- All Implemented Interfaces:
QuantilesAPI
,QuantilesFloatsAPI
public final class ReqSketch extends Object
This Relative Error Quantiles Sketch is the Java implementation based on the paper "Relative Error Streaming Quantiles" by Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, Pavel VeselĂ˝, and loosely derived from a Python prototype written by Pavel VeselĂ˝.Reference: https://arxiv.org/abs/2004.01668
This implementation differs from the algorithm described in the paper in the following:
- The algorithm requires no upper bound on the stream length. Instead, each relative-compactor counts the number of compaction operations performed so far (via variable state). Initially, the relative-compactor starts with INIT_NUMBER_OF_SECTIONS. Each time the number of compactions (variable state) exceeds 2^{numSections - 1}, we double numSections. Note that after merging the sketch with another one variable state may not correspond to the number of compactions performed at a particular level, however, since the state variable never exceeds the number of compactions, the guarantees of the sketch remain valid.
- The size of each section (variable k and sectionSize in the code and parameter k in the paper) is initialized with a number set by the user via variable k. When the number of sections doubles, we decrease sectionSize by a factor of sqrt(2). This is applied at each level separately. Thus, when we double the number of sections, the nominal compactor size increases by a factor of approx. sqrt(2) (+/- rounding).
- The merge operation here does not perform "special compactions", which are used in the paper to allow for a tight mathematical analysis of the sketch.
This implementation provides a number of capabilities not discussed in the paper or provided in the Python prototype.
- The Python prototype only implemented high accuracy for low ranks. This implementation provides the user with the ability to choose either high rank accuracy or low rank accuracy at the time of sketch construction.
- The Python prototype only implemented a comparison criterion of "INCLUSIVE". This implementation allows the user to switch back and forth between the "INCLUSIVE" criterion and the "EXCLUSIVE" criterion.
- This implementation provides extensive debug visibility into the operation of the sketch with two levels of detail output. This is not only useful for debugging, but is a powerful tool to help users understand how the sketch works.
- Author:
- Edo Liberty, Pavel Vesely, Lee Rhodes
- See Also:
QuantilesAPI
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.datasketches.quantilescommon.QuantilesFloatsAPI
QuantilesFloatsAPI.FloatsPartitionBoundaries
-
-
Field Summary
-
Fields inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
EMPTY_MSG, MEM_REQ_SVR_NULL_MSG, NOT_SINGLE_ITEM_MSG, TGT_IS_READ_ONLY_MSG, UNSUPPORTED_MSG
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description static ReqSketchBuilder
builder()
Returns a new ReqSketchBuilderdouble[]
getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints.boolean
getHighRankAccuracyMode()
If true, the high ranks are prioritized for better accuracy.int
getK()
Gets the user configured parameter k, which controls the accuracy of the sketch and its memory space usage.float
getMaxItem()
Returns the maximum item of the stream.float
getMinItem()
Returns the minimum item of the stream.long
getN()
Gets the length of the input stream.int
getNumRetained()
Gets the number of quantiles retained by the sketch.QuantilesFloatsAPI.FloatsPartitionBoundaries
getPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit)
This method returns an instance ofFloatsPartitionBoundaries
which provides sufficient information for the user to create the given number of equally weighted partitions.double[]
getPMF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints.float
getQuantile(double normRank, QuantileSearchCriteria searchCrit)
Gets the approximate quantile of the given normalized rank and the given search criterion.float
getQuantileLowerBound(double rank)
Gets the lower bound of the quantile confidence interval in which the quantile of the given rank exists.float
getQuantileLowerBound(double rank, int numStdDev)
float[]
getQuantiles(double[] normRanks, QuantileSearchCriteria searchCrit)
Gets an array of quantiles from the given array of normalized ranks.float
getQuantileUpperBound(double rank)
Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.float
getQuantileUpperBound(double rank, int numStdDev)
double
getRank(float quantile, QuantileSearchCriteria searchCrit)
Gets the normalized rank corresponding to the given a quantile.double
getRankLowerBound(double rank)
Gets the lower bound of the rank confidence interval in which the true rank of the given rank exists.double
getRankLowerBound(double rank, int numStdDev)
Gets an approximate lower bound rank of the given normalized rank.double[]
getRanks(float[] quantiles, QuantileSearchCriteria searchCrit)
Gets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion.double
getRankUpperBound(double rank)
Gets the upper bound of the rank confidence interval in which the true rank of the given rank exists.double
getRankUpperBound(double rank, int numStdDev)
Gets an approximate upper bound rank of the given rank.static double
getRSE(int k, double rank, boolean hra, long totalN)
Returns an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]).int
getSerializedSizeBytes()
Returns the current number of bytes this Sketch would require if serialized.FloatsSortedView
getSortedView()
Gets the sorted view of this sketchboolean
hasMemory()
Returns true if this sketch's data structure is backed by Memory or WritableMemory.static ReqSketch
heapify(org.apache.datasketches.memory.Memory mem)
Returns an ReqSketch on the heap from a Memory image of the sketch.boolean
isDirect()
Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).boolean
isEmpty()
Returns true if this sketch is empty.boolean
isEstimationMode()
Returns true if this sketch is in estimation mode.boolean
isReadOnly()
Returns true if this sketch is read only.QuantilesFloatsSketchIterator
iterator()
Gets the iterator for this sketch, which is not sorted.ReqSketch
merge(ReqSketch other)
Merge other sketch into this one.void
reset()
Resets this sketch to the empty state.byte[]
toByteArray()
Returns a byte array representation of this sketch.String
toString()
Returns a summary of the key parameters of the sketch.void
update(float item)
Updates this sketch with the given item.String
viewCompactorDetail(String fmt, boolean allData)
A detailed, human readable view of the sketch compactors and their data.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.datasketches.quantilescommon.QuantilesFloatsAPI
getCDF, getPartitionBoundaries, getPMF, getQuantile, getQuantiles, getRank, getRanks
-
-
-
-
Method Detail
-
builder
public static final ReqSketchBuilder builder()
Returns a new ReqSketchBuilder- Returns:
- a new ReqSketchBuilder
-
heapify
public static ReqSketch heapify(org.apache.datasketches.memory.Memory mem)
Returns an ReqSketch on the heap from a Memory image of the sketch.- Parameters:
mem
- The Memory object holding a valid image of an ReqSketch- Returns:
- an ReqSketch on the heap from a Memory image of the sketch.
-
getK
public int getK()
Description copied from interface:QuantilesAPI
Gets the user configured parameter k, which controls the accuracy of the sketch and its memory space usage.- Specified by:
getK
in interfaceQuantilesAPI
- Returns:
- the user configured parameter k, which controls the accuracy of the sketch and its memory space usage.
-
getCDF
public double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints.The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
- Specified by:
getCDF
in interfaceQuantilesFloatsAPI
- Parameters:
splitPoints
- an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 overlapping intervals.The start of each interval is below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and the end of the interval is the rank or cumulative probability corresponding to the split point.
The (m+1)th interval represents 100% of the distribution represented by the sketch and consistent with the definition of a cumulative probability distribution, thus the (m+1)th rank or probability in the returned array is always 1.0.
If a split point exactly equals a retained item of the sketch and the search criterion is:
- INCLUSIVE, the resulting cumulative probability will include that item.
- EXCLUSIVE, the resulting cumulative probability will not include the weight of that split point.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit
- the desired search criteria.- Returns:
- a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
-
getHighRankAccuracyMode
public boolean getHighRankAccuracyMode()
If true, the high ranks are prioritized for better accuracy. Otherwise the low ranks are prioritized for better accuracy. This state is chosen during sketch construction.- Returns:
- the high ranks accuracy state.
-
getMaxItem
public float getMaxItem()
Description copied from interface:QuantilesFloatsAPI
Returns the maximum item of the stream. This is provided for convenience, but may be different from the largest item retained by the sketch algorithm.- Specified by:
getMaxItem
in interfaceQuantilesFloatsAPI
- Returns:
- the maximum item of the stream
-
getMinItem
public float getMinItem()
Description copied from interface:QuantilesFloatsAPI
Returns the minimum item of the stream. This is provided for convenience, but is distinct from the smallest item retained by the sketch algorithm.- Specified by:
getMinItem
in interfaceQuantilesFloatsAPI
- Returns:
- the minimum item of the stream
-
getN
public long getN()
Description copied from interface:QuantilesAPI
Gets the length of the input stream.- Specified by:
getN
in interfaceQuantilesAPI
- Returns:
- the length of the input stream.
-
getPMF
public double[] getPMF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints.The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.
- Specified by:
getPMF
in interfaceQuantilesFloatsAPI
- Parameters:
splitPoints
- an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 consecutive, non-overlapping intervals.Each interval except for the end intervals starts with a split point and ends with the next split point in sequence.
The first interval starts below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and ends with the first split point
The last (m+1)th interval starts with the last split point and ends after the last item retained by the sketch corresponding to a rank or probability of 1.0.
The sum of the probability masses of all (m+1) intervals is 1.0.
If the search criterion is:
- INCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will include that item. If the lower split point equals an item retained by the sketch, the interval will exclude that item.
- EXCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will exclude that item. If the lower split point equals an item retained by the sketch, the interval will include that item.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit
- the desired search criteria.- Returns:
- a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
-
getQuantile
public float getQuantile(double normRank, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Gets the approximate quantile of the given normalized rank and the given search criterion.- Specified by:
getQuantile
in interfaceQuantilesFloatsAPI
- Parameters:
normRank
- the given normalized rank, a double in the range [0.0, 1.0].searchCrit
- If INCLUSIVE, the given rank includes all quantiles ≤ the quantile directly corresponding to the given rank. If EXCLUSIVE, he given rank includes all quantiles < the quantile directly corresponding to the given rank.- Returns:
- the approximate quantile given the normalized rank.
- See Also:
QuantileSearchCriteria
-
getQuantiles
public float[] getQuantiles(double[] normRanks, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Gets an array of quantiles from the given array of normalized ranks.- Specified by:
getQuantiles
in interfaceQuantilesFloatsAPI
- Parameters:
normRanks
- the given array of normalized ranks, each of which must be in the interval [0.0,1.0].searchCrit
- if INCLUSIVE, the given ranks include all quantiles ≤ the quantile directly corresponding to each rank.- Returns:
- an array of quantiles corresponding to the given array of normalized ranks.
- See Also:
QuantileSearchCriteria
-
getQuantileLowerBound
public float getQuantileLowerBound(double rank)
Gets the lower bound of the quantile confidence interval in which the quantile of the given rank exists.Although it is possible to estimate the probablity that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.95.- Specified by:
getQuantileLowerBound
in interfaceQuantilesFloatsAPI
- Parameters:
rank
- the given normalized rank- Returns:
- the lower bound of the quantile confidence interval in which the quantile of the given rank exists.
-
getQuantileLowerBound
public float getQuantileLowerBound(double rank, int numStdDev)
-
getQuantileUpperBound
public float getQuantileUpperBound(double rank)
Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.Although it is possible to estimate the probablity that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.95.- Specified by:
getQuantileUpperBound
in interfaceQuantilesFloatsAPI
- Parameters:
rank
- the given normalized rank- Returns:
- the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.
-
getQuantileUpperBound
public float getQuantileUpperBound(double rank, int numStdDev)
-
getRank
public double getRank(float quantile, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Gets the normalized rank corresponding to the given a quantile.- Specified by:
getRank
in interfaceQuantilesFloatsAPI
- Parameters:
quantile
- the given quantilesearchCrit
- if INCLUSIVE the given quantile is included into the rank.- Returns:
- the normalized rank corresponding to the given quantile.
- See Also:
QuantileSearchCriteria
-
getRankLowerBound
public double getRankLowerBound(double rank)
Gets the lower bound of the rank confidence interval in which the true rank of the given rank exists. The approximate probability that the true rank is within the confidence interval specified by the upper and lower rank bounds for this sketch is 0.95.- Parameters:
rank
- the given normalized rank.- Returns:
- the lower bound of the rank confidence interval in which the true rank of the given rank exists.
-
getRankLowerBound
public double getRankLowerBound(double rank, int numStdDev)
Gets an approximate lower bound rank of the given normalized rank.- Parameters:
rank
- the given rank, a number between 0 and 1.0.numStdDev
- the number of standard deviations. Must be 1, 2, or 3.- Returns:
- an approximate lower bound rank.
-
getRanks
public double[] getRanks(float[] quantiles, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
Gets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion.- Specified by:
getRanks
in interfaceQuantilesFloatsAPI
- Parameters:
quantiles
- the given array of quantilessearchCrit
- if INCLUSIVE, the given quantiles include the rank directly corresponding to each quantile.- Returns:
- an array of normalized ranks corresponding to the given array of quantiles.
- See Also:
QuantileSearchCriteria
-
getRankUpperBound
public double getRankUpperBound(double rank)
Gets the upper bound of the rank confidence interval in which the true rank of the given rank exists. The approximate probability that the true rank is within the confidence interval specified by the upper and lower rank bounds for this sketch is 0.95.- Parameters:
rank
- the given normalized rank.- Returns:
- the upper bound of the rank confidence interval in which the true rank of the given rank exists.
-
getRankUpperBound
public double getRankUpperBound(double rank, int numStdDev)
Gets an approximate upper bound rank of the given rank.- Parameters:
rank
- the given rank, a number between 0 and 1.0.numStdDev
- the number of standard deviations. Must be 1, 2, or 3.- Returns:
- an approximate upper bound rank.
-
getNumRetained
public int getNumRetained()
Description copied from interface:QuantilesAPI
Gets the number of quantiles retained by the sketch.- Specified by:
getNumRetained
in interfaceQuantilesAPI
- Returns:
- the number of quantiles retained by the sketch
-
getSerializedSizeBytes
public int getSerializedSizeBytes()
Description copied from interface:QuantilesFloatsAPI
Returns the current number of bytes this Sketch would require if serialized.- Specified by:
getSerializedSizeBytes
in interfaceQuantilesFloatsAPI
- Returns:
- the number of bytes this sketch would require if serialized.
-
getSortedView
public FloatsSortedView getSortedView()
Description copied from interface:QuantilesFloatsAPI
Gets the sorted view of this sketch- Specified by:
getSortedView
in interfaceQuantilesFloatsAPI
- Returns:
- the sorted view of this sketch
-
isEmpty
public boolean isEmpty()
Description copied from interface:QuantilesAPI
Returns true if this sketch is empty.- Specified by:
isEmpty
in interfaceQuantilesAPI
- Returns:
- true if this sketch is empty.
-
isEstimationMode
public boolean isEstimationMode()
Description copied from interface:QuantilesAPI
Returns true if this sketch is in estimation mode.- Specified by:
isEstimationMode
in interfaceQuantilesAPI
- Returns:
- true if this sketch is in estimation mode.
-
iterator
public QuantilesFloatsSketchIterator iterator()
Description copied from interface:QuantilesFloatsAPI
Gets the iterator for this sketch, which is not sorted.- Specified by:
iterator
in interfaceQuantilesFloatsAPI
- Returns:
- the iterator for this sketch
-
merge
public ReqSketch merge(ReqSketch other)
Merge other sketch into this one. The other sketch is not modified.- Parameters:
other
- sketch to be merged into this one.- Returns:
- this
-
reset
public void reset()
Resets this sketch to the empty state. If the sketch is read only this does nothing.The parameter k will not change.
The parameters k, highRankAccuracy, and reqDebug will not change.
- Specified by:
reset
in interfaceQuantilesAPI
-
toByteArray
public byte[] toByteArray()
Description copied from interface:QuantilesFloatsAPI
Returns a byte array representation of this sketch.- Specified by:
toByteArray
in interfaceQuantilesFloatsAPI
- Returns:
- a byte array representation of this sketch.
-
toString
public String toString()
Description copied from interface:QuantilesAPI
Returns a summary of the key parameters of the sketch.- Specified by:
toString
in interfaceQuantilesAPI
- Returns:
- a summary of the key parameters of the sketch.
-
update
public void update(float item)
Description copied from interface:QuantilesFloatsAPI
Updates this sketch with the given item.- Specified by:
update
in interfaceQuantilesFloatsAPI
- Parameters:
item
- from a stream of quantiles. NaNs are ignored.
-
viewCompactorDetail
public String viewCompactorDetail(String fmt, boolean allData)
A detailed, human readable view of the sketch compactors and their data. Each compactor string is prepended by the compactor lgWeight, the current number of retained quantiles of the compactor and the current nominal capacity of the compactor.- Parameters:
fmt
- the format string for the quantiles; example: "%4.0f".allData
- all the retained quantiles for the sketch will be output by compactor level. Otherwise, just a summary will be output.- Returns:
- a detailed view of the compactors and their data
-
getPartitionBoundaries
public QuantilesFloatsAPI.FloatsPartitionBoundaries getPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesFloatsAPI
This method returns an instance ofFloatsPartitionBoundaries
which provides sufficient information for the user to create the given number of equally weighted partitions.- Specified by:
getPartitionBoundaries
in interfaceQuantilesFloatsAPI
- Parameters:
numEquallyWeighted
- an integer that specifies the number of equally weighted partitions betweengetMinItem()
andgetMaxItem()
. This must be a positive integer greater than zero.- A 1 will return: minItem, maxItem.
- A 2 will return: minItem, median quantile, maxItem.
- Etc.
searchCrit
- If INCLUSIVE, all the returned quantiles are the upper boundaries of the equally weighted partitions with the exception of the lowest returned quantile, which is the lowest boundary of the lowest ranked partition. If EXCLUSIVE, all the returned quantiles are the lower boundaries of the equally weighted partitions with the exception of the highest returned quantile, which is the upper boundary of the highest ranked partition.- Returns:
- an instance of
FloatsPartitionBoundaries
.
-
getRSE
public static double getRSE(int k, double rank, boolean hra, long totalN)
Returns an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]). Derived from Lemma 12 in https://arxiv.org/abs/2004.01668v2, but the constant factors were adjusted based on empirical measurements.- Parameters:
k
- the given size of krank
- the given normalized rank, a number in [0,1].hra
- if true High Rank Accuracy mode is being selected, otherwise, Low Rank Accuracy.totalN
- an estimate of the total number of items submitted to the sketch.- Returns:
- an a priori estimate of relative standard error (RSE, expressed as a number in [0,1]).
-
hasMemory
public boolean hasMemory()
Description copied from interface:QuantilesAPI
Returns true if this sketch's data structure is backed by Memory or WritableMemory.- Specified by:
hasMemory
in interfaceQuantilesAPI
- Returns:
- true if this sketch's data structure is backed by Memory or WritableMemory.
-
isDirect
public boolean isDirect()
Description copied from interface:QuantilesAPI
Returns true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).- Specified by:
isDirect
in interfaceQuantilesAPI
- Returns:
- true if this sketch's data structure is off-heap (a.k.a., Direct or Native memory).
-
isReadOnly
public boolean isReadOnly()
Description copied from interface:QuantilesAPI
Returns true if this sketch is read only.- Specified by:
isReadOnly
in interfaceQuantilesAPI
- Returns:
- true if this sketch is read only.
-
-