Class KllDoublesSketch
- java.lang.Object
-
- org.apache.datasketches.kll.KllSketch
-
- org.apache.datasketches.kll.KllDoublesSketch
-
- All Implemented Interfaces:
QuantilesAPI,QuantilesDoublesAPI
public abstract class KllDoublesSketch extends KllSketch implements QuantilesDoublesAPI
This variation of the KllSketch implements primitive doubles.- See Also:
KllSketch
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.datasketches.kll.KllSketch
KllSketch.SketchType
-
Nested classes/interfaces inherited from interface org.apache.datasketches.quantilescommon.QuantilesDoublesAPI
QuantilesDoublesAPI.DoublesPartitionBoundaries
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description double[]getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints.doublegetMaxItem()Returns the maximum item of the stream.static intgetMaxSerializedSizeBytes(int k, long n, boolean updatableMemoryFormat)Returns upper bound on the serialized size of a KllDoublesSketch given the following parameters.doublegetMinItem()Returns the minimum item of the stream.QuantilesDoublesAPI.DoublesPartitionBoundariesgetPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit)This method returns an instance ofDoublesPartitionBoundarieswhich provides sufficient information for the user to create the given number of equally weighted partitions.double[]getPMF(double[] splitPoints, QuantileSearchCriteria searchCrit)Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints.doublegetQuantile(double rank, QuantileSearchCriteria searchCrit)Gets the approximate quantile of the given normalized rank and the given search criterion.doublegetQuantileLowerBound(double rank)Gets the lower bound of the quantile confidence interval in which the quantile of the given rank exists.double[]getQuantiles(double[] ranks, QuantileSearchCriteria searchCrit)Gets an array of quantiles from the given array of normalized ranks.doublegetQuantileUpperBound(double rank)Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.doublegetRank(double quantile, QuantileSearchCriteria searchCrit)Gets the normalized rank corresponding to the given a quantile.doublegetRankLowerBound(double rank)Gets the lower bound of the rank confidence interval in which the true rank of the given rank exists.double[]getRanks(double[] quantiles, QuantileSearchCriteria searchCrit)Gets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion.doublegetRankUpperBound(double rank)Gets the upper bound of the rank confidence interval in which the true rank of the given rank exists.DoublesSortedViewgetSortedView()Gets the sorted view of this sketchstatic KllDoublesSketchheapify(org.apache.datasketches.memory.Memory srcMem)Factory heapify takes a compact sketch image in Memory and instantiates an on-heap sketch.QuantilesDoublesSketchIteratoriterator()Gets the iterator for this sketch, which is not sorted.static KllDoublesSketchnewDirectInstance(int k, org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)Create a new direct instance of this sketch with a given k.static KllDoublesSketchnewDirectInstance(org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)Create a new direct instance of this sketch with the default k.static KllDoublesSketchnewHeapInstance()Create a new heap instance of this sketch with the default k = 200.static KllDoublesSketchnewHeapInstance(int k)Create a new heap instance of this sketch with a given parameter k.voidreset()Resets this sketch to the empty state.byte[]toByteArray()Returns a byte array representation of this sketch.voidupdate(double item)Updates this sketch with the given item.static KllDoublesSketchwrap(org.apache.datasketches.memory.Memory srcMem)Wrap a sketch around the given read only compact source Memory containing sketch data that originated from this sketch.static KllDoublesSketchwritableWrap(org.apache.datasketches.memory.WritableMemory srcMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)Wrap a sketch around the given source Writable Memory containing sketch data that originated from this sketch.-
Methods inherited from class org.apache.datasketches.kll.KllSketch
getCurrentCompactSerializedSizeBytes, getCurrentUpdatableSerializedSizeBytes, getK, getKFromEpsilon, getMaxSerializedSizeBytes, getN, getNormalizedRankError, getNormalizedRankError, getNumRetained, getSerializedSizeBytes, hasMemory, isDirect, isEmpty, isEstimationMode, isMemoryUpdatableFormat, isReadOnly, isSameResource, merge, toString, toString
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.datasketches.quantilescommon.QuantilesAPI
getK, getN, getNumRetained, hasMemory, isDirect, isEmpty, isEstimationMode, isReadOnly, toString
-
Methods inherited from interface org.apache.datasketches.quantilescommon.QuantilesDoublesAPI
getCDF, getPartitionBoundaries, getPMF, getQuantile, getQuantiles, getRank, getRanks, getSerializedSizeBytes
-
-
-
-
Method Detail
-
heapify
public static KllDoublesSketch heapify(org.apache.datasketches.memory.Memory srcMem)
Factory heapify takes a compact sketch image in Memory and instantiates an on-heap sketch. The resulting sketch will not retain any link to the source Memory.- Parameters:
srcMem- a compact Memory image of a sketch serialized by this sketch. See Memory- Returns:
- a heap-based sketch based on the given Memory.
-
newDirectInstance
public static KllDoublesSketch newDirectInstance(org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Create a new direct instance of this sketch with the default k. The default k = 200 results in a normalized rank error of about 1.65%. Larger k will have smaller error but the sketch will be larger (and slower).- Parameters:
dstMem- the given destination WritableMemory object for use by the sketchmemReqSvr- the given MemoryRequestServer to request a larger WritableMemory- Returns:
- a new direct instance of this sketch
-
newDirectInstance
public static KllDoublesSketch newDirectInstance(int k, org.apache.datasketches.memory.WritableMemory dstMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Create a new direct instance of this sketch with a given k.- Parameters:
k- parameter that controls size of the sketch and accuracy of estimates.dstMem- the given destination WritableMemory object for use by the sketchmemReqSvr- the given MemoryRequestServer to request a larger WritableMemory- Returns:
- a new direct instance of this sketch
-
newHeapInstance
public static KllDoublesSketch newHeapInstance()
Create a new heap instance of this sketch with the default k = 200. The default k = 200 results in a normalized rank error of about 1.65%. Larger K will have smaller error but the sketch will be larger (and slower).- Returns:
- new KllDoublesSketch on the Java heap.
-
newHeapInstance
public static KllDoublesSketch newHeapInstance(int k)
Create a new heap instance of this sketch with a given parameter k. k can be between 8, inclusive, and 65535, inclusive. The default k = 200 results in a normalized rank error of about 1.65%. Larger K will have smaller error but the sketch will be larger (and slower).- Parameters:
k- parameter that controls size of the sketch and accuracy of estimates.- Returns:
- new KllDoublesSketch on the Java heap.
-
wrap
public static KllDoublesSketch wrap(org.apache.datasketches.memory.Memory srcMem)
Wrap a sketch around the given read only compact source Memory containing sketch data that originated from this sketch.- Parameters:
srcMem- the read only source Memory- Returns:
- instance of this sketch
-
writableWrap
public static KllDoublesSketch writableWrap(org.apache.datasketches.memory.WritableMemory srcMem, org.apache.datasketches.memory.MemoryRequestServer memReqSvr)
Wrap a sketch around the given source Writable Memory containing sketch data that originated from this sketch.- Parameters:
srcMem- a WritableMemory that contains data.memReqSvr- the given MemoryRequestServer to request a larger WritableMemory- Returns:
- instance of this sketch
-
getMaxSerializedSizeBytes
public static int getMaxSerializedSizeBytes(int k, long n, boolean updatableMemoryFormat)Returns upper bound on the serialized size of a KllDoublesSketch given the following parameters.- Parameters:
k- parameter that controls size of the sketch and accuracy of estimatesn- stream lengthupdatableMemoryFormat- true if updatable Memory format, otherwise the standard compact format.- Returns:
- upper bound on the serialized size of a KllSketch.
-
getCDF
public double[] getCDF(double[] splitPoints, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIReturns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints.The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
- Specified by:
getCDFin interfaceQuantilesDoublesAPI- Parameters:
splitPoints- an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 overlapping intervals.The start of each interval is below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and the end of the interval is the rank or cumulative probability corresponding to the split point.
The (m+1)th interval represents 100% of the distribution represented by the sketch and consistent with the definition of a cumulative probability distribution, thus the (m+1)th rank or probability in the returned array is always 1.0.
If a split point exactly equals a retained item of the sketch and the search criterion is:
- INCLUSIVE, the resulting cumulative probability will include that item.
- EXCLUSIVE, the resulting cumulative probability will not include the weight of that split point.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit- the desired search criteria.- Returns:
- a discrete CDF array of m+1 double ranks (or cumulative probabilities) on the interval [0.0, 1.0].
-
getMaxItem
public double getMaxItem()
Description copied from interface:QuantilesDoublesAPIReturns the maximum item of the stream. This is provided for convenience, but may be different from the largest item retained by the sketch algorithm.- Specified by:
getMaxItemin interfaceQuantilesDoublesAPI- Returns:
- the maximum item of the stream
-
getMinItem
public double getMinItem()
Description copied from interface:QuantilesDoublesAPIReturns the minimum item of the stream. This is provided for convenience, but is distinct from the smallest item retained by the sketch algorithm.- Specified by:
getMinItemin interfaceQuantilesDoublesAPI- Returns:
- the minimum item of the stream
-
getPartitionBoundaries
public QuantilesDoublesAPI.DoublesPartitionBoundaries getPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit)
Description copied from interface:QuantilesDoublesAPIThis method returns an instance ofDoublesPartitionBoundarieswhich provides sufficient information for the user to create the given number of equally weighted partitions.- Specified by:
getPartitionBoundariesin interfaceQuantilesDoublesAPI- Parameters:
numEquallyWeighted- an integer that specifies the number of equally weighted partitions betweengetMinItem()andgetMaxItem(). This must be a positive integer greater than zero.- A 1 will return: minItem, maxItem.
- A 2 will return: minItem, median quantile, maxItem.
- Etc.
searchCrit- If INCLUSIVE, all the returned quantiles are the upper boundaries of the equally weighted partitions with the exception of the lowest returned quantile, which is the lowest boundary of the lowest ranked partition. If EXCLUSIVE, all the returned quantiles are the lower boundaries of the equally weighted partitions with the exception of the highest returned quantile, which is the upper boundary of the highest ranked partition.- Returns:
- an instance of
DoublesPartitionBoundaries.
-
getPMF
public double[] getPMF(double[] splitPoints, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIReturns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints.The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.
- Specified by:
getPMFin interfaceQuantilesDoublesAPI- Parameters:
splitPoints- an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 consecutive, non-overlapping intervals.Each interval except for the end intervals starts with a split point and ends with the next split point in sequence.
The first interval starts below the lowest item retained by the sketch corresponding to a zero rank or zero probability, and ends with the first split point
The last (m+1)th interval starts with the last split point and ends after the last item retained by the sketch corresponding to a rank or probability of 1.0.
The sum of the probability masses of all (m+1) intervals is 1.0.
If the search criterion is:
- INCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will include that item. If the lower split point equals an item retained by the sketch, the interval will exclude that item.
- EXCLUSIVE, and the upper split point of an interval equals an item retained by the sketch, the interval will exclude that item. If the lower split point equals an item retained by the sketch, the interval will include that item.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit- the desired search criteria.- Returns:
- a PMF array of m+1 probability masses as doubles on the interval [0.0, 1.0].
-
getQuantile
public double getQuantile(double rank, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIGets the approximate quantile of the given normalized rank and the given search criterion.- Specified by:
getQuantilein interfaceQuantilesDoublesAPI- Parameters:
rank- the given normalized rank, a double in the range [0.0, 1.0].searchCrit- If INCLUSIVE, the given rank includes all quantiles ≤ the quantile directly corresponding to the given rank. If EXCLUSIVE, he given rank includes all quantiles < the quantile directly corresponding to the given rank.- Returns:
- the approximate quantile given the normalized rank.
- See Also:
QuantileSearchCriteria
-
getQuantiles
public double[] getQuantiles(double[] ranks, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIGets an array of quantiles from the given array of normalized ranks.- Specified by:
getQuantilesin interfaceQuantilesDoublesAPI- Parameters:
ranks- the given array of normalized ranks, each of which must be in the interval [0.0,1.0].searchCrit- if INCLUSIVE, the given ranks include all quantiles ≤ the quantile directly corresponding to each rank.- Returns:
- an array of quantiles corresponding to the given array of normalized ranks.
- See Also:
QuantileSearchCriteria
-
getQuantileLowerBound
public double getQuantileLowerBound(double rank)
Gets the lower bound of the quantile confidence interval in which the quantile of the given rank exists.Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.99.- Specified by:
getQuantileLowerBoundin interfaceQuantilesDoublesAPI- Parameters:
rank- the given normalized rank- Returns:
- the lower bound of the quantile confidence interval in which the quantile of the given rank exists.
-
getQuantileUpperBound
public double getQuantileUpperBound(double rank)
Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.99.- Specified by:
getQuantileUpperBoundin interfaceQuantilesDoublesAPI- Parameters:
rank- the given normalized rank- Returns:
- the upper bound of the quantile confidence interval in which the true quantile of the given rank exists.
-
getRank
public double getRank(double quantile, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIGets the normalized rank corresponding to the given a quantile.- Specified by:
getRankin interfaceQuantilesDoublesAPI- Parameters:
quantile- the given quantilesearchCrit- if INCLUSIVE the given quantile is included into the rank.- Returns:
- the normalized rank corresponding to the given quantile
- See Also:
QuantileSearchCriteria
-
getRankLowerBound
public double getRankLowerBound(double rank)
Gets the lower bound of the rank confidence interval in which the true rank of the given rank exists. The approximate probability that the true rank is within the confidence interval specified by the upper and lower rank bounds for this sketch is 0.99.- Specified by:
getRankLowerBoundin interfaceQuantilesAPI- Parameters:
rank- the given normalized rank.- Returns:
- the lower bound of the rank confidence interval in which the true rank of the given rank exists.
-
getRankUpperBound
public double getRankUpperBound(double rank)
Gets the upper bound of the rank confidence interval in which the true rank of the given rank exists. The approximate probability that the true rank is within the confidence interval specified by the upper and lower rank bounds for this sketch is 0.99.- Specified by:
getRankUpperBoundin interfaceQuantilesAPI- Parameters:
rank- the given normalized rank.- Returns:
- the upper bound of the rank confidence interval in which the true rank of the given rank exists.
-
getRanks
public double[] getRanks(double[] quantiles, QuantileSearchCriteria searchCrit)Description copied from interface:QuantilesDoublesAPIGets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion.- Specified by:
getRanksin interfaceQuantilesDoublesAPI- Parameters:
quantiles- the given array of quantilessearchCrit- if INCLUSIVE, the given quantiles include the rank directly corresponding to each quantile.- Returns:
- an array of normalized ranks corresponding to the given array of quantiles.
- See Also:
QuantileSearchCriteria
-
getSortedView
public DoublesSortedView getSortedView()
Description copied from interface:QuantilesDoublesAPIGets the sorted view of this sketch- Specified by:
getSortedViewin interfaceQuantilesDoublesAPI- Returns:
- the sorted view of this sketch
-
iterator
public QuantilesDoublesSketchIterator iterator()
Description copied from interface:QuantilesDoublesAPIGets the iterator for this sketch, which is not sorted.- Specified by:
iteratorin interfaceQuantilesDoublesAPI- Returns:
- the iterator for this sketch
-
reset
public final void reset()
Resets this sketch to the empty state. If the sketch is read only this does nothing.The parameter k will not change.
The parameter k will not change.
- Specified by:
resetin interfaceQuantilesAPI
-
toByteArray
public byte[] toByteArray()
Description copied from interface:QuantilesDoublesAPIReturns a byte array representation of this sketch.- Specified by:
toByteArrayin interfaceQuantilesDoublesAPI- Returns:
- a byte array representation of this sketch.
-
update
public void update(double item)
Description copied from interface:QuantilesDoublesAPIUpdates this sketch with the given item.- Specified by:
updatein interfaceQuantilesDoublesAPI- Parameters:
item- from a stream of quantiles. NaNs are ignored.
-
-