Package org.apache.datasketches.tuple
Class Sketch<S extends Summary>
java.lang.Object
org.apache.datasketches.tuple.Sketch<S>
- Type Parameters:
S - Type of Summary
- Direct Known Subclasses:
CompactSketch, UpdatableSketch
This is the equivalent of org.apache.datasketches.theta.Sketch with the
addition of a user-defined Summary object associated with every unique entry
in the sketch.
-
Method Summary
Modifier and Type | Method | Description
abstract CompactSketch<S> | compact() | Converts this sketch to a CompactSketch on the Java heap.
abstract int | getCountLessThanThetaLong(long thetaLong) | Gets the number of hash values less than the given theta expressed as a long.
double | getEstimate() | Estimates the cardinality of the set (number of unique values presented to the sketch).
double | getEstimate(int numSubsetEntries) | Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
double | getLowerBound(int numStdDev) | Gets the approximate lower error bound given the specified number of Standard Deviations.
double | getLowerBound(int numStdDev, int numSubsetEntries) | Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
abstract int | getRetainedEntries() | Returns the number of retained entries.
SummaryFactory<S> | getSummaryFactory() | Gets the Summary Factory class of type S.
double | getTheta() | Gets the value of theta as a double between zero and one.
long | getThetaLong() | Returns Theta as a long.
double | getUpperBound(int numStdDev) | Gets the approximate upper error bound given the specified number of Standard Deviations.
double | getUpperBound(int numStdDev, int numSubsetEntries) | Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
boolean | isEmpty() | Returns true if the sketch is empty.
boolean | isEstimationMode() | Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
abstract TupleSketchIterator<S> | iterator() | Returns a SketchIterator.
abstract byte[] | toByteArray() | Serializes this sketch instance to a byte array.
String | toString() |
-
Method Details
-
compact
public abstract CompactSketch<S> compact()
Converts this sketch to a CompactSketch on the Java heap. If this sketch is already in compact form, this operation returns this.
- Returns:
- this sketch as a CompactSketch on the Java heap.
-
getEstimate
public double getEstimate()
Estimates the cardinality of the set (number of unique values presented to the sketch).
- Returns:
- best estimate of the number of unique values
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the upper bound.
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the lower bound.
-
getEstimate
public double getEstimate(int numSubsetEntries)
Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
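The subset estimate can be illustrated with a minimal sketch, assuming the conventional theta-sketch estimator in which a count of retained entries is scaled by 1/theta. The class SubsetEstimateDemo and its method estimateSubset are hypothetical names used only for illustration; they are not part of the library, and the library's own implementation may differ in detail.

```java
// Illustrative model only; SubsetEstimateDemo is a hypothetical helper, not part of the library.
final class SubsetEstimateDemo {

    // Assumption: the estimate is the subset count divided by theta, where 0 < theta <= 1.
    static double estimateSubset(int numSubsetEntries, double theta) {
        if (numSubsetEntries < 0) {
            throw new IllegalArgumentException("numSubsetEntries must be non-negative");
        }
        if (!(theta > 0.0 && theta <= 1.0)) {
            throw new IllegalArgumentException("theta must be in (0, 1]");
        }
        return numSubsetEntries / theta;
    }

    public static void main(String[] args) {
        // In exact mode theta is 1.0, so the estimate equals the count itself.
        System.out.println(estimateSubset(100, 1.0));
        // With theta = 0.25, each retained entry stands for four distinct values.
        System.out.println(estimateSubset(100, 0.25));
    }
}
```

Under the same assumption, passing the full retained-entry count instead of a subset count gives the estimate for the whole sketch.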
-
getLowerBound
public double getLowerBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev
- See Number of Standard Deviations
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
getUpperBound
public double getUpperBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev
- See Number of Standard Deviations
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
isEmpty
public boolean isEmpty()
- Returns:
- true if empty.
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
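The rule stated above can be written out as a small truth-table check. This is a minimal sketch of the documented condition only; EstimationModeDemo is a hypothetical class, and theta and empty are passed in as plain values rather than read from a sketch.

```java
// Illustrative model only; EstimationModeDemo is a hypothetical helper, not part of the library.
final class EstimationModeDemo {

    // Mirrors the documented rule: estimation mode means theta < 1.0 AND the sketch is not empty.
    static boolean isEstimationMode(double theta, boolean empty) {
        return theta < 1.0 && !empty;
    }

    public static void main(String[] args) {
        System.out.println(isEstimationMode(1.0, true));   // empty sketch
        System.out.println(isEstimationMode(1.0, false));  // non-empty, theta still 1.0 (exact mode)
        System.out.println(isEstimationMode(0.25, false)); // non-empty, theta below 1.0
    }
}
```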
-
getRetainedEntries
public abstract int getRetainedEntries()
- Returns:
- number of retained entries
-
getCountLessThanThetaLong
public abstract int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong
- the given theta as a long between zero and Long.MAX_VALUE.
- Returns:
- the number of hash values less than the given thetaLong.
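The counting operation can be pictured as a linear scan over the retained hash values. The following is a minimal sketch under the assumption that the hashes are available as a plain long array; CountBelowThetaDemo and countLessThan are hypothetical names, and concrete subclasses may store and scan their hashes differently.

```java
// Illustrative model only; CountBelowThetaDemo is a hypothetical helper, not part of the library.
final class CountBelowThetaDemo {

    // Counts the hash values that are strictly less than the given thetaLong.
    static int countLessThan(long[] hashes, long thetaLong) {
        int count = 0;
        for (long hash : hashes) {
            if (hash < thetaLong) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] hashes = {10L, 20L, 30L, 40L};
        // Only 10 and 20 are strictly below 30.
        System.out.println(countLessThan(hashes, 30L));
    }
}
```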
-
getSummaryFactory
public SummaryFactory<S> getSummaryFactory()
Gets the Summary Factory class of type S.
- Returns:
- the Summary Factory class of type S
-
getTheta
public double getTheta()
Gets the value of theta as a double between zero and one.
- Returns:
- the value of theta as a double
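The double and long forms of theta describe the same threshold on two scales. A minimal sketch of the conversion, assuming the conventional scaling in which Long.MAX_VALUE corresponds to 1.0; ThetaScaleDemo and thetaAsDouble are hypothetical names used only for illustration.

```java
// Illustrative model only; ThetaScaleDemo is a hypothetical helper, not part of the library.
final class ThetaScaleDemo {

    // Assumption: theta as a double is thetaLong divided by Long.MAX_VALUE.
    static double thetaAsDouble(long thetaLong) {
        if (thetaLong < 0) {
            throw new IllegalArgumentException("thetaLong must be between zero and Long.MAX_VALUE");
        }
        return thetaLong / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(thetaAsDouble(Long.MAX_VALUE));     // upper end of the range
        System.out.println(thetaAsDouble(Long.MAX_VALUE / 2)); // roughly the midpoint
        System.out.println(thetaAsDouble(0L));                 // lower end of the range
    }
}
```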
-
toByteArray
public abstract byte[] toByteArray()
Serializes this sketch instance to a byte array. As of 3.0.0, serializing an UpdatableSketch is deprecated. This capability will be removed in a future release. Serializing a CompactSketch is not deprecated.
- Returns:
- serialized representation of the sketch
-
iterator
public abstract TupleSketchIterator<S> iterator()
Returns a SketchIterator.
- Returns:
- a SketchIterator
-
getThetaLong
public long getThetaLong()
Returns Theta as a long.
- Returns:
- Theta as a long
-
toString
public String toString()
-