Package org.apache.datasketches.tuple
Class Sketch<S extends Summary>
- java.lang.Object
-
- org.apache.datasketches.tuple.Sketch<S>
-
- Type Parameters:
S
- Type of Summary
- Direct Known Subclasses:
CompactSketch, UpdatableSketch
public abstract class Sketch<S extends Summary> extends Object
This is the equivalent of org.apache.datasketches.theta.Sketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
-
-
Field Summary
Modifier and Type, Field:
protected static byte PREAMBLE_LONGS
protected SummaryFactory<S> summaryFactory_
-
Method Summary
Modifier and Type, Method, Description:
abstract CompactSketch<S> compact()
- Converts this sketch to a CompactSketch on the Java heap.
abstract int getCountLessThanThetaLong(long thetaLong)
- Gets the number of hash values less than the given theta expressed as a long.
double getEstimate()
- Estimates the cardinality of the set (number of unique values presented to the sketch).
double getEstimate(int numSubsetEntries)
- Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
double getLowerBound(int numStdDev)
- Gets the approximate lower error bound given the specified number of Standard Deviations.
double getLowerBound(int numStdDev, int numSubsetEntries)
- Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
abstract int getRetainedEntries()
SummaryFactory<S> getSummaryFactory()
- Gets the Summary Factory class of type S.
double getTheta()
- Gets the value of theta as a double between zero and one.
long getThetaLong()
- Returns theta as a long.
double getUpperBound(int numStdDev)
- Gets the approximate upper error bound given the specified number of Standard Deviations.
double getUpperBound(int numStdDev, int numSubsetEntries)
- Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
boolean isEmpty()
boolean isEstimationMode()
- Returns true if the sketch is in estimation mode (as opposed to exact mode).
abstract TupleSketchIterator<S> iterator()
- Returns a TupleSketchIterator.
abstract byte[] toByteArray()
- Serializes this sketch instance to a byte array.
String toString()
-
-
-
Field Detail
-
PREAMBLE_LONGS
protected static final byte PREAMBLE_LONGS
- See Also:
- Constant Field Values
-
summaryFactory_
protected SummaryFactory<S> summaryFactory_
-
-
Method Detail
-
compact
public abstract CompactSketch<S> compact()
Converts this sketch to a CompactSketch on the Java heap. If this sketch is already in compact form, this operation returns this.
- Returns:
- this sketch as a CompactSketch on the Java heap.
-
getEstimate
public double getEstimate()
Estimates the cardinality of the set (number of unique values presented to the sketch).
- Returns:
- best estimate of the number of unique values
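As a rough illustration of how a theta-style estimate is usually formed, the sketch below divides the number of retained entries by theta. The class and method names are hypothetical, and the formula is an assumption based on the standard theta-sketch estimator rather than a copy of the library source.

```java
/** Hypothetical helper for illustration; not part of org.apache.datasketches. */
final class EstimateDemo {

    /**
     * Assumed estimator: retained entries scaled up by the sampling fraction theta.
     * With theta == 1.0 (exact mode) the estimate equals the retained count.
     */
    public static double estimate(int retainedEntries, double theta) {
        if (retainedEntries < 0) {
            throw new IllegalArgumentException("retainedEntries must be >= 0");
        }
        if (theta <= 0.0 || theta > 1.0) {
            throw new IllegalArgumentException("theta must be in (0, 1]");
        }
        return retainedEntries / theta;
    }

    public static void main(String[] args) {
        // 4096 retained entries sampled at theta = 0.25
        System.out.println(estimate(4096, 0.25)); // prints 16384.0
    }
}
```

On a real sketch, the corresponding inputs would come from getRetainedEntries() and getTheta().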
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the upper bound.
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the lower bound.
-
getEstimate
public double getEstimate(int numSubsetEntries)
Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
-
getLowerBound
public double getLowerBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev
- See Number of Standard Deviations
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
getUpperBound
public double getUpperBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev
- See Number of Standard Deviations
numSubsetEntries
- number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
isEmpty
public boolean isEmpty()
- Returns:
- true if empty.
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in estimation mode (as opposed to exact mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
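The condition stated above can be written as a one-line predicate. The following is a minimal sketch with hypothetical names; it mirrors only the documented rule (theta < 1.0 AND not empty) and takes theta and the empty flag as plain parameters instead of reading them from a sketch.

```java
/** Hypothetical helper for illustration; not part of org.apache.datasketches. */
final class EstimationModeDemo {

    /** Documented rule: estimation mode means theta < 1.0 and the sketch is not empty. */
    public static boolean isEstimationMode(double theta, boolean empty) {
        return theta < 1.0 && !empty;
    }

    public static void main(String[] args) {
        System.out.println(isEstimationMode(1.0, false)); // prints false (exact mode)
        System.out.println(isEstimationMode(0.5, false)); // prints true
        System.out.println(isEstimationMode(0.5, true));  // prints false (empty sketch)
    }
}
```

The empty check matters because theta alone does not settle the mode: under this rule an empty sketch is never in estimation mode, whatever its theta.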
-
getRetainedEntries
public abstract int getRetainedEntries()
- Returns:
- number of retained entries
-
getCountLessThanThetaLong
public abstract int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong
- the given theta as a long between zero and Long.MAX_VALUE.
- Returns:
- the number of hash values less than the given thetaLong.
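To make the counting rule concrete, here is a minimal sketch over a plain array of hash values. The class name, the method, and the assumption that retained hashes are available as a long[] are all hypothetical; concrete subclasses may store and scan their entries differently.

```java
/** Hypothetical helper for illustration; not part of org.apache.datasketches. */
final class CountBelowThetaDemo {

    /** Counts the hash values strictly less than the given thetaLong. */
    public static int countLessThan(long[] hashes, long thetaLong) {
        int count = 0;
        for (long hash : hashes) {
            if (hash < thetaLong) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] hashes = {10L, 20L, 30L, 40L};
        System.out.println(countLessThan(hashes, 30L)); // prints 2
    }
}
```

The comparison is strict, so a hash equal to thetaLong is not counted.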
-
getSummaryFactory
public SummaryFactory<S> getSummaryFactory()
Gets the Summary Factory class of type S.
- Returns:
- the Summary Factory class of type S
-
getTheta
public double getTheta()
Gets the value of theta as a double between zero and one.
- Returns:
- the value of theta as a double
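The two theta accessors, getTheta() and getThetaLong(), expose the same threshold on different scales. The sketch below shows one plausible mapping between them, assuming the long form is normalized by Long.MAX_VALUE; the class and method names are hypothetical and the normalization is an assumption, not a statement of the library's implementation.

```java
/** Hypothetical helper for illustration; not part of org.apache.datasketches. */
final class ThetaConversionDemo {

    /** Assumed mapping: theta (double) = thetaLong / Long.MAX_VALUE. */
    public static double toTheta(long thetaLong) {
        if (thetaLong < 0) {
            throw new IllegalArgumentException("thetaLong must be in [0, Long.MAX_VALUE]");
        }
        return thetaLong / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(toTheta(Long.MAX_VALUE)); // prints 1.0
        System.out.println(toTheta(1L << 62));       // prints 0.5
    }
}
```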
-
toByteArray
public abstract byte[] toByteArray()
Serializes this sketch instance to a byte array. As of 3.0.0, serializing an UpdatableSketch is deprecated. This capability will be removed in a future release. Serializing a CompactSketch is not deprecated.
- Returns:
- serialized representation of the sketch
-
iterator
public abstract TupleSketchIterator<S> iterator()
Returns a TupleSketchIterator.
- Returns:
- a TupleSketchIterator
-
getThetaLong
public long getThetaLong()
Returns theta as a long.
- Returns:
- theta as a long
-
-