Class TupleSketch<S extends Summary>
java.lang.Object
org.apache.datasketches.tuple.TupleSketch<S>
- Type Parameters:
S - Type of Summary
- Direct Known Subclasses:
CompactTupleSketch, UpdatableTupleSketch
The top-level class for all Tuple sketches. This class is never constructed directly.
Use the UpdatableTupleSketchBuilder methods to create UpdatableTupleSketches.
This is similar to ThetaSketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
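The toy model below illustrates the idea in this description: a theta-style sample of hashed keys in which every retained hash carries a summary. It is a hypothetical sketch and not the DataSketches implementation. The following are all assumptions made for illustration:
- the class name ToyTupleSketch;
- the capacity k;
- the SplitMix64-style mixer, which stands in for the library's own hash;
- the running double-sum summary.

```java
import java.util.TreeMap;

/**
 * Hypothetical toy model of the tuple-sketch idea (NOT the DataSketches code):
 * a theta-style sample of hashed keys where each retained hash carries a
 * summary, here a running double sum.
 */
class ToyTupleSketch {
  private final int k;                                   // max retained entries (assumed knob)
  private long thetaLong = Long.MAX_VALUE;               // sampling threshold as a long
  private final TreeMap<Long, Double> entries = new TreeMap<>(); // hash -> summary

  ToyTupleSketch(int k) {
    if (k < 1) throw new IllegalArgumentException("k must be >= 1");
    this.k = k;
  }

  // Stand-in 63-bit hash (SplitMix64 finalizer), assumed for illustration only.
  static long hash(long key) {
    long z = key + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    z = z ^ (z >>> 31);
    return z >>> 1;                                      // keep the hash non-negative
  }

  void update(long key, double value) {
    long h = hash(key);
    if (h >= thetaLong) return;                          // hash falls outside the sample
    entries.merge(h, value, Double::sum);                // new entry, or update its summary
    if (entries.size() > k) {                            // over capacity: lower theta
      thetaLong = entries.lastKey();                     // largest retained hash becomes theta
      entries.remove(thetaLong);                         // retained hashes must stay < theta
    }
  }

  double getTheta() { return thetaLong / (double) Long.MAX_VALUE; }

  int getRetainedEntries() { return entries.size(); }

  // In this toy, "empty" is approximated by "no retained entries".
  boolean isEstimationMode() { return thetaLong < Long.MAX_VALUE && !entries.isEmpty(); }

  double getEstimate() {
    return isEstimationMode() ? getRetainedEntries() / getTheta() : getRetainedEntries();
  }

  Double getSummary(long key) { return entries.get(hash(key)); } // null if not retained
}
```

Evicting the largest hash and adopting it as the new theta is a simple KMV-style rule. The library's update sketches may manage capacity differently.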
Method Summary
- abstract CompactTupleSketch<S> compact(): Converts this TupleSketch to a CompactTupleSketch on the Java heap.
- static <S extends Summary> TupleSketch<S> createEmptySketch(): Creates an empty CompactTupleSketch.
- abstract int getCountLessThanThetaLong(long thetaLong): Gets the number of hash values less than the given theta expressed as a long.
- double getEstimate(): Estimates the cardinality of the set (number of unique values presented to the sketch).
- double getEstimate(int numSubsetEntries): Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
- double getLowerBound(int numStdDev): Gets the approximate lower error bound given the specified number of Standard Deviations.
- double getLowerBound(int numStdDev, int numSubsetEntries): Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- abstract int getRetainedEntries(): Returns the number of retained entries.
- SummaryFactory<S> getSummaryFactory(): Gets the Summary Factory class of type S.
- double getTheta(): Gets the value of theta as a double between zero and one.
- long getThetaLong(): Returns theta as a long.
- double getUpperBound(int numStdDev): Gets the approximate upper error bound given the specified number of Standard Deviations.
- double getUpperBound(int numStdDev, int numSubsetEntries): Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- static <S extends Summary> TupleSketch<S> heapifySketch(MemorySegment seg, SummaryDeserializer<S> deserializer): Instantiate a TupleSketch from a given MemorySegment.
- static <U, S extends UpdatableSummary<U>> UpdatableTupleSketch<U,S> heapifyUpdatableSketch(MemorySegment seg, SummaryDeserializer<S> deserializer, SummaryFactory<S> summaryFactory): Instantiate an UpdatableTupleSketch on the heap from a given MemorySegment.
- boolean isEmpty()
- boolean isEstimationMode(): Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
- abstract TupleSketchIterator<S> iterator(): Returns a TupleSketchIterator.
- abstract byte[] toByteArray(): Serialize this sketch to a byte array.
- String toString()
-
Method Details
-
compact
public abstract CompactTupleSketch<S> compact()
Converts this TupleSketch to a CompactTupleSketch on the Java heap. If this sketch is already in compact form, this operation returns this.
- Returns:
- this sketch as a CompactTupleSketch on the Java heap.
-
getEstimate
public double getEstimate()
Estimates the cardinality of the set (number of unique values presented to the sketch).
- Returns:
- best estimate of the number of unique values
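Here is a minimal sketch of the standard theta-sketch estimator, assuming the retained-entry count and theta are already known. In exact mode (theta equal to 1.0) the retained count is returned directly. Otherwise each retained entry stands for 1/theta distinct values. The class EstimateSketch and its method are hypothetical names.

```java
// Hypothetical helper illustrating the theta-sketch estimator.
final class EstimateSketch {
  private EstimateSketch() {}

  /** Exact mode returns the count; estimation mode scales it by 1/theta. */
  static double estimate(int retainedEntries, double theta) {
    if (theta >= 1.0) {
      return retainedEntries;          // every distinct value was kept
    }
    return retainedEntries / theta;    // each entry represents 1/theta values
  }

  public static void main(String[] args) {
    System.out.println(estimate(4096, 0.25)); // prints 16384.0
  }
}
```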
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the upper bound.
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the lower bound.
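The role of numStdDev can be illustrated with a normal approximation to the binomial. Suppose each of N distinct values is retained independently with probability theta. Then the retained count n has variance N·theta·(1 − theta), so n/theta has a standard deviation of about sqrt(n(1 − theta))/theta.

This is a swapped-in textbook approximation for illustration only. It is not the bound computation DataSketches performs, and ApproxBounds is a hypothetical name. Passing a subset count in place of the retained count gives the shape of the two-argument overloads.

```java
// Normal-approximation bounds for a theta-style estimate.
// Illustration only: NOT the bound computation used by DataSketches.
final class ApproxBounds {
  private ApproxBounds() {}

  // Approximate standard deviation of retained/theta when the retained
  // count is roughly Binomial(N, theta); theta is assumed to be in (0, 1].
  private static double stdDev(int retained, double theta) {
    return Math.sqrt(retained * (1.0 - theta)) / theta;
  }

  static double upperBound(int numStdDev, int retained, double theta) {
    double est = retained / theta;
    return est + numStdDev * stdDev(retained, theta);
  }

  static double lowerBound(int numStdDev, int retained, double theta) {
    double est = retained / theta;
    // At least `retained` distinct values were actually observed.
    return Math.max(retained, est - numStdDev * stdDev(retained, theta));
  }
}
```

In exact mode (theta = 1.0) the standard deviation is zero, so both bounds collapse to the estimate.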
-
getEstimate
public double getEstimate(int numSubsetEntries)
Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numSubsetEntries - number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
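One way to use the subset overload is to count the retained entries whose summaries satisfy some condition, then scale that count the same way as the full estimate. The helper below is a hypothetical sketch over a plain array of summary values; the sample values in main are made up for illustration. It assumes the same exact-mode and estimation-mode split as getEstimate().

```java
import java.util.function.DoublePredicate;

// Hypothetical subset estimate over a plain array of retained summaries.
final class SubsetEstimate {
  private SubsetEstimate() {}

  // Counts retained entries whose summary passes the predicate.
  static int countMatching(double[] summaries, DoublePredicate predicate) {
    int count = 0;
    for (double s : summaries) {
      if (predicate.test(s)) count++;
    }
    return count;
  }

  // Scales the subset count by 1/theta in estimation mode.
  static double estimate(int numSubsetEntries, double theta) {
    return theta >= 1.0 ? numSubsetEntries : numSubsetEntries / theta;
  }

  public static void main(String[] args) {
    double[] summaries = {1.0, 5.0, 12.0, 3.0, 20.0};  // illustrative values
    int n = countMatching(summaries, s -> s > 4.0);     // 3 entries match
    System.out.println(estimate(n, 0.5));               // prints 6.0
  }
}
```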
-
getLowerBound
public double getLowerBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev - See Number of Standard Deviations
numSubsetEntries - number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
getUpperBound
public double getUpperBound(int numStdDev, int numSubsetEntries)
Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
- Parameters:
numStdDev - See Number of Standard Deviations
numSubsetEntries - number of entries for a chosen subset of the sketch.
- Returns:
- the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
-
isEmpty
public boolean isEmpty()
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
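The documented condition compares theta with 1.0. The sketch below assumes the check is made on the long form instead (thetaLong < Long.MAX_VALUE), and shows why that distinction can matter. A thetaLong just below Long.MAX_VALUE converts to a double theta of exactly 1.0. EstimationMode is a hypothetical name.

```java
// Hypothetical helper: estimation mode decided on the long form of theta.
final class EstimationMode {
  private EstimationMode() {}

  static boolean isEstimationMode(long thetaLong, boolean empty) {
    return thetaLong < Long.MAX_VALUE && !empty;
  }

  public static void main(String[] args) {
    long thetaLong = Long.MAX_VALUE - 1;
    double theta = thetaLong / (double) Long.MAX_VALUE;
    System.out.println(theta);                              // prints 1.0 (double rounding)
    System.out.println(isEstimationMode(thetaLong, false)); // prints true
  }
}
```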
-
getRetainedEntries
public abstract int getRetainedEntries()
Returns the number of retained entries.
- Returns:
- number of retained entries
-
getCountLessThanThetaLong
public abstract int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong - the given theta as a long in the range (zero, Long.MAX_VALUE].
- Returns:
- the number of hash values less than the given thetaLong.
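Here is a plain-array sketch of what this method computes: the number of retained hashes strictly below a given thetaLong. CountBelowTheta and its long[] input are assumptions for illustration. The argument check mirrors the documented range (zero, Long.MAX_VALUE]. The upper end needs no check, because a long cannot exceed Long.MAX_VALUE.

```java
// Hypothetical stand-in: count retained hashes strictly below thetaLong.
final class CountBelowTheta {
  private CountBelowTheta() {}

  static int countLessThan(long[] hashes, long thetaLong) {
    if (thetaLong <= 0) {
      throw new IllegalArgumentException("thetaLong must be in (0, Long.MAX_VALUE]");
    }
    int count = 0;
    for (long h : hashes) {
      if (h < thetaLong) count++;   // strictly less than, as documented
    }
    return count;
  }
}
```

Passing a smaller thetaLong than the sketch's own keeps only the hashes below it. This is the count that would remain if the sketch were sampled at that lower threshold.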
-
getSummaryFactory
public SummaryFactory<S> getSummaryFactory()
Gets the Summary Factory class of type S.
- Returns:
- the Summary Factory class of type S
-
getTheta
public double getTheta()
Gets the value of theta as a double between zero and one.
- Returns:
- the value of theta as a double
-
toByteArray
public abstract byte[] toByteArray()
Serialize this sketch to a byte array. As of 3.0.0, serializing an UpdatableTupleSketch is deprecated. This capability will be removed in a future release. Serializing a CompactTupleSketch is not deprecated.
- Returns:
- serialized representation of this sketch.
-
iterator
public abstract TupleSketchIterator<S> iterator()
Returns a TupleSketchIterator.
- Returns:
- a TupleSketchIterator
-
getThetaLong
public long getThetaLong()
Returns theta as a long. getTheta() equals this value divided by Long.MAX_VALUE.
- Returns:
- Theta as a long
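getTheta() and getThetaLong() describe the same threshold on two scales: theta = thetaLong / Long.MAX_VALUE. The hypothetical helpers below convert in both directions. The reverse conversion from a sampling fraction is an assumption for illustration. It truncates toward zero, and it saturates at Long.MAX_VALUE because of Java's double-to-long narrowing rule.

```java
// Hypothetical helpers relating theta (double) and thetaLong (long).
final class ThetaConversion {
  private ThetaConversion() {}

  // theta = thetaLong / Long.MAX_VALUE
  static double toTheta(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  // Assumed inverse for a sampling fraction p in (0, 1]; the cast truncates,
  // and a product at or above 2^63 saturates to Long.MAX_VALUE.
  static long toThetaLong(double p) {
    return (long) (p * Long.MAX_VALUE);
  }

  public static void main(String[] args) {
    System.out.println(toTheta(Long.MAX_VALUE));    // prints 1.0
    System.out.println(toTheta(toThetaLong(0.5)));  // prints 0.5
  }
}
```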
-
toString
public String toString()
-
heapifyUpdatableSketch
public static <U, S extends UpdatableSummary<U>> UpdatableTupleSketch<U,S> heapifyUpdatableSketch(MemorySegment seg, SummaryDeserializer<S> deserializer, SummaryFactory<S> summaryFactory)
Instantiate an UpdatableTupleSketch on the heap from a given MemorySegment.
- Type Parameters:
U - Type of update value
S - Type of Summary
- Parameters:
seg - MemorySegment object representing an UpdatableTupleSketch
deserializer - instance of SummaryDeserializer
summaryFactory - instance of SummaryFactory
- Returns:
- UpdatableTupleSketch created from its MemorySegment representation
-
heapifySketch
public static <S extends Summary> TupleSketch<S> heapifySketch(MemorySegment seg, SummaryDeserializer<S> deserializer)
Instantiate a TupleSketch from a given MemorySegment.
- Type Parameters:
S - Type of Summary
- Parameters:
seg - MemorySegment object representing a TupleSketch
deserializer - instance of SummaryDeserializer
- Returns:
- TupleSketch created from its MemorySegment representation
-
createEmptySketch
public static <S extends Summary> TupleSketch<S> createEmptySketch()
Creates an empty CompactTupleSketch.
- Type Parameters:
S - Type of Summary
- Returns:
- an empty instance of a CompactTupleSketch
-