Class ThetaSketch
- All Implemented Interfaces:
MemorySegmentStatus
- Direct Known Subclasses:
CompactThetaSketch, UpdatableThetaSketch
- Author:
- Lee Rhodes
-
Method Summary
Modifier and Type, Method: Description
CompactThetaSketch compact(): Converts this sketch to an ordered CompactThetaSketch.
abstract CompactThetaSketch compact(boolean dstOrdered, MemorySegment dstSeg): Convert this sketch to a CompactThetaSketch.
abstract int getCompactBytes(): Returns the number of storage bytes required for this ThetaSketch if its current state were compacted.
static int getCompactSketchMaxBytes(int lgNomEntries): Returns the maximum number of storage bytes required for a CompactThetaSketch given the configured log_base2 of the number of nominal entries, which is a power of 2.
int getCountLessThanThetaLong(long thetaLong): Gets the number of hash values less than the given theta expressed as a long.
abstract int getCurrentBytes(): Returns the number of storage bytes required for this sketch in its current state.
abstract double getEstimate(): Gets the unique count estimate.
static double getEstimate(MemorySegment srcSeg): Gets the estimate from the given MemorySegment.
abstract Family getFamily(): Returns the Family that this sketch belongs to.
double getLowerBound(int numStdDev): Gets the approximate lower error bound given the specified number of Standard Deviations.
static double getLowerBound(int numStdDev, MemorySegment srcSeg): Gets the approximate lower error bound from a valid MemorySegment image of a ThetaSketch given the specified number of Standard Deviations.
static int getMaxCompactSketchBytes(int numberOfEntries): Returns the maximum number of storage bytes required for a CompactThetaSketch with the given number of actual entries.
static int getMaxUpdateSketchBytes(int nomEntries): Returns the maximum number of storage bytes required for an UpdatableThetaSketch with the given number of nominal entries (power of 2).
int getRetainedEntries(): Returns the number of valid entries that have been retained by the sketch.
abstract int getRetainedEntries(boolean valid): Returns the number of entries that have been retained by the sketch.
static int getRetainedEntries(MemorySegment srcSeg): Returns the number of valid entries that have been retained by the sketch from the given MemorySegment.
static int getSerializationVersion(MemorySegment seg): Returns the serialization version from the given MemorySegment.
double getTheta(): Gets the value of theta as a double with a value between zero and one.
abstract long getThetaLong(): Gets the value of theta as a long.
static int getUpdateSketchMaxBytes(int lgNomEntries): Returns the maximum number of storage bytes required for an UpdatableThetaSketch with the given log_base2 of the nominal entries.
double getUpperBound(int numStdDev): Gets the approximate upper error bound given the specified number of Standard Deviations.
static double getUpperBound(int numStdDev, MemorySegment srcSeg): Gets the approximate upper error bound from a valid MemorySegment image of a ThetaSketch given the specified number of Standard Deviations.
static ThetaSketch heapify(MemorySegment srcSeg): Heapify takes the sketch image in MemorySegment and instantiates an on-heap ThetaSketch.
static ThetaSketch heapify(MemorySegment srcSeg, long expectedSeed): Heapify takes the sketch image in MemorySegment and instantiates an on-heap ThetaSketch.
abstract boolean isCompact(): Returns true if this sketch is in compact form.
abstract boolean isEmpty()
boolean isEstimationMode(): Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
abstract boolean isOrdered(): Returns true if internal cache is ordered.
abstract HashIterator iterator(): Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
abstract byte[] toByteArray(): Serialize this sketch to a byte array form.
String toString(): Returns a human readable summary of the sketch.
String toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode): Gets a human readable listing of contents and summary of the given sketch.
static String toString(byte[] byteArr): Returns a human readable string of the preamble of a byte array image of a ThetaSketch.
static String toString(MemorySegment seg): Returns a human readable string of the preamble of a MemorySegment image of a ThetaSketch.
static ThetaSketch wrap(MemorySegment srcSeg): Wrap takes the sketch image in the given MemorySegment and refers to it directly.
static ThetaSketch wrap(MemorySegment srcSeg, long expectedSeed): Wrap takes the sketch image in the given MemorySegment and refers to it directly.
Methods inherited from interface MemorySegmentStatus
hasMemorySegment, isOffHeap, isSameResource
-
Method Details
-
heapify
Heapify takes the sketch image in MemorySegment and instantiates an on-heap ThetaSketch. The resulting sketch will not retain any link to the source MemorySegment.
For UpdatableThetaSketches this method checks if the Default Update Seed was used to create the source MemorySegment image.
- Parameters:
srcSeg - an image of a ThetaSketch.
- Returns:
- a ThetaSketch on the heap.
-
heapify
Heapify takes the sketch image in MemorySegment and instantiates an on-heap ThetaSketch. The resulting sketch will not retain any link to the source MemorySegment.
For UpdatableThetaSketches this method checks if the expectedSeed was used to create the source MemorySegment image.
- Parameters:
srcSeg - an image of a ThetaSketch that was created using the given expectedSeed.
expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed. Compact sketches store a 16-bit hash of the seed, but not the seed itself.
- Returns:
- a ThetaSketch on the heap.
-
wrap
Wrap takes the sketch image in the given MemorySegment and refers to it directly. No data is copied onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API. Only sketches that have been explicitly stored as direct sketches can be wrapped.
Wrapping any subclass of this class that is empty or contains only a single item will result in the on-heap equivalent form of an empty or single-item sketch, respectively. This is actually faster and consumes less overall space.
This method checks if the Default Update Seed was used to create the source MemorySegment image.
- Parameters:
srcSeg - a MemorySegment with an image of a ThetaSketch.
- Returns:
- a read-only ThetaSketch backed by the given MemorySegment
-
wrap
Wrap takes the sketch image in the given MemorySegment and refers to it directly. No data is copied onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API. Only sketches that have been explicitly stored as direct sketches can be wrapped.
Wrapping any subclass of this class that is empty or contains only a single item will result in the on-heap equivalent form of an empty or single-item sketch, respectively. This is actually faster and consumes less overall space.
This method checks if the given expectedSeed was used to create the source MemorySegment image.
- Parameters:
srcSeg - a MemorySegment with an image of a ThetaSketch.
expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed.
- Returns:
- a read-only ThetaSketch backed by the given MemorySegment.
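The difference between heapify (a copy that keeps no link to the source) and wrap (a read-only reference to the source) can be illustrated without the library. The sketch below is only an analogy: it uses java.nio.ByteBuffer as a stand-in for MemorySegment, and the names CopyVersusView, heapifyLike, and wrapLike are hypothetical, not part of the ThetaSketch API.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Analogy only: ByteBuffer stands in for MemorySegment; these helpers are hypothetical.
final class CopyVersusView {

  // "Heapify"-like: copy the bytes so the result keeps no link to the source.
  static byte[] heapifyLike(byte[] source) {
    return Arrays.copyOf(source, source.length);
  }

  // "Wrap"-like: a read-only view that refers directly to the source bytes.
  static ByteBuffer wrapLike(byte[] source) {
    return ByteBuffer.wrap(source).asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    byte[] image = {1, 2, 3};
    byte[] copy = heapifyLike(image);
    ByteBuffer view = wrapLike(image);

    image[0] = 9; // change the source after copying and wrapping

    System.out.println(copy[0]);           // 1: the copy is unaffected
    System.out.println(view.get(0));       // 9: the view sees the change
    System.out.println(view.isReadOnly()); // true: no writes through the view
  }
}
```

The practical consequence is the same as for the real methods: a wrapped sketch is only as stable as the memory behind it, while a heapified sketch is independent of it.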
-
compact
Converts this sketch to an ordered CompactThetaSketch. If this.isCompact() == true this method returns this; otherwise, this method is equivalent to compact(true, null).
A CompactThetaSketch is always immutable.
- Returns:
- this sketch as an ordered CompactThetaSketch.
-
compact
Convert this sketch to a CompactThetaSketch. If this sketch is a type of UpdatableThetaSketch, the compacting process converts the hash table of the UpdatableThetaSketch to a simple list of the valid hash values. Any hash values that are zero, or that are equal to or greater than theta, will be discarded. The number of valid values remaining in the CompactThetaSketch depends on a number of factors, but may be larger or smaller than Nominal Entries (or k). It will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdatableThetaSketch prior to calling this method.
A CompactThetaSketch is always immutable.
A new CompactThetaSketch object is created:
- if dstSeg != null
- if dstSeg == null and this.hasMemorySegment() == true
- if dstSeg == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.
Otherwise, this operation returns this.
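Read literally, the three conditions above form a single predicate. The following is a minimal restatement of those documented rules as plain Java, not library code; the class CompactDecision, the method createsNewObject, and its parameter names are hypothetical.

```java
// Hypothetical helper that restates the documented rules; not part of the library.
final class CompactDecision {

  // Returns true when, per the listed conditions, compact(dstOrdered, dstSeg)
  // would create a new CompactThetaSketch instead of returning this.
  static boolean createsNewObject(boolean dstSegGiven, boolean hasMemorySegment,
      int itemCount, boolean isOrdered, boolean dstOrdered) {
    if (dstSegGiven) { return true; }      // condition 1: dstSeg != null
    if (hasMemorySegment) { return true; } // condition 2: this is segment-backed
    // condition 3: more than one item and an ordering change is requested
    return itemCount > 1 && !isOrdered && dstOrdered;
  }

  public static void main(String[] args) {
    System.out.println(createsNewObject(true, false, 0, true, true));   // true
    System.out.println(createsNewObject(false, true, 5, true, true));   // true
    System.out.println(createsNewObject(false, false, 5, false, true)); // true
    System.out.println(createsNewObject(false, false, 5, true, true));  // false
    System.out.println(createsNewObject(false, false, 1, false, true)); // false
  }
}
```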
- Parameters:
dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
dstSeg - See Destination MemorySegment.
- Returns:
- this sketch as a CompactThetaSketch.
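The compaction step described for this method (hash table to a list of valid hash values) can be sketched on a plain long array. This is an illustration under stated assumptions, not the library's implementation: it assumes hash values are non-negative longs, that zero marks an empty slot, and that "ordered" means ascending order. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustration only: mimics the documented filtering rule on a plain array.
final class CompactionSketch {

  // Keeps hashes that are non-zero and strictly less than thetaLong;
  // sorts ascending when an ordered result is requested (assumed meaning of "ordered").
  static long[] compactHashes(long[] table, long thetaLong, boolean ordered) {
    long[] valid = Arrays.stream(table)
        .filter(h -> h != 0L && h < thetaLong)
        .toArray();
    if (ordered) {
      Arrays.sort(valid);
    }
    return valid;
  }

  public static void main(String[] args) {
    long[] table = {0L, 50L, 10L, 0L, 90L, 30L}; // zeros are empty slots
    long thetaLong = 60L;                        // 90 is >= theta and is dropped
    System.out.println(Arrays.toString(compactHashes(table, thetaLong, true)));  // [10, 30, 50]
    System.out.println(Arrays.toString(compactHashes(table, thetaLong, false))); // [50, 10, 30]
  }
}
```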
-
getCompactBytes
public abstract int getCompactBytes()
Returns the number of storage bytes required for this ThetaSketch if its current state were compacted. If this sketch is already in the compact form this is equivalent to calling getCurrentBytes().
- Returns:
- number of compact bytes
-
getCountLessThanThetaLong
public int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
- Returns:
- the number of hash values less than the given thetaLong.
-
getCurrentBytes
public abstract int getCurrentBytes()
Returns the number of storage bytes required for this sketch in its current state.
- Returns:
- the number of storage bytes required for this sketch
-
getEstimate
public abstract double getEstimate()
Gets the unique count estimate.
- Returns:
- the sketch's best estimate of the cardinality of the input stream.
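This page does not state how the estimate is computed. A common formulation for theta sketches, assumed here, divides the number of valid retained entries by theta, the fraction of the hash space being sampled. The sketch below shows only that arithmetic; the class EstimateSketch and its method are hypothetical, and the library's actual code may differ in detail.

```java
// Assumed formulation: estimate = validRetained / theta. Not taken from the library source.
final class EstimateSketch {

  // theta is the sampling fraction, in the range (0, 1].
  static double estimate(int validRetained, double theta) {
    return validRetained / theta;
  }

  public static void main(String[] args) {
    // With theta == 1.0 nothing has been sampled away, so the count is exact.
    System.out.println(estimate(1000, 1.0)); // 1000.0
    // With theta == 0.5 each retained entry stands for two distinct inputs.
    System.out.println(estimate(4096, 0.5)); // 8192.0
  }
}
```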
-
getEstimate
Gets the estimate from the given MemorySegment.
- Parameters:
srcSeg - the given MemorySegment
- Returns:
- the result estimate
-
getFamily
Returns the Family that this sketch belongs to.
- Returns:
- the Family that this sketch belongs to
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the lower bound.
-
getMaxCompactSketchBytes
public static int getMaxCompactSketchBytes(int numberOfEntries)
Returns the maximum number of storage bytes required for a CompactThetaSketch with the given number of actual entries.
- Parameters:
numberOfEntries - the actual number of retained entries stored in the sketch.
- Returns:
- the maximum number of storage bytes required for a CompactThetaSketch with the given number of retained entries.
-
getCompactSketchMaxBytes
public static int getCompactSketchMaxBytes(int lgNomEntries)
Returns the maximum number of storage bytes required for a CompactThetaSketch given the configured log_base2 of the number of nominal entries, which is a power of 2.
- Parameters:
lgNomEntries - log_base2 of Nominal Entries
- Returns:
- the maximum number of storage bytes required for a CompactThetaSketch with the given lgNomEntries.
-
getMaxUpdateSketchBytes
public static int getMaxUpdateSketchBytes(int nomEntries)
Returns the maximum number of storage bytes required for an UpdatableThetaSketch with the given number of nominal entries (power of 2).
- Parameters:
nomEntries - Nominal Entries. This will become the ceiling power of 2 if it is not already a power of 2.
- Returns:
- the maximum number of storage bytes required for an UpdatableThetaSketch with the given nomEntries
-
getUpdateSketchMaxBytes
public static int getUpdateSketchMaxBytes(int lgNomEntries)
Returns the maximum number of storage bytes required for an UpdatableThetaSketch with the given log_base2 of the nominal entries.
- Parameters:
lgNomEntries - log_base2 of Nominal Entries
- Returns:
- the maximum number of storage bytes required for an UpdatableThetaSketch with the given lgNomEntries
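The four sizing methods differ in how the sketch size is specified: a count of entries (nomEntries, rounded up to a power of 2) or its log_base2 (lgNomEntries). The following sketch shows only that relationship between the two argument forms, using standard java.lang.Integer methods. The class NominalEntries and its helpers are hypothetical, and the byte counts themselves are not computed because this page does not give the formulas.

```java
// Hypothetical helpers relating nomEntries to lgNomEntries; byte-size formulas are not shown.
final class NominalEntries {

  // Smallest power of 2 that is >= n, for 1 <= n <= 2^30.
  static int ceilingPowerOf2(int n) {
    if (n <= 1) { return 1; }
    return Integer.highestOneBit(n - 1) << 1;
  }

  // log_base2 of a value that is already a power of 2.
  static int lgOf(int powerOf2) {
    return Integer.numberOfTrailingZeros(powerOf2);
  }

  public static void main(String[] args) {
    int nomEntries = ceilingPowerOf2(5000); // not a power of 2, so it is rounded up
    int lgNomEntries = lgOf(nomEntries);
    System.out.println(nomEntries);         // 8192
    System.out.println(lgNomEntries);       // 13
    System.out.println(1 << lgNomEntries);  // 8192
  }
}
```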
-
getRetainedEntries
public int getRetainedEntries()
Returns the number of valid entries that have been retained by the sketch. For the AlphaSketch this returns only valid entries.
- Returns:
- the number of valid retained entries.
-
getRetainedEntries
public abstract int getRetainedEntries(boolean valid)
Returns the number of entries that have been retained by the sketch.
- Parameters:
valid - This parameter is only relevant for the AlphaSketch. If true, returns the number of valid entries, which are less than theta and used for estimation. Otherwise, returns the number of all entries, valid or not, that are currently in the internal sketch cache.
- Returns:
- the number of retained entries
-
getRetainedEntries
Returns the number of valid entries that have been retained by the sketch from the given MemorySegment.
- Parameters:
srcSeg - the given MemorySegment that has an image of a ThetaSketch
- Returns:
- the number of valid retained entries
-
getSerializationVersion
Returns the serialization version from the given MemorySegment.
- Parameters:
seg - the sketch MemorySegment
- Returns:
- the serialization version from the MemorySegment
-
getTheta
public double getTheta()
Gets the value of theta as a double with a value between zero and one.
- Returns:
- the value of theta as a double
-
getThetaLong
public abstract long getThetaLong()
Gets the value of theta as a long.
- Returns:
- the value of theta as a long
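The two views of theta (a long between zero and Long.MAX_VALUE, and a double between zero and one) suggest a linear scaling. The sketch below assumes the double form is the long form divided by Long.MAX_VALUE; that relationship is an assumption inferred from the documented ranges, and the class ThetaFraction is hypothetical.

```java
// Assumed relationship: theta = thetaLong / Long.MAX_VALUE. Inferred, not quoted from the library.
final class ThetaFraction {

  static double thetaAsDouble(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(thetaAsDouble(Long.MAX_VALUE)); // 1.0
    System.out.println(thetaAsDouble(1L << 62));       // 0.5
    System.out.println(thetaAsDouble(0L));             // 0.0
  }
}
```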
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
- Returns:
- the upper bound.
-
isCompact
public abstract boolean isCompact()
Returns true if this sketch is in compact form.
- Returns:
- true if this sketch is in compact form.
-
isEmpty
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
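The condition stated for this method is a simple conjunction. As a minimal sketch on plain values (the class EstimationMode is hypothetical and takes theta and the empty flag as arguments rather than reading them from a sketch):

```java
// Restates the documented condition: theta < 1.0 AND the sketch is not empty.
final class EstimationMode {

  static boolean isEstimationMode(double theta, boolean isEmpty) {
    return theta < 1.0 && !isEmpty;
  }

  public static void main(String[] args) {
    System.out.println(isEstimationMode(1.0, false)); // false: exact mode
    System.out.println(isEstimationMode(0.5, false)); // true: sampling has begun
    System.out.println(isEstimationMode(0.5, true));  // false: an empty sketch is never in estimation mode
  }
}
```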
-
isOrdered
public abstract boolean isOrdered()
Returns true if internal cache is ordered.
- Returns:
- true if internal cache is ordered
-
iterator
Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
- Returns:
- a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
-
toByteArray
public abstract byte[] toByteArray()
Serialize this sketch to a byte array form.
- Returns:
- byte array of this sketch
-
toString
-
toString
Gets a human readable listing of contents and summary of the given sketch. This can be a very long string. If this sketch is in a "dirty" state there may be values in the dataDetail view that are ≥ theta.
- Parameters:
sketchSummary - If true the sketch summary will be output at the end.
dataDetail - If true, includes all valid hash values in the sketch.
width - The number of columns of hash values. Default is 8.
hexMode - If true, hashes will be output in hex.
- Returns:
- The result string, which can be very long.
-
toString
Returns a human readable string of the preamble of a byte array image of a ThetaSketch.
- Parameters:
byteArr - the given byte array
- Returns:
- a human readable string of the preamble of a byte array image of a ThetaSketch.
-
toString
Returns a human readable string of the preamble of a MemorySegment image of a ThetaSketch.
- Parameters:
seg - the given MemorySegment object
- Returns:
- a human readable string of the preamble of a MemorySegment image of a ThetaSketch.
-
getLowerBound
Gets the approximate lower error bound from a valid MemorySegment image of a ThetaSketch given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
srcSeg - the source MemorySegment
- Returns:
- the lower bound.
-
getUpperBound
Gets the approximate upper error bound from a valid MemorySegment image of a ThetaSketch given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev - See Number of Standard Deviations
srcSeg - the source MemorySegment
- Returns:
- the upper bound.
-