Class CompactThetaSketch
- All Implemented Interfaces:
MemorySegmentStatus
A CompactThetaSketch is the simplest form of a ThetaSketch. It consists of a compact list (i.e., no intervening spaces) of hash values, which may be ordered or not, a value for theta, and a seed hash. A CompactThetaSketch is immutable (read-only), and the space required when stored is only the space required for the hash values plus 8 to 24 bytes of preamble. An empty CompactThetaSketch consumes only 8 bytes.
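The storage rule above can be sketched as simple arithmetic. This is an illustration only: the class and method names are hypothetical, and the 8 bytes per retained hash is an assumption (64-bit hash values) that the text above does not state explicitly.

```java
// Hypothetical helper, not part of the library API.
final class CompactSketchSize {
    /** Assumption: each retained hash value is a 64-bit long. */
    static final int HASH_BYTES = Long.BYTES;

    /** Stored size = preamble (8 to 24 bytes, per the class description) + hash values. */
    static long storedBytes(int preambleBytes, int retainedHashes) {
        if (preambleBytes < 8 || preambleBytes > 24) {
            throw new IllegalArgumentException("preamble must be 8 to 24 bytes");
        }
        if (retainedHashes < 0) {
            throw new IllegalArgumentException("retainedHashes must be non-negative");
        }
        return preambleBytes + (long) retainedHashes * HASH_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(storedBytes(8, 0));     // empty sketch: preamble only, prints 8
        System.out.println(storedBytes(24, 4096)); // largest preamble plus 4096 hashes, prints 32792
    }
}
```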
- Author:
- Lee Rhodes
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
abstract CompactThetaSketch | compact(boolean dstOrdered, MemorySegment dstSeg) | Convert this sketch to a CompactThetaSketch.
int | getCompactBytes() | Returns the number of storage bytes required for this ThetaSketch if its current state were compacted.
double | getEstimate() | Gets the unique count estimate.
Family | getFamily() | Returns the Family that this sketch belongs to.
boolean | hasMemorySegment() | Returns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.
static CompactThetaSketch | heapify(MemorySegment srcSeg) | Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.
static CompactThetaSketch | heapify(MemorySegment srcSeg, long expectedSeed) | Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.
boolean | isCompact() | Returns true if this sketch is in compact form.
boolean | isOffHeap() | Returns true if this object's internal data is backed by an off-heap (direct or native) MemorySegment.
boolean | isSameResource(MemorySegment that) | Returns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap.
byte[] | toByteArrayCompressed() | Gets the sketch as a compressed byte array.
static CompactThetaSketch | wrap(byte[] bytes) | Wrap takes the sketch image in the given byte array and refers to it directly.
static CompactThetaSketch | wrap(byte[] bytes, long expectedSeed) | Wrap takes the sketch image in the given byte array and refers to it directly.
static CompactThetaSketch | wrap(MemorySegment srcSeg) | Wrap takes the CompactThetaSketch image in the given MemorySegment and refers to it directly.
static CompactThetaSketch | wrap(MemorySegment srcSeg, long expectedSeed) | Wrap takes the sketch image in the given MemorySegment and refers to it directly.
Methods inherited from class ThetaSketch
compact, getCompactSketchMaxBytes, getCountLessThanThetaLong, getCurrentBytes, getEstimate, getLowerBound, getLowerBound, getMaxCompactSketchBytes, getMaxUpdateSketchBytes, getRetainedEntries, getRetainedEntries, getRetainedEntries, getSerializationVersion, getTheta, getThetaLong, getUpdateSketchMaxBytes, getUpperBound, getUpperBound, isEmpty, isEstimationMode, isOrdered, iterator, toByteArray, toString, toString, toString, toString
-
Constructor Details
-
CompactThetaSketch
public CompactThetaSketch()
No argument constructor.
-
-
Method Details
-
heapify
Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.
The resulting sketch will not retain any link to the source MemorySegment and all of its data will be copied to the heap CompactThetaSketch.
The DEFAULT_UPDATE_SEED is assumed.
- Parameters:
- srcSeg - an image of a CompactThetaSketch.
- Returns:
- a CompactThetaSketch on the heap.
-
heapify
Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.
The resulting sketch will not retain any link to the source MemorySegment and all of its data will be copied to the heap CompactThetaSketch.
This method checks if the given expectedSeed was used to create the source MemorySegment image.
- Parameters:
- srcSeg - an image of a CompactThetaSketch that was created using the given expectedSeed.
- expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed.
- Returns:
- a CompactThetaSketch on the heap.
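The seed check described above can be pictured as comparing a seed hash stored in the image with one recomputed from expectedSeed. The sketch below is a minimal illustration under stated assumptions: the class and method names are hypothetical, the 16-bit width is assumed, and the mixing function is a stand-in, not the hash function the library actually uses.

```java
// Hypothetical illustration of seed validation; not the library's implementation.
final class SeedCheck {
    /** Stand-in 16-bit seed hash (a 64-bit mixer folded to a short). NOT the library's hash. */
    static short seedHash(long seed) {
        long z = seed + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z ^= (z >>> 31);
        return (short) z;
    }

    /** Rejects the image if its stored seed hash does not match the expected seed. */
    static void checkSeed(short storedSeedHash, long expectedSeed) {
        if (storedSeedHash != seedHash(expectedSeed)) {
            throw new IllegalArgumentException("Seed hash mismatch: image was not created with the expected seed");
        }
    }

    public static void main(String[] args) {
        short stored = seedHash(9001L);   // hash written when the image was created
        checkSeed(stored, 9001L);         // same seed: passes silently
        System.out.println("seed accepted");
    }
}
```

Storing only a short hash of the seed, rather than the seed itself, keeps the preamble small while still catching most mismatched seeds.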
-
wrap
Wrap takes the CompactThetaSketch image in the given MemorySegment and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall space.
The DEFAULT_UPDATE_SEED is assumed.
- Parameters:
- srcSeg - an image of a CompactThetaSketch.
- Returns:
- a CompactThetaSketch backed by the given MemorySegment.
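The difference between heapify (copy to the heap, no link to the source) and wrap (refer to the source directly, read-only) can be shown with plain JDK calls. This is a sketch of the semantics only: it uses java.lang.foreign.MemorySegment directly rather than the sketch classes, and the class and method names are hypothetical.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical illustration of copy versus view semantics; not the library's code.
final class CopyVersusView {
    /** Copies the segment's longs into a new heap array, as a heapify-style operation would. */
    static long[] copyToHeap(MemorySegment src) {
        return src.toArray(ValueLayout.JAVA_LONG);
    }

    /** Returns a read-only view over the same memory, as a wrap-style operation would. */
    static MemorySegment readOnlyView(MemorySegment src) {
        return src.asReadOnly();
    }

    public static void main(String[] args) {
        long[] backing = {11L, 22L, 33L};              // stand-in for a sketch image
        MemorySegment src = MemorySegment.ofArray(backing);
        long[] copy = copyToHeap(src);
        MemorySegment view = readOnlyView(src);

        backing[0] = 99L;                              // mutate the source afterwards

        System.out.println(copy[0]);                                   // 11: the copy is detached
        System.out.println(view.getAtIndex(ValueLayout.JAVA_LONG, 0)); // 99: the view tracks the source
    }
}
```

A consequence worth keeping in mind: a wrapped image must stay alive and unmodified for as long as the wrapping object is in use, while a heapified copy has no such dependency.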
-
wrap
Wrap takes the sketch image in the given MemorySegment and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall space.
This method checks if the given expectedSeed was used to create the source MemorySegment image.
- Parameters:
- srcSeg - an image of a CompactThetaSketch that was created using the given expectedSeed.
- expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed.
- Returns:
- a CompactThetaSketch backed by the given MemorySegment.
-
wrap
Wrap takes the sketch image in the given byte array and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only sketches that have been explicitly stored as direct sketches can be wrapped.
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall space.
This method checks if the DEFAULT_UPDATE_SEED was used to create the source byte array image.
- Parameters:
- bytes - a byte array image of a CompactThetaSketch that was created using the DEFAULT_UPDATE_SEED.
- Returns:
- a CompactThetaSketch backed by the given byte array except as above.
-
wrap
Wrap takes the sketch image in the given byte array and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only sketches that have been explicitly stored as direct sketches can be wrapped.
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall space.
This method checks if the given expectedSeed was used to create the source byte array image.
- Parameters:
- bytes - a byte array image of a CompactThetaSketch that was created using the given expectedSeed.
- expectedSeed - the seed used to validate the given byte array image. See Update Hash Seed.
- Returns:
- a CompactThetaSketch backed by the given byte array except as above.
-
compact
Description copied from class: ThetaSketch
Convert this sketch to a CompactThetaSketch.
If this sketch is a type of UpdatableThetaSketch, the compacting process converts the hash table of the UpdatableThetaSketch to a simple list of the valid hash values. Any hash values that are zero, or that are equal to or greater than theta, will be discarded. The number of valid values remaining in the CompactThetaSketch depends on a number of factors, but may be larger or smaller than Nominal Entries (or k). It will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdatableThetaSketch prior to calling this method.
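The filtering step of compaction described above can be sketched as follows. This is an illustration, not the library's code: the class and method names are hypothetical, theta is represented as a long threshold, and hash values are assumed to be non-negative.

```java
import java.util.Arrays;

// Hypothetical illustration of the compaction filter; not the library's implementation.
final class CompactHashes {
    /**
     * Keeps only valid hashes: non-zero and strictly less than thetaLong.
     * Assumption: hashes are non-negative, so "h > 0" is the same as "h != 0".
     */
    static long[] compact(long[] hashTable, long thetaLong, boolean ordered) {
        long[] kept = Arrays.stream(hashTable)
                .filter(h -> h > 0 && h < thetaLong)
                .toArray();
        if (ordered) {
            Arrays.sort(kept);  // an ordered compact list stores hashes ascending
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] table = {0L, 50L, 0L, 900L, 20L, 700L};  // zeros are empty hash-table slots
        System.out.println(Arrays.toString(compact(table, 700L, true)));  // [20, 50]
    }
}
```

In the example, 700 is dropped because it equals theta, and 900 because it exceeds it.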
A CompactThetaSketch is always immutable.
A new CompactThetaSketch object is created:
- if dstSeg != null
- if dstSeg == null and this.hasMemorySegment() == true
- if dstSeg == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.
Otherwise, this operation returns this.
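The three conditions above can be restated as a single predicate. The sketch below is only a restatement of the listed rules; the class, method, and parameter names are hypothetical, and the booleans stand in for dstSeg == null, this.hasMemorySegment(), this.isOrdered(), and dstOrdered.

```java
// Hypothetical restatement of the documented rules; not the library's implementation.
final class CompactDecision {
    /** Returns true when compact(...) would create a new object, false when it would return this. */
    static boolean createsNewObject(boolean dstSegIsNull,
                                    boolean hasMemorySegment,
                                    int retainedEntries,
                                    boolean isOrdered,
                                    boolean dstOrdered) {
        if (!dstSegIsNull) {
            return true;                       // rule 1: a destination segment was supplied
        }
        if (hasMemorySegment) {
            return true;                       // rule 2: this sketch is segment-backed
        }
        // rule 3: an unordered sketch with more than one item must be reordered
        return retainedEntries > 1 && !isOrdered && dstOrdered;
    }

    public static void main(String[] args) {
        // On-heap, already ordered, no destination: the sketch itself is returned.
        System.out.println(createsNewObject(true, false, 100, true, true));   // false
        // On-heap, unordered, ordered output requested: a new object is created.
        System.out.println(createsNewObject(true, false, 100, false, true));  // true
    }
}
```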
- Specified by:
- compact in class ThetaSketch
- Parameters:
- dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
- dstSeg - See Destination MemorySegment.
- Returns:
- this sketch as a CompactThetaSketch.
-
getCompactBytes
public int getCompactBytes()
Description copied from class: ThetaSketch
Returns the number of storage bytes required for this ThetaSketch if its current state were compacted. If this sketch is already in compact form, this is equivalent to calling ThetaSketch.getCurrentBytes().
- Specified by:
- getCompactBytes in class ThetaSketch
- Returns:
- number of compact bytes
-
getFamily
Description copied from class: ThetaSketch
Returns the Family that this sketch belongs to.
- Specified by:
- getFamily in class ThetaSketch
- Returns:
- the Family that this sketch belongs to
-
hasMemorySegment
public boolean hasMemorySegment()
Description copied from interface: MemorySegmentStatus
Returns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.
- Returns:
- true if this object's internal data is backed by a MemorySegment.
-
isCompact
public boolean isCompact()
Description copied from class: ThetaSketch
Returns true if this sketch is in compact form.
- Specified by:
- isCompact in class ThetaSketch
- Returns:
- true if this sketch is in compact form.
-
isOffHeap
public boolean isOffHeap()
Description copied from interface: MemorySegmentStatus
Returns true if this object's internal data is backed by an off-heap (direct or native) MemorySegment.
- Returns:
- true if this object's internal data is backed by an off-heap (direct or native) MemorySegment.
-
isSameResource
Description copied from interface: MemorySegmentStatus
Returns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap. This applies to both on-heap and off-heap MemorySegments.
Note: If both segments are on-heap and not read-only, it can be determined if they were derived from the same backing memory (array). However, this is not always possible off-heap. Because of this asymmetry, this definition of "isSameResource" is confined to the existence of an overlap.
- Parameters:
- that - The given MemorySegment.
- Returns:
- true if an internally referenced MemorySegment has the same backing resource as that.
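The overlap-based definition above can be illustrated with the JDK's own MemorySegment.asOverlappingSlice method. This is a sketch of the concept only; whether isSameResource is implemented this way is an assumption, and the class and method names are hypothetical.

```java
import java.lang.foreign.MemorySegment;

// Hypothetical illustration of an overlap test; not the library's implementation.
final class OverlapCheck {
    /** True if the two segments share at least one byte of backing memory. */
    static boolean overlaps(MemorySegment a, MemorySegment b) {
        return a.asOverlappingSlice(b).isPresent();
    }

    public static void main(String[] args) {
        MemorySegment whole = MemorySegment.ofArray(new byte[64]);
        MemorySegment left = whole.asSlice(0, 32);    // bytes [0, 32)
        MemorySegment mid = whole.asSlice(16, 32);    // bytes [16, 48)
        MemorySegment right = whole.asSlice(32, 32);  // bytes [32, 64)

        System.out.println(overlaps(left, mid));    // true: both cover bytes [16, 32)
        System.out.println(overlaps(left, right));  // false: same array, disjoint regions
    }
}
```

Note that `left` and `right` come from the same array yet do not overlap, which is the asymmetry the note above refers to: only overlap is tested, not common origin.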
-
getEstimate
public double getEstimate()
Description copied from class: ThetaSketch
Gets the unique count estimate.
- Specified by:
- getEstimate in class ThetaSketch
- Returns:
- the sketch's best estimate of the cardinality of the input stream.
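As a rough illustration of how such an estimate relates to the stored state, the sketch below assumes the standard theta-sketch estimator: the number of retained entries divided by theta, with theta derived from its long form as a fraction of Long.MAX_VALUE. The class and method names are hypothetical and this is not presented as the library's code.

```java
// Hypothetical illustration of the standard theta estimator; not the library's implementation.
final class ThetaEstimate {
    /** Converts the long form of theta to a fraction in (0, 1]. Assumed convention. */
    static double theta(long thetaLong) {
        return thetaLong / (double) Long.MAX_VALUE;
    }

    /** Estimated number of distinct items: retained entries scaled up by 1/theta. */
    static double estimate(int retainedEntries, long thetaLong) {
        return retainedEntries / theta(thetaLong);
    }

    public static void main(String[] args) {
        // theta == 1.0: nothing was sampled out, so the count is exact.
        System.out.println(estimate(1000, Long.MAX_VALUE));      // 1000.0
        // theta == 0.5: about half the hash space was kept, so the count doubles.
        System.out.println(estimate(4096, Long.MAX_VALUE / 2));  // 8192.0
    }
}
```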
-
toByteArrayCompressed
public byte[] toByteArrayCompressed()
Gets the sketch as a compressed byte array.
- Returns:
- the sketch as a compressed byte array
-