Class CompactSketch
- All Implemented Interfaces:
MemoryStatus
A CompactSketch is the simplest form of a Theta Sketch. It consists of a compact list (i.e., no intervening spaces) of hash values, which may be ordered or not, a value for theta and a seed hash. A CompactSketch is immutable (read-only), and the space required when stored is only the space required for the hash values and 8 to 24 bytes of preamble. An empty CompactSketch consumes only 8 bytes.
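The storage figures above can be checked with simple arithmetic. The following is a minimal sketch, not library code: the class and method names are hypothetical, and it assumes 8-byte hash values and a preamble of one long (empty or single-item sketch), two longs (exact mode with more than one entry), or three longs (estimation mode). That breakdown is consistent with the 8 to 24 byte range stated above but is not spelled out on this page.

```java
/** Illustrative size arithmetic only; not part of the DataSketches API. */
final class CompactSizeSketch {

  /**
   * Estimates the stored size, in bytes, of a compact sketch image.
   * Assumption: the preamble is 1, 2, or 3 longs as described in the lead-in.
   */
  static int storedBytes(int retainedEntries, boolean estimationMode) {
    final int preambleLongs;
    if (estimationMode) {
      preambleLongs = 3;            // assumed: theta is stored explicitly
    } else if (retainedEntries <= 1) {
      preambleLongs = 1;            // assumed: empty or single-item image
    } else {
      preambleLongs = 2;            // assumed: exact mode, several entries
    }
    // Each retained hash value is assumed to occupy one 8-byte long.
    return (preambleLongs + retainedEntries) * Long.BYTES;
  }

  public static void main(String[] args) {
    System.out.println(storedBytes(0, false));     // empty sketch
    System.out.println(storedBytes(1000, false));  // exact mode
    System.out.println(storedBytes(1000, true));   // estimation mode
  }
}
```

Under these assumptions an empty sketch comes out at 8 bytes, matching the statement above, and the preamble never exceeds 24 bytes.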
- Author:
- Lee Rhodes
-
Constructor Summary
-
Method Summary
Modifier and Type, Method: Description
abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem): Convert this sketch to a CompactSketch.
int getCompactBytes(): Returns the number of storage bytes required for this Sketch if its current state were compacted.
Family getFamily(): Returns the Family that this sketch belongs to.
static CompactSketch heapify(org.apache.datasketches.memory.Memory srcMem): Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch.
static CompactSketch heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed): Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch.
boolean isCompact(): Returns true if this sketch is in compact form.
byte[] toByteArrayCompressed(): Gets the sketch as a compressed byte array.
static CompactSketch wrap(org.apache.datasketches.memory.Memory srcMem): Wrap takes the CompactSketch image in the given Memory and refers to it directly.
static CompactSketch wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed): Wrap takes the sketch image in the given Memory and refers to it directly.
Methods inherited from class org.apache.datasketches.theta.Sketch
compact, getCompactSketchMaxBytes, getCountLessThanThetaLong, getCurrentBytes, getEstimate, getLowerBound, getMaxCompactSketchBytes, getMaxUpdateSketchBytes, getRetainedEntries, getRetainedEntries, getSerializationVersion, getTheta, getThetaLong, getUpperBound, isEmpty, isEstimationMode, isOrdered, iterator, toByteArray, toString, toString, toString, toString
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.datasketches.common.MemoryStatus
hasMemory, isDirect, isSameResource
-
Constructor Details
-
CompactSketch
public CompactSketch()
-
-
Method Details
-
heapify
public static CompactSketch heapify(org.apache.datasketches.memory.Memory srcMem)
Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch. The resulting sketch will not retain any link to the source Memory and all of its data will be copied to the heap CompactSketch.
This method assumes that the sketch image was created with the correct hash seed, so the seed is not checked. The resulting on-heap CompactSketch will be given the seedHash derived from the given sketch image. However, Serial Version 1 sketch images do not have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the DEFAULT_UPDATE_SEED.
- Parameters:
srcMem - an image of a CompactSketch. See Memory.
- Returns:
- a CompactSketch on the heap.
-
heapify
public static CompactSketch heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch. The resulting sketch will not retain any link to the source Memory and all of its data will be copied to the heap CompactSketch.
This method checks if the given expectedSeed was used to create the source Memory image. However, Serial Version 1 sketch images cannot be checked as they don't have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the expectedSeed.
- Parameters:
srcMem - an image of a CompactSketch that was created using the given expectedSeed. See Memory.
expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed.
- Returns:
- a CompactSketch on the heap.
-
wrap
public static CompactSketch wrap(org.apache.datasketches.memory.Memory srcMem)
Wrap takes the CompactSketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in a heapify operation. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch respectively. This is actually faster and consumes less overall memory.
This method assumes that the sketch image was created with the correct hash seed, so the seed is not checked. However, Serial Version 1 sketch images do not have a seedHash field, so the resulting on-heap CompactSketch will be given the hash of the DEFAULT_UPDATE_SEED.
- Parameters:
srcMem - an image of a Sketch. See Memory.
- Returns:
- a CompactSketch backed by the given Memory except as above.
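The difference between the copy semantics of heapify and the view semantics of wrap can be illustrated with `java.nio.ByteBuffer` from the JDK. This is only an analogy under stated assumptions: it does not use the DataSketches Memory or CompactSketch classes, the byte array stands in for a sketch image, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

/** Analogy only: JDK buffers standing in for a sketch image in Memory. */
final class HeapifyVsWrapAnalogy {

  /** Copy semantics (like heapify): the result keeps no link to the source. */
  static long[] copyToHeap(byte[] image) {
    ByteBuffer buf = ByteBuffer.wrap(image);
    long[] out = new long[image.length / Long.BYTES];
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getLong();
    }
    return out;
  }

  /** View semantics (like wrap): a read-only view backed by the source bytes. */
  static ByteBuffer readOnlyView(byte[] image) {
    return ByteBuffer.wrap(image).asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    // Two 8-byte values play the role of stored hash values.
    byte[] image = ByteBuffer.allocate(16).putLong(11L).putLong(22L).array();
    long[] copy = copyToHeap(image);
    ByteBuffer view = readOnlyView(image);

    // Change the source after both objects were created.
    ByteBuffer.wrap(image).putLong(0, 99L);

    System.out.println(copy[0]);          // the copy is unaffected
    System.out.println(view.getLong(0));  // the view reflects the change
  }
}
```

The copy costs heap space and a pass over the data, while the view costs nothing up front but stays tied to the lifetime and contents of the source.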
-
wrap
public static CompactSketch wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
Wrap takes the sketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in a heapify operation. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketch respectively. This is actually faster and consumes less overall memory.
This method checks if the given expectedSeed was used to create the source Memory image. However, Serial Version 1 sketches cannot be checked as they don't have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the expectedSeed.
- Parameters:
srcMem - an image of a Sketch that was created using the given expectedSeed. See Memory.
expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed.
- Returns:
- a CompactSketch backed by the given Memory except as above.
-
compact
public abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem)
Description copied from class: Sketch
Convert this sketch to a CompactSketch.
If this sketch is a type of UpdateSketch, the compacting process converts the hash table of the UpdateSketch to a simple list of the valid hash values. Any hash values that are zero, or greater than or equal to theta, are discarded. The number of valid values remaining in the CompactSketch depends on several factors and may be larger or smaller than Nominal Entries (or k), but it will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdateSketch prior to calling this method.
A CompactSketch is always immutable.
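The discard rule described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class and method names are hypothetical, the hash table is modeled as a bare `long[]` in which zero marks an unused slot, and theta is passed as a long threshold.

```java
import java.util.Arrays;

/** Illustrative model of compaction; not the DataSketches implementation. */
final class CompactionSketch {

  /**
   * Keeps only the valid hash values: non-zero and strictly below thetaLong.
   * If ordered is true, the surviving values are sorted in ascending order.
   */
  static long[] compactHashes(long[] hashTable, long thetaLong, boolean ordered) {
    long[] valid = Arrays.stream(hashTable)
        .filter(h -> h != 0L && h < thetaLong)  // drop empty slots and values >= theta
        .toArray();
    if (ordered) {
      Arrays.sort(valid);
    }
    return valid;
  }

  public static void main(String[] args) {
    long[] table = {0L, 50L, 10L, 0L, 90L, 70L};
    // With thetaLong = 70, the values 90 and 70 are discarded along with the zeros.
    System.out.println(Arrays.toString(compactHashes(table, 70L, true)));
  }
}
```

The result has no gaps between entries, which is what makes the compact form smaller than the hash table it came from.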
A new CompactSketch object is created:
- if dstMem != null
- if dstMem == null and this.hasMemory() == true
- if dstMem == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.
Otherwise, this operation returns this.
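The three conditions above reduce to a single predicate. The following is a hedged restatement in plain Java, with a hypothetical class name and with the sketch state passed in as plain parameters rather than read from a real Sketch object.

```java
/** Restates the documented rule as a predicate; not library code. */
final class CompactDecisionSketch {

  /**
   * Returns true when, per the rules listed above, compact(dstOrdered, dstMem)
   * would create a new CompactSketch instead of returning this.
   */
  static boolean createsNewObject(boolean dstMemIsNull, boolean hasMemory,
      int retainedEntries, boolean isOrdered, boolean dstOrdered) {
    if (!dstMemIsNull) {
      return true;   // rule 1: a destination Memory was supplied
    }
    if (hasMemory) {
      return true;   // rule 2: no destination, but this sketch is Memory-backed
    }
    // rule 3: on-heap, more than one item, currently unordered, ordering requested
    return retainedEntries > 1 && !isOrdered && dstOrdered;
  }

  public static void main(String[] args) {
    // An on-heap, already ordered sketch with no destination Memory.
    System.out.println(createsNewObject(true, false, 100, true, true));
  }
}
```

Rule 3 also reflects the parameter note for dstOrdered: with zero or one item there is nothing to reorder, so no new object is needed on that account.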
- Specified by:
compact in class Sketch
- Parameters:
dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
dstMem - See Destination Memory.
- Returns:
- this sketch as a CompactSketch.
-
getCompactBytes
public int getCompactBytes()
Description copied from class: Sketch
Returns the number of storage bytes required for this Sketch if its current state were compacted. If this sketch is already in the compact form, this is equivalent to calling Sketch.getCurrentBytes().
- Specified by:
getCompactBytes in class Sketch
- Returns:
- number of compact bytes
-
getFamily
public Family getFamily()
Description copied from class: Sketch
Returns the Family that this sketch belongs to.
-
isCompact
public boolean isCompact()
Description copied from class: Sketch
Returns true if this sketch is in compact form.
-
toByteArrayCompressed
public byte[] toByteArrayCompressed()
Gets the sketch as a compressed byte array.
- Returns:
- the sketch as a compressed byte array
-