Class Sketch
- java.lang.Object
- org.apache.datasketches.theta.Sketch
- All Implemented Interfaces:
MemoryStatus
- Direct Known Subclasses:
CompactSketch, UpdateSketch
public abstract class Sketch extends Object implements MemoryStatus
The top-level class for all theta sketches. This class is never constructed directly. Use the UpdateSketch.builder() methods to create UpdateSketches.
- Author:
- Lee Rhodes
Method Summary
Modifier and Type | Method | Description
CompactSketch | compact() | Converts this sketch to an ordered CompactSketch.
abstract CompactSketch | compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem) | Convert this sketch to a CompactSketch.
abstract int | getCompactBytes() | Returns the number of storage bytes required for this Sketch if its current state were compacted.
static int | getCompactSketchMaxBytes(int lgNomEntries) | Returns the maximum number of storage bytes required for a CompactSketch given the configured log_base2 of the number of nominal entries, which is a power of 2.
int | getCountLessThanThetaLong(long thetaLong) | Gets the number of hash values less than the given theta expressed as a long.
abstract int | getCurrentBytes() | Returns the number of storage bytes required for this sketch in its current state.
abstract double | getEstimate() | Gets the unique count estimate.
abstract Family | getFamily() | Returns the Family that this sketch belongs to.
double | getLowerBound(int numStdDev) | Gets the approximate lower error bound given the specified number of Standard Deviations.
static int | getMaxCompactSketchBytes(int numberOfEntries) | Returns the maximum number of storage bytes required for a CompactSketch with the given number of actual entries.
static int | getMaxUpdateSketchBytes(int nomEntries) | Returns the maximum number of storage bytes required for an UpdateSketch with the given number of nominal entries (power of 2).
int | getRetainedEntries() | Returns the number of valid entries that have been retained by the sketch.
abstract int | getRetainedEntries(boolean valid) | Returns the number of entries that have been retained by the sketch.
static int | getSerializationVersion(org.apache.datasketches.memory.Memory mem) | Returns the serialization version from the given Memory.
double | getTheta() | Gets the value of theta as a double with a value between zero and one.
abstract long | getThetaLong() | Gets the value of theta as a long.
double | getUpperBound(int numStdDev) | Gets the approximate upper error bound given the specified number of Standard Deviations.
static Sketch | heapify(org.apache.datasketches.memory.Memory srcMem) | Heapify takes the sketch image in Memory and instantiates an on-heap Sketch.
static Sketch | heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed) | Heapify takes the sketch image in Memory and instantiates an on-heap Sketch.
abstract boolean | isCompact() | Returns true if this sketch is in compact form.
abstract boolean | isEmpty() | Returns true if empty.
boolean | isEstimationMode() | Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode).
abstract boolean | isOrdered() | Returns true if internal cache is ordered.
abstract HashIterator | iterator() | Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
abstract byte[] | toByteArray() | Serialize this sketch to a byte array form.
String | toString() | Returns a human readable summary of the sketch.
String | toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode) | Gets a human readable listing of contents and summary of the given sketch.
static String | toString(byte[] byteArr) | Returns a human readable string of the preamble of a byte array image of a Theta Sketch.
static String | toString(org.apache.datasketches.memory.Memory mem) | Returns a human readable string of the preamble of a Memory image of a Theta Sketch.
static Sketch | wrap(org.apache.datasketches.memory.Memory srcMem) | Wrap takes the sketch image in the given Memory and refers to it directly.
static Sketch | wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed) | Wrap takes the sketch image in the given Memory and refers to it directly.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.datasketches.common.MemoryStatus
hasMemory, isDirect, isSameResource
Method Detail
-
heapify
public static Sketch heapify(org.apache.datasketches.memory.Memory srcMem)
Heapify takes the sketch image in Memory and instantiates an on-heap Sketch. The resulting sketch will not retain any link to the source Memory.
For Update Sketches this method checks if the Default Update Seed was used to create the source Memory image. For Compact Sketches this method assumes that the sketch image was created with the correct hash seed, so it is not checked.
- Parameters:
srcMem
- an image of a Sketch. See Memory.
- Returns:
- a Sketch on the heap.
-
heapify
public static Sketch heapify(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
Heapify takes the sketch image in Memory and instantiates an on-heap Sketch. The resulting sketch will not retain any link to the source Memory.
For Update and Compact Sketches this method checks if the given expectedSeed was used to create the source Memory image. However, SerialVersion 1 sketches cannot be checked.
- Parameters:
srcMem
- an image of a Sketch that was created using the given expectedSeed. See Memory.
expectedSeed
- the seed used to validate the given Memory image. See Update Hash Seed. Compact sketches store a 16-bit hash of the seed, but not the seed itself.
- Returns:
- a Sketch on the heap.
-
wrap
public static Sketch wrap(org.apache.datasketches.memory.Memory srcMem)
Wrap takes the sketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in an on-heap CompactSketch where all data will be copied to the heap. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in the on-heap equivalent forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall memory.
For Update Sketches this method checks if the Default Update Seed was used to create the source Memory image. For Compact Sketches this method assumes that the sketch image was created with the correct hash seed, so it is not checked.
- Parameters:
srcMem
- an image of a Sketch. See Memory.
- Returns:
- a Sketch backed by the given Memory
-
wrap
public static Sketch wrap(org.apache.datasketches.memory.Memory srcMem, long expectedSeed)
Wrap takes the sketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.
Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in an on-heap CompactSketch where all data will be copied to the heap. These early versions were never designed to "wrap".
Wrapping any subclass of this class that is empty or contains only a single item will result in the on-heap equivalent forms of the empty and single-item sketch, respectively. This is actually faster and consumes less overall memory.
For Update and Compact Sketches this method checks if the given expectedSeed was used to create the source Memory image. However, SerialVersion 1 sketches cannot be checked.
- Parameters:
srcMem
- an image of a Sketch. See Memory.
expectedSeed
- the seed used to validate the given Memory image. See Update Hash Seed.
- Returns:
- an UpdateSketch backed by the given Memory, except as noted above.
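As an analogy only, the difference between heapify and wrap described above can be sketched with java.nio.ByteBuffer standing in for Memory and a long[] standing in for the retained hash values. The class and method names are hypothetical and none of this is DataSketches API; it only contrasts a heap copy that keeps no link to the source with a read-only view that keeps referring to the same bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Analogy only: ByteBuffer stands in for Memory, long[] for the sketch data.
class HeapifyVersusWrapAnalogy {

    // "heapify"-like: copy every long out of the buffer into a fresh heap array.
    static long[] copyToHeap(ByteBuffer src) {
        ByteBuffer view = src.duplicate(); // own position and limit, same bytes
        view.rewind();
        long[] out = new long[view.remaining() / Long.BYTES];
        view.asLongBuffer().get(out);
        return out;
    }

    // "wrap"-like: a read-only view that keeps referring to the source bytes.
    static ByteBuffer readOnlyView(ByteBuffer src) {
        ByteBuffer view = src.asReadOnlyBuffer();
        view.rewind();
        return view;
    }

    public static void main(String[] args) {
        ByteBuffer image = ByteBuffer.allocate(2 * Long.BYTES);
        image.putLong(0, 11L).putLong(8, 22L);

        long[] heapCopy = copyToHeap(image);
        ByteBuffer wrapped = readOnlyView(image);

        image.putLong(0, 99L); // change the source image afterwards

        System.out.println("heap copy:    " + Arrays.toString(heapCopy));
        System.out.println("wrapped view: " + wrapped.getLong(0));
    }
}
```

The heap copy is unaffected by later changes to the source, while the read-only view observes them, which mirrors the "no link to the source Memory" versus "refers to it directly" wording above.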
-
compact
public CompactSketch compact()
Converts this sketch to an ordered CompactSketch.
If this.isCompact() == true, this method returns this; otherwise, this method is equivalent to compact(true, null).
A CompactSketch is always immutable.
- Returns:
- this sketch as an ordered CompactSketch.
-
compact
public abstract CompactSketch compact(boolean dstOrdered, org.apache.datasketches.memory.WritableMemory dstMem)
Convert this sketch to a CompactSketch.
If this sketch is a type of UpdateSketch, the compacting process converts the hash table of the UpdateSketch to a simple list of the valid hash values. Any hash values that are zero, or greater than or equal to theta, will be discarded. The number of valid values remaining in the CompactSketch depends on a number of factors, but may be larger or smaller than Nominal Entries (or k). It will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdateSketch prior to calling this method.
A CompactSketch is always immutable.
A new CompactSketch object is created:
- if dstMem != null
- if dstMem == null and this.hasMemory() == true
- if dstMem == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.
Otherwise, this operation returns this.
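The three conditions listed above can be restated as a single predicate. This is a minimal sketch with a hypothetical helper name and plain boolean and int inputs in place of the real sketch and Memory objects; it is not part of the DataSketches API.

```java
// Hypothetical helper that restates the listed conditions; not library code.
class CompactResultRule {

    static boolean createsNewObject(boolean dstMemIsNull, boolean hasMemory,
                                    int itemCount, boolean isOrdered,
                                    boolean dstOrdered) {
        if (!dstMemIsNull) {
            return true; // a destination Memory was supplied
        }
        if (hasMemory) {
            return true; // source is Memory-backed but the result goes on-heap
        }
        // On-heap source: a new object is needed only to impose ordering.
        return itemCount > 1 && !isOrdered && dstOrdered;
    }

    public static void main(String[] args) {
        // On-heap, already ordered, no destination Memory: same object.
        System.out.println(createsNewObject(true, false, 5, true, true));
        // On-heap, unordered, ordered result requested: new object.
        System.out.println(createsNewObject(true, false, 5, false, true));
    }
}
```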
- Parameters:
dstOrdered
- assumed true if this sketch is empty or has only one value. See Destination Ordered.
dstMem
- See Destination Memory.
- Returns:
- this sketch as a CompactSketch.
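The discard rule in the description of compact(boolean, WritableMemory) can be illustrated on a plain long[] standing in for the UpdateSketch hash table. This is a sketch under stated assumptions: hash values are taken to be non-negative, a zero marks an empty slot, and the class and method names are hypothetical rather than library code.

```java
import java.util.Arrays;

// Illustration of the discard rule on a plain array; not DataSketches code.
class CompactionSketch {

    // Keeps hashes that are non-zero and strictly below thetaLong.
    // Assumes hashes are non-negative, so h > 0 only drops empty slots.
    static long[] compactHashes(long[] hashTable, long thetaLong, boolean ordered) {
        long[] valid = Arrays.stream(hashTable)
            .filter(h -> h > 0 && h < thetaLong)
            .toArray();
        if (ordered) {
            Arrays.sort(valid); // ordered form keeps the hashes ascending
        }
        return valid;
    }

    public static void main(String[] args) {
        long[] table = {0L, 50L, 10L, 100L, 70L, 0L};
        System.out.println(Arrays.toString(compactHashes(table, 70L, true)));
    }
}
```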
-
getCompactBytes
public abstract int getCompactBytes()
Returns the number of storage bytes required for this Sketch if its current state were compacted. If this sketch is already in the compact form this is equivalent to calling getCurrentBytes().
- Returns:
- number of compact bytes
-
getCountLessThanThetaLong
public int getCountLessThanThetaLong(long thetaLong)
Gets the number of hash values less than the given theta expressed as a long.
- Parameters:
thetaLong
- the given theta as a long between zero and Long.MAX_VALUE.
- Returns:
- the number of hash values less than the given thetaLong.
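A minimal sketch of what this count means, using a plain long[] in place of the sketch's internal cache. It assumes zero marks an empty slot and should not be counted; the class and method names are hypothetical.

```java
// Illustration only: counts non-zero hashes strictly below thetaLong.
class CountBelowTheta {

    static int countLessThan(long[] retainedHashes, long thetaLong) {
        int count = 0;
        for (long h : retainedHashes) {
            if (h > 0 && h < thetaLong) { // assumption: 0 is an empty slot
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] cache = {0L, 5L, 9L, 12L};
        System.out.println(countLessThan(cache, 10L));
    }
}
```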
-
getCurrentBytes
public abstract int getCurrentBytes()
Returns the number of storage bytes required for this sketch in its current state.
- Returns:
- the number of storage bytes required for this sketch
-
getEstimate
public abstract double getEstimate()
Gets the unique count estimate.
- Returns:
- the sketch's best estimate of the cardinality of the input stream.
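For intuition, the usual theta-sketch estimate is the number of valid retained entries divided by theta, where theta is the fraction of the hash space being sampled. The sketch below assumes that relationship; it is not taken from this page, and the class and method names are hypothetical.

```java
// Illustration of estimate = validEntries / theta; not library code.
class ThetaEstimate {

    static double estimate(int validEntries, double theta) {
        if (theta <= 0.0 || theta > 1.0) {
            throw new IllegalArgumentException("theta must be in (0, 1]");
        }
        // With theta == 1.0 nothing has been sampled out, so the count is exact.
        return validEntries / theta;
    }

    public static void main(String[] args) {
        System.out.println(estimate(1000, 0.25));
        System.out.println(estimate(100, 1.0));
    }
}
```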
-
getFamily
public abstract Family getFamily()
Returns the Family that this sketch belongs to.
- Returns:
- the Family that this sketch belongs to
-
getLowerBound
public double getLowerBound(int numStdDev)
Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the lower bound.
-
getMaxCompactSketchBytes
public static int getMaxCompactSketchBytes(int numberOfEntries)
Returns the maximum number of storage bytes required for a CompactSketch with the given number of actual entries.
- Parameters:
numberOfEntries
- the actual number of retained entries stored in the sketch.
- Returns:
- the maximum number of storage bytes required for a CompactSketch with the given number of retained entries.
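A rough sizing sketch, assuming each retained entry is stored as one 8-byte hash and that the largest preamble is three 8-byte longs (24 bytes). Both figures are assumptions made for illustration and are not stated on this page; the helper name is hypothetical.

```java
// Sizing illustration under stated assumptions; not library code.
class CompactSizeSketch {

    // Assumption: the largest preamble occupies three 8-byte longs.
    static final int ASSUMED_MAX_PREAMBLE_BYTES = 3 * Long.BYTES;

    static int maxCompactBytes(int numberOfEntries) {
        // Assumption: one 8-byte hash per retained entry.
        return ASSUMED_MAX_PREAMBLE_BYTES + numberOfEntries * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(maxCompactBytes(100));
    }
}
```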
-
getCompactSketchMaxBytes
public static int getCompactSketchMaxBytes(int lgNomEntries)
Returns the maximum number of storage bytes required for a CompactSketch given the configured log_base2 of the number of nominal entries, which is a power of 2.
- Parameters:
lgNomEntries
- the log_base2 of Nominal Entries
- Returns:
- the maximum number of storage bytes required for a CompactSketch with the given lgNomEntries.
-
getMaxUpdateSketchBytes
public static int getMaxUpdateSketchBytes(int nomEntries)
Returns the maximum number of storage bytes required for an UpdateSketch with the given number of nominal entries (power of 2).
- Parameters:
nomEntries
- Nominal Entries. This will be rounded up to the next power of 2 if it is not already one.
- Returns:
- the maximum number of storage bytes required for an UpdateSketch with the given nomEntries
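The "ceiling power of 2" adjustment of nomEntries can be sketched with the JDK alone. The byte count that follows is an assumption for illustration only: a hash table of 2k 8-byte slots (consistent with the "never exceed 2k" remark under compact) plus an assumed 24-byte preamble. All names are hypothetical.

```java
// Illustration of rounding k up to a power of 2 and a rough size bound.
class UpdateSizeSketch {

    // Smallest power of 2 that is >= n, clamped to the range [1, 2^30].
    static int ceilingPowerOf2(int n) {
        if (n <= 1) {
            return 1;
        }
        if (n >= (1 << 30)) {
            return 1 << 30;
        }
        int high = Integer.highestOneBit(n);
        return (high == n) ? n : high << 1;
    }

    // Assumption: 2k slots of 8 bytes each plus a 24-byte preamble.
    static long assumedMaxUpdateBytes(int nomEntries) {
        long k = ceilingPowerOf2(nomEntries);
        return 2L * k * Long.BYTES + 3L * Long.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(ceilingPowerOf2(1000));
        System.out.println(assumedMaxUpdateBytes(1000));
    }
}
```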
-
getRetainedEntries
public int getRetainedEntries()
Returns the number of valid entries that have been retained by the sketch.
- Returns:
- the number of valid retained entries
-
getRetainedEntries
public abstract int getRetainedEntries(boolean valid)
Returns the number of entries that have been retained by the sketch.
- Parameters:
valid
- if true, returns the number of valid entries, which are less than theta and used for estimation. Otherwise, returns the number of all entries, valid or not, that are currently in the internal sketch cache.
- Returns:
- the number of retained entries
-
getSerializationVersion
public static int getSerializationVersion(org.apache.datasketches.memory.Memory mem)
Returns the serialization version from the given Memory.
- Parameters:
mem
- the sketch Memory
- Returns:
- the serialization version from the Memory
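A minimal sketch of reading a version field from a serialized image, using a byte[] in place of Memory. It assumes the serialization version is a single unsigned byte at offset 1 of the preamble; that offset is an assumption not stated on this page, and the class and method names are hypothetical.

```java
// Illustration only: reads one preamble byte from a byte[] image.
class SerVerPeek {

    // Assumption: the serialization version lives at byte offset 1.
    static final int ASSUMED_SER_VER_OFFSET = 1;

    static int serializationVersion(byte[] image) {
        if (image == null || image.length < Long.BYTES) {
            throw new IllegalArgumentException("image shorter than one preamble long");
        }
        return image[ASSUMED_SER_VER_OFFSET] & 0xFF; // treat the byte as unsigned
    }

    public static void main(String[] args) {
        byte[] image = {1, 3, 3, 0, 0, 0, 0, 0};
        System.out.println(serializationVersion(image));
    }
}
```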
-
getTheta
public double getTheta()
Gets the value of theta as a double with a value between zero and one.
- Returns:
- the value of theta as a double
-
getThetaLong
public abstract long getThetaLong()
Gets the value of theta as a long.
- Returns:
- the value of theta as a long
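The two theta accessors above describe the same quantity on different scales. A minimal sketch of the conversion, assuming the long form is scaled so that Long.MAX_VALUE corresponds to 1.0 (consistent with the thetaLong range given for getCountLessThanThetaLong); the class and method names are hypothetical.

```java
// Illustration of converting a long-scaled theta to a fraction in [0, 1].
class ThetaScale {

    static double thetaAsDouble(long thetaLong) {
        if (thetaLong < 0) {
            throw new IllegalArgumentException("thetaLong must be non-negative");
        }
        // Assumption: Long.MAX_VALUE represents theta == 1.0.
        return thetaLong / (double) Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(thetaAsDouble(Long.MAX_VALUE));
        System.out.println(thetaAsDouble(Long.MAX_VALUE / 2));
    }
}
```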
-
getUpperBound
public double getUpperBound(int numStdDev)
Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
- Parameters:
numStdDev
- See Number of Standard Deviations
- Returns:
- the upper bound.
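For intuition about how getLowerBound and getUpperBound bracket the estimate, the sketch below uses a textbook Gaussian approximation: with n valid entries sampled at a fixed rate theta, the estimate n / theta has a standard deviation of roughly sqrt(n * (1 - theta)) / theta. This is an assumption chosen for illustration and may differ from the computation the library performs; all names are hypothetical.

```java
// Gaussian approximation of error bounds; an illustration, not library code.
class ApproximateBounds {

    static double stdDev(int n, double theta) {
        return Math.sqrt(n * (1.0 - theta)) / theta;
    }

    static double lowerBound(int n, double theta, int numStdDev) {
        double estimate = n / theta;
        // The true count cannot be smaller than the entries actually retained.
        return Math.max(n, estimate - numStdDev * stdDev(n, theta));
    }

    static double upperBound(int n, double theta, int numStdDev) {
        double estimate = n / theta;
        return estimate + numStdDev * stdDev(n, theta);
    }

    public static void main(String[] args) {
        System.out.println(lowerBound(1000, 0.25, 2));
        System.out.println(upperBound(1000, 0.25, 2));
    }
}
```

With n == 0 or theta == 1.0 both bounds collapse to the estimate itself, which matches the empty-sketch behavior described above.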
-
isCompact
public abstract boolean isCompact()
Returns true if this sketch is in compact form.
- Returns:
- true if this sketch is in compact form.
-
isEmpty
public abstract boolean isEmpty()
- Returns:
- true if empty.
-
isEstimationMode
public boolean isEstimationMode()
Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
- Returns:
- true if the sketch is in estimation mode.
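The stated condition can be written directly as a predicate. The helper below is hypothetical and takes theta and the empty flag as plain arguments instead of reading them from a sketch.

```java
// Restates "theta < 1.0 AND isEmpty() is false" as a standalone predicate.
class EstimationModeRule {

    static boolean isEstimationMode(double theta, boolean isEmpty) {
        return theta < 1.0 && !isEmpty;
    }

    public static void main(String[] args) {
        System.out.println(isEstimationMode(0.5, false)); // sampling has begun
        System.out.println(isEstimationMode(1.0, false)); // still exact
    }
}
```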
-
isOrdered
public abstract boolean isOrdered()
Returns true if internal cache is ordered.
- Returns:
- true if internal cache is ordered
-
iterator
public abstract HashIterator iterator()
Returns a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
- Returns:
- a HashIterator that can be used to iterate over the retained hash values of the Theta sketch.
-
toByteArray
public abstract byte[] toByteArray()
Serialize this sketch to a byte array form.
- Returns:
- byte array of this sketch
-
toString
public String toString()
Returns a human readable summary of the sketch. This method is equivalent to the parameterized call:
toString(true, false, 8, true);
-
toString
public String toString(boolean sketchSummary, boolean dataDetail, int width, boolean hexMode)
Gets a human readable listing of contents and summary of the given sketch. This can be a very long string. If this sketch is in a "dirty" state there may be values in the dataDetail view that are ≥ theta.
- Parameters:
sketchSummary
- If true the sketch summary will be output at the end.
dataDetail
- If true, includes all valid hash values in the sketch.
width
- The number of columns of hash values. Default is 8.
hexMode
- If true, hashes will be output in hex.
- Returns:
- The result string, which can be very long.
-
toString
public static String toString(byte[] byteArr)
Returns a human readable string of the preamble of a byte array image of a Theta Sketch.
- Parameters:
byteArr
- the given byte array
- Returns:
- a human readable string of the preamble of a byte array image of a Theta Sketch.
-
toString
public static String toString(org.apache.datasketches.memory.Memory mem)
Returns a human readable string of the preamble of a Memory image of a Theta Sketch.
- Parameters:
mem
- the given Memory object
- Returns:
- a human readable string of the preamble of a Memory image of a Theta Sketch.