Class CountMinSketch
java.lang.Object
org.apache.datasketches.count.CountMinSketch
Java implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.
This implementation is inspired by and compatible with the datasketches-cpp version by Charlie Dickens.
The CountMin sketch is a probabilistic data structure that provides frequency estimates for items
in a data stream. It uses multiple hash functions to distribute items across a two-dimensional array,
providing approximate counts with configurable error bounds.
Reference: http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf
-
Method Summary
Modifier and TypeMethodDescriptionstatic CountMinSketchdeserialize(byte[] b, long seed) Deserializes a CountMinSketch from the provided byte array.longgetEstimate(byte[] item) Returns the estimated frequency for the given item.longgetEstimate(long item) Returns the estimated frequency for the given item.longgetEstimate(String item) Returns the estimated frequency for the given item.longgetLowerBound(byte[] item) Returns the lower bound of the estimated frequency for the given item.longgetLowerBound(long item) Returns the lower bound of the estimated frequency for the given item.longgetLowerBound(String item) Returns the lower bound of the estimated frequency for the given item.intReturns the number of buckets per hash function.byteReturns the number of hash functions used in this sketch.doubleReturns the relative error of the sketch.longgetSeed_()Returns the hash seed used by this sketch.longReturns the total weight of all items inserted into the sketch.longgetUpperBound(byte[] item) Returns the upper bound of the estimated frequency for the given item.longgetUpperBound(long item) Returns the upper bound of the estimated frequency for the given item.longgetUpperBound(String item) Returns the upper bound of the estimated frequency for the given item.booleanisEmpty()Checks if the CountMinSketch has processed any items.voidmerge(CountMinSketch other) Merges another CountMinSketch into this one.static intsuggestNumBuckets(double relativeError) Suggests an appropriate number of buckets per hash function for a given relative error.static bytesuggestNumHashes(double confidence) Suggests an appropriate number of hash functions to use for a given confidence level.byte[]Returns the sketch as a byte array.voidupdate(byte[] item, long weight) Updates the sketch with the provided item and weight.voidupdate(long item, long weight) Updates the sketch with the provided item and weight.voidUpdates the sketch with the provided item and weight.
-
Method Details
-
isEmpty
public boolean isEmpty()Checks if the CountMinSketch has processed any items.- Returns:
- True if the sketch is empty, otherwise false.
-
getNumHashes_
public byte getNumHashes_()Returns the number of hash functions used in this sketch.- Returns:
- The number of hash functions.
-
getNumBuckets_
public int getNumBuckets_()Returns the number of buckets per hash function.- Returns:
- The number of buckets.
-
getSeed_
public long getSeed_()Returns the hash seed used by this sketch.- Returns:
- The seed value.
-
getTotalWeight_
public long getTotalWeight_()Returns the total weight of all items inserted into the sketch.- Returns:
- The total weight.
-
getRelativeError
public double getRelativeError()Returns the relative error of the sketch.- Returns:
- The relative error.
-
suggestNumHashes
public static byte suggestNumHashes(double confidence) Suggests an appropriate number of hash functions to use for a given confidence level.- Parameters:
confidence- The desired confidence level between 0 and 1.- Returns:
- Suggested number of hash functions.
-
suggestNumBuckets
public static int suggestNumBuckets(double relativeError) Suggests an appropriate number of buckets per hash function for a given relative error.- Parameters:
relativeError- The desired relative error.- Returns:
- Suggested number of buckets.
-
update
public void update(long item, long weight) Updates the sketch with the provided item and weight.- Parameters:
item- The item to update.weight- The weight of the item.
-
update
Updates the sketch with the provided item and weight.- Parameters:
item- The item to update.weight- The weight of the item.
-
update
public void update(byte[] item, long weight) Updates the sketch with the provided item and weight.- Parameters:
item- The item to update.weight- The weight of the item.
-
getEstimate
public long getEstimate(long item) Returns the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Estimated frequency.
-
getEstimate
Returns the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Estimated frequency.
-
getEstimate
public long getEstimate(byte[] item) Returns the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Estimated frequency.
-
getUpperBound
public long getUpperBound(long item) Returns the upper bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Upper bound of estimated frequency.
-
getUpperBound
Returns the upper bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Upper bound of estimated frequency.
-
getUpperBound
public long getUpperBound(byte[] item) Returns the upper bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Upper bound of estimated frequency.
-
getLowerBound
public long getLowerBound(long item) Returns the lower bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Lower bound of estimated frequency.
-
getLowerBound
Returns the lower bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Lower bound of estimated frequency.
-
getLowerBound
public long getLowerBound(byte[] item) Returns the lower bound of the estimated frequency for the given item.- Parameters:
item- The item to estimate.- Returns:
- Lower bound of estimated frequency.
-
merge
Merges another CountMinSketch into this one. The sketches must have the same configuration.- Parameters:
other- The other sketch to merge.
-
toByteArray
public byte[] toByteArray()Returns the sketch as a byte array.- Returns:
- the result byte array
-
deserialize
Deserializes a CountMinSketch from the provided byte array.- Parameters:
b- The byte array containing the serialized sketch.seed- The seed used during serialization.- Returns:
- The deserialized CountMinSketch.
-