CountMin Sketch
The CountMin sketch, as described in Cormode and Muthukrishnan in http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf, is used for approximate Frequency Estimation. For an item \(x\) with frequency \(f_x\), the sketch provides an estimate, \(\hat{f_x}\), such that \(f_x \approx \hat{f_x}.\) The sketch guarantees that \(f_x \le \hat{f_x}\) and provides a probabilistic upper bound which is dependent on the size parameters. The sketch provides an estimate of the occurrence frequency for any queried item but, in contrast to the Frequent Items Sketch, this sketch does not provide a list of heavy hitters.
- class count_min_sketch(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes) _datasketches.count_min_sketch
Reads a bytes object and returns the corresponding count_min_sketch
- suggest_num_buckets(relative_error: float) int
Suggests the number of buckets needed to achieve an accuracy within the provided relative_error. For example, when relative_error = 0.05, the returned frequency estimates satisfy the ‘relative_error’ guarantee that never overestimates the weights but may underestimate the weights by 5% of the total weight in the sketch. Returns the number of hash buckets at every level of the sketch required in order to obtain the specified relative error.
- suggest_num_hashes(confidence: float) int
Suggests the number of hashes needed to achieve the provided confidence. For example, with 95% confidence, frequency estimates satisfy the ‘relative_error’ guarantee. Returns the number of hash functions that are required in order to achieve the specified confidence of the sketch. confidence = 1 - delta, with delta denoting the sketch failure probability.
Non-static Methods:
- get_estimate
Overloaded function.
get_estimate(self, item: int) -> float
Returns an estimate of the frequency of the provided 64-bit integer value
get_estimate(self, item: str) -> float
Returns an estimate of the frequency of the provided string
- get_lower_bound
Overloaded function.
get_lower_bound(self, item: int) -> float
Returns a lower bound on the estimate for the given 64-bit integer value
get_lower_bound(self, item: str) -> float
Returns a lower bound on the estimate for the provided string
- get_relative_error
Returns the maximum permissible error for any frequency estimate query
- get_serialized_size_bytes
Returns the size in bytes of the serialized image of the sketch
- get_upper_bound
Overloaded function.
get_upper_bound(self, item: int) -> float
Returns an upper bound on the estimate for the given 64-bit integer value
get_upper_bound(self, item: str) -> float
Returns an upper bound on the estimate for the provided string
- is_empty
Returns True if the sketch has seen no items, otherwise False
- merge
Merges the provided other sketch into this one
- property num_buckets
The configured number of buckets for the sketch
- property num_hashes
The configured number of hashes for the sketch
- property seed
The base hash seed for the sketch
- serialize
Serializes the sketch into a bytes object
- to_string
Produces a string summary of the sketch
- property total_weight
The total weight currently inserted into the stream
- update
Overloaded function.
update(self, item: int, weight: float = 1.0) -> None
Updates the sketch with the given 64-bit integer value
update(self, item: str, weight: float = 1.0) -> None
Updates the sketch with the given string