CountMin Sketch

The CountMin sketch, as described in Cormode and Muthukrishnan in http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf, is used for approximate Frequency Estimation. For an item \(x\) with frequency \(f_x\), the sketch provides an estimate, \(\hat{f_x}\), such that \(f_x \approx \hat{f_x}.\) The sketch guarantees that \(f_x \le \hat{f_x}\) and provides a probabilistic upper bound which is dependent on the size parameters. The sketch provides an estimate of the occurrence frequency for any queried item but, in contrast to the Frequent Items Sketch, this sketch does not provide a list of heavy hitters.

class count_min_sketch

Static Methods:

deserialize(bytes: bytes) _datasketches.count_min_sketch

Reads a bytes object and returns the corresponding count_min_sketch

suggest_num_buckets(relative_error: float) int

Suggests the number of buckets needed to achieve an accuracy within the provided relative_error. For example, when relative_error = 0.05, the returned frequency estimates satisfy the ‘relative_error’ guarantee that never overestimates the weights but may underestimate the weights by 5% of the total weight in the sketch. Returns the number of hash buckets at every level of the sketch required in order to obtain the specified relative error.

suggest_num_hashes(confidence: float) int

Suggests the number of hashes needed to achieve the provided confidence. For example, with 95% confidence, frequency estimates satisfy the ‘relative_error’ guarantee. Returns the number of hash functions that are required in order to achieve the specified confidence of the sketch. confidence = 1 - delta, with delta denoting the sketch failure probability.

Non-static Methods:

get_estimate

Overloaded function.

  1. get_estimate(self, item: int) -> float

Returns an estimate of the frequency of the provided 64-bit integer value

  1. get_estimate(self, item: str) -> float

Returns an estimate of the frequency of the provided string

get_lower_bound

Overloaded function.

  1. get_lower_bound(self, item: int) -> float

Returns a lower bound on the estimate for the given 64-bit integer value

  1. get_lower_bound(self, item: str) -> float

Returns a lower bound on the estimate for the provided string

get_relative_error

Returns the maximum permissible error for any frequency estimate query

get_serialized_size_bytes

Returns the size in bytes of the serialized image of the sketch

get_upper_bound

Overloaded function.

  1. get_upper_bound(self, item: int) -> float

Returns an upper bound on the estimate for the given 64-bit integer value

  1. get_upper_bound(self, item: str) -> float

Returns an upper bound on the estimate for the provided string

is_empty

Returns True if the sketch has seen no items, otherwise False

merge

Merges the provided other sketch into this one

property num_buckets

The configured number of buckets for the sketch

property num_hashes

The configured number of hashes for the sketch

property seed

The base hash seed for the sketch

serialize

Serializes the sketch into a bytes object

to_string

Produces a string summary of the sketch

property total_weight

The total weight currently inserted into the stream

update

Overloaded function.

  1. update(self, item: int, weight: float = 1.0) -> None

Updates the sketch with the given 64-bit integer value

  1. update(self, item: str, weight: float = 1.0) -> None

Updates the sketch with the given string