CountMin Sketch
The CountMin sketch, as described in Cormode and Muthukrishnan in
http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf,
is used for approximate Frequency Estimation.
For an item
- class count_min_sketch(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes) _datasketches.count_min_sketch
Reads a bytes object and returns the corresponding count_min_sketch
- suggest_num_buckets(relative_error: float) int
Suggests the number of buckets needed to achieve an accuracy within the provided relative_error. For example, when relative_error = 0.05, the returned frequency estimates satisfy the ‘relative_error’ guarantee that never overestimates the weights but may underestimate the weights by 5% of the total weight in the sketch. Returns the number of hash buckets at every level of the sketch required in order to obtain the specified relative error.
- suggest_num_hashes(confidence: float) int
Suggests the number of hashes needed to achieve the provided confidence. For example, with 95% confidence, frequency estimates satisfy the ‘relative_error’ guarantee. Returns the number of hash functions that are required in order to achieve the specified confidence of the sketch. confidence = 1 - delta, with delta denoting the sketch failure probability.
Non-static Methods:
- get_estimate
Overloaded function.
get_estimate(self, item: int) -> float
Returns an estimate of the frequency of the provided 64-bit integer value
get_estimate(self, item: str) -> float
Returns an estimate of the frequency of the provided string
- get_lower_bound
Overloaded function.
get_lower_bound(self, item: int) -> float
Returns a lower bound on the estimate for the given 64-bit integer value
get_lower_bound(self, item: str) -> float
Returns a lower bound on the estimate for the provided string
- get_relative_error
Returns the maximum permissible error for any frequency estimate query
- get_serialized_size_bytes
Returns the size in bytes of the serialized image of the sketch
- get_upper_bound
Overloaded function.
get_upper_bound(self, item: int) -> float
Returns an upper bound on the estimate for the given 64-bit integer value
get_upper_bound(self, item: str) -> float
Returns an upper bound on the estimate for the provided string
- is_empty
Returns True if the sketch has seen no items, otherwise False
- merge
Merges the provided other sketch into this one
- property num_buckets
The configured number of buckets for the sketch
- property num_hashes
The configured number of hashes for the sketch
- property seed
The base hash seed for the sketch
- serialize
Serializes the sketch into a bytes object
- to_string
Produces a string summary of the sketch
- property total_weight
The total weight currently inserted into the stream
- update
Overloaded function.
update(self, item: int, weight: float = 1.0) -> None
Updates the sketch with the given 64-bit integer value
update(self, item: str, weight: float = 1.0) -> None
Updates the sketch with the given string