Compressed Probabilistic Counting (CPC)
High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch. This is a unique-counting sketch that implements the Compressed Probabilistic Counting (CPC, a.k.a FM85) algorithms developed by Kevin Lang in his paper Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. This sketch is extremely space-efficient when serialized. In an apples-to-apples empirical comparison against compressed HyperLogLog sketches, this new algorithm simultaneously wins on the two dimensions of the space/accuracy tradeoff and produces sketches that are smaller than the entropy of HLL, so no possible implementation of compressed HLL can match its space efficiency for a given accuracy. As described in the paper this sketch implements a newly developed ICON estimator algorithm that survives unioning operations, another well-known estimator, the Historical Inverse Probability (HIP) estimator does not. The update speed performance of this sketch is quite fast and is comparable to the speed of HLL. The unioning (merging) capability of this sketch also allows for merging of sketches with different configurations of K. For additional security this sketch can be configured with a user-specified hash seed.
- class cpc_sketch(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes) _datasketches.cpc_sketch
Reads a bytes object and returns the corresponding cpc_sketch
Non-static Methods:
- __init__(self, lg_k: int = 11, seed: int = 9001) None
Creates a new CPC sketch
- Parameters:
lg_k (int, optional) – base 2 logarithm of the number of bins in the sketch
seed (int, optional) – seed value for the hash function
- get_estimate
Estimate of the distinct count of the input stream
- get_lower_bound
Returns an approximate lower bound on the estimate for kappa values in {1, 2, 3}, roughly corresponding to standard deviations
- get_upper_bound
Returns an approximate upper bound on the estimate for kappa values in {1, 2, 3}, roughly corresponding to standard deviations
- is_empty
Returns True if the sketch is empty, otherwise False
- property lg_k
Configured lg_k of this sketch
- serialize
Serializes the sketch into a bytes object
- to_string
Produces a string summary of the sketch
- update
Overloaded function.
update(self, datum: int) -> None
Updates the sketch with the given 64-bit integer value
update(self, datum: float) -> None
Updates the sketch with the given 64-bit floating point
update(self, datum: str) -> None
Updates the sketch with the given string