Compressed Probabilistic Counting (CPC)

High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch. This is a unique-counting sketch that implements the Compressed Probabilistic Counting (CPC, a.k.a FM85) algorithms developed by Kevin Lang in his paper Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. This sketch is extremely space-efficient when serialized. In an apples-to-apples empirical comparison against compressed HyperLogLog sketches, this new algorithm simultaneously wins on the two dimensions of the space/accuracy tradeoff and produces sketches that are smaller than the entropy of HLL, so no possible implementation of compressed HLL can match its space efficiency for a given accuracy. As described in the paper this sketch implements a newly developed ICON estimator algorithm that survives unioning operations, another well-known estimator, the Historical Inverse Probability (HIP) estimator does not. The update speed performance of this sketch is quite fast and is comparable to the speed of HLL. The unioning (merging) capability of this sketch also allows for merging of sketches with different configurations of K. For additional security this sketch can be configured with a user-specified hash seed.

class cpc_sketch

Static Methods:

deserialize(bytes: bytes) _datasketches.cpc_sketch

Reads a bytes object and returns the corresponding cpc_sketch

Non-static Methods:

__init__(self, lg_k: int = 11, seed: int = 9001) None

Creates a new CPC sketch

  • lg_k (int, optional) – base 2 logarithm of the number of bins in the sketch

  • seed (int, optional) – seed value for the hash function


Estimate of the distinct count of the input stream


Returns an approximate lower bound on the estimate for kappa values in {1, 2, 3}, roughly corresponding to standard deviations


Returns an approximate upper bound on the estimate for kappa values in {1, 2, 3}, roughly corresponding to standard deviations


Returns True if the sketch is empty, otherwise False

property lg_k

Configured lg_k of this sketch


Serializes the sketch into a bytes object


Produces a string summary of the sketch


Overloaded function.

  1. update(self, datum: int) -> None

Updates the sketch with the given 64-bit integer value

  1. update(self, datum: float) -> None

Updates the sketch with the given 64-bit floating point

  1. update(self, datum: str) -> None

Updates the sketch with the given string

class cpc_union
__init__(self, lg_k: int, seed: int = 9001) None

Returns a CPC sketch with the result of the union


Updates the union with the provided CPC sketch