High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch.
#include <cpc_sketch.hpp>
| Return type | Member | Description |
|---|---|---|
| | cpc_sketch_alloc (uint8_t lg_k=cpc_constants::DEFAULT_LG_K, uint64_t seed=DEFAULT_SEED, const A &allocator=A()) | Creates an instance of the sketch given the lg_k parameter and hash seed. |
| A | get_allocator () const | |
| uint8_t | get_lg_k () const | |
| bool | is_empty () const | |
| double | get_estimate () const | |
| double | get_lower_bound (unsigned kappa) const | Returns the approximate lower error bound given a parameter kappa (1, 2 or 3). |
| double | get_upper_bound (unsigned kappa) const | Returns the approximate upper error bound given a parameter kappa (1, 2 or 3). |
| void | update (const std::string &value) | Update this sketch with a given string. |
| void | update (uint64_t value) | Update this sketch with a given unsigned 64-bit integer. |
| void | update (int64_t value) | Update this sketch with a given signed 64-bit integer. |
| void | update (uint32_t value) | Update this sketch with a given unsigned 32-bit integer. |
| void | update (int32_t value) | Update this sketch with a given signed 32-bit integer. |
| void | update (uint16_t value) | Update this sketch with a given unsigned 16-bit integer. |
| void | update (int16_t value) | Update this sketch with a given signed 16-bit integer. |
| void | update (uint8_t value) | Update this sketch with a given unsigned 8-bit integer. |
| void | update (int8_t value) | Update this sketch with a given signed 8-bit integer. |
| void | update (double value) | Update this sketch with a given double-precision floating point value. |
| void | update (float value) | Update this sketch with a given floating point value. |
| void | update (const void *value, size_t size) | Update this sketch with given data of any type. |
| string< A > | to_string () const | Returns a human-readable summary of this sketch. |
| void | serialize (std::ostream &os) const | This method serializes the sketch into a given stream in a binary form. |
| vector_bytes | serialize (unsigned header_size_bytes=0) const | This method serializes the sketch as a vector of bytes. |
| static cpc_sketch_alloc< A > | deserialize (std::istream &is, uint64_t seed=DEFAULT_SEED, const A &allocator=A()) | This method deserializes a sketch from a given stream. |
| static cpc_sketch_alloc< A > | deserialize (const void *bytes, size_t size, uint64_t seed=DEFAULT_SEED, const A &allocator=A()) | This method deserializes a sketch from a given array of bytes. |
| static size_t | get_max_serialized_size_bytes (uint8_t lg_k) | The actual size of a compressed CPC sketch has a small random variance, but the following empirically measured size should be large enough for at least 99.9 percent of sketches. |
template<typename A>
class datasketches::cpc_sketch_alloc< A >
High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch.
This is a very compact (in serialized form) distinct counting sketch. The theory is described in the following paper: https://arxiv.org/abs/1708.06839
- Author
- Kevin Lang
- Alexander Saydakov
◆ cpc_sketch_alloc()
Creates an instance of the sketch given the lg_k parameter and hash seed.
- Parameters
-
lg_k | base 2 logarithm of the number of bins in the sketch |
seed | for hash function |
allocator | instance of an allocator |
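Because lg_k is a base 2 logarithm, the number of bins grows as a power of two. The snippet below is a stand-alone illustration, not library code: `bins_for_lg_k` is a hypothetical helper introduced here only to make the relationship explicit, and the value 11 in the note that follows is just an example input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the library): the number of bins k
// implied by lg_k, given that lg_k is the base 2 logarithm of k.
inline std::uint64_t bins_for_lg_k(std::uint8_t lg_k) {
    assert(lg_k < 64);  // keep the shift well-defined for a 64-bit result
    return std::uint64_t{1} << lg_k;
}
```

For example, `bins_for_lg_k(11)` evaluates to 2048, and each increment of lg_k doubles the number of bins.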
◆ get_allocator()
◆ get_lg_k()
- Returns
- configured lg_k of this sketch
◆ is_empty()
- Returns
- true if this sketch represents an empty set
◆ get_estimate()
- Returns
- estimate of the distinct count of the input stream
◆ get_lower_bound()
double get_lower_bound(unsigned kappa) const
Returns the approximate lower error bound given a parameter kappa (1, 2 or 3).
This parameter is similar to the number of standard deviations of the normal distribution: kappa values of 1, 2 and 3 correspond to approximately 68%, 95% and 99.7% confidence intervals, respectively.
- Parameters
-
kappa | parameter to specify confidence interval (1, 2 or 3) |
- Returns
- the lower bound
◆ get_upper_bound()
double get_upper_bound(unsigned kappa) const
Returns the approximate upper error bound given a parameter kappa (1, 2 or 3).
This parameter is similar to the number of standard deviations of the normal distribution: kappa values of 1, 2 and 3 correspond to approximately 68%, 95% and 99.7% confidence intervals, respectively.
- Parameters
-
kappa | parameter to specify confidence interval (1, 2 or 3) |
- Returns
- the upper bound
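The analogy between kappa and standard deviations can be made concrete. The snippet below is a stand-alone illustration, not library code: `normal_coverage` is a hypothetical helper that computes the two-sided coverage of a normal distribution within kappa standard deviations using `std::erf`. The sketch's bounds are only described as similar to this, so the numbers are a guide rather than a guarantee.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of the library): probability that a
// normally distributed value falls within kappa standard deviations
// of its mean, computed as erf(kappa / sqrt(2)).
inline double normal_coverage(unsigned kappa) {
    assert(kappa >= 1 && kappa <= 3);  // the documented range for kappa
    return std::erf(static_cast<double>(kappa) / std::sqrt(2.0));
}
```

For kappa = 1, 2 and 3 this evaluates to roughly 0.683, 0.954 and 0.997, so a wider interval between get_lower_bound(kappa) and get_upper_bound(kappa) buys a higher confidence level.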
◆ update() [1/12]
void update(const std::string &value)
Update this sketch with a given string.
- Parameters
-
value | string to update the sketch with |
◆ update() [2/12]
void update(uint64_t value)
Update this sketch with a given unsigned 64-bit integer.
- Parameters
-
value | uint64_t to update the sketch with |
◆ update() [3/12]
void update(int64_t value)
Update this sketch with a given signed 64-bit integer.
- Parameters
-
value | int64_t to update the sketch with |
◆ update() [4/12]
void update(uint32_t value)
Update this sketch with a given unsigned 32-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | uint32_t to update the sketch with |
◆ update() [5/12]
void update(int32_t value)
Update this sketch with a given signed 32-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | int32_t to update the sketch with |
◆ update() [6/12]
void update(uint16_t value)
Update this sketch with a given unsigned 16-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | uint16_t to update the sketch with |
◆ update() [7/12]
void update(int16_t value)
Update this sketch with a given signed 16-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | int16_t to update the sketch with |
◆ update() [8/12]
void update(uint8_t value)
Update this sketch with a given unsigned 8-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | uint8_t to update the sketch with |
◆ update() [9/12]
void update(int8_t value)
Update this sketch with a given signed 8-bit integer.
For compatibility with Java implementation.
- Parameters
-
value | int8_t to update the sketch with |
◆ update() [10/12]
void update(double value)
Update this sketch with a given double-precision floating point value.
For compatibility with Java implementation.
- Parameters
-
value | double to update the sketch with |
◆ update() [11/12]
void update(float value)
Update this sketch with a given floating point value.
For compatibility with Java implementation.
- Parameters
-
value | float to update the sketch with |
◆ update() [12/12]
void update(const void *value, size_t size)
Update this sketch with given data of any type.
This is a "universal" update that covers all the cases above but may produce different hashes. Be very careful to hash input values consistently, using the same approach over time, across platforms, and when passing sketches between C++ and Java environments. Otherwise two sketches that should represent overlapping sets will appear disjoint. For instance, if compatibility with Java is expected, call the update(int32_t) method above for signed 32-bit values; it performs a widening conversion to int64_t.
- Parameters
-
value | pointer to the data |
size | of the data in bytes |
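The warning above comes down to which bytes reach the hash function. The snippet below is a stand-alone illustration that does not call the library: `raw_bytes` and `widened_bytes` are hypothetical helpers that show how the same 32-bit value presents 4 bytes when passed by pointer and size, but 8 bytes after the widening conversion to int64_t described for update(int32_t).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the in-memory bytes of a trivially copyable
// value, as a (pointer, size) style update would see them.
template <typename T>
std::vector<std::uint8_t> raw_bytes(const T& value) {
    std::vector<std::uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &value, sizeof(T));
    return out;
}

// Hypothetical helper: bytes of the value after widening to int64_t,
// mirroring the conversion described for update(int32_t).
inline std::vector<std::uint8_t> widened_bytes(std::int32_t value) {
    std::vector<std::uint8_t> out = raw_bytes(static_cast<std::int64_t>(value));
    assert(out.size() == sizeof(std::int64_t));
    return out;
}
```

Here `raw_bytes(std::int32_t{42})` and `widened_bytes(42)` differ in length, so hashing one through the universal update and the other through update(int32_t) cannot be expected to land on the same hash.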
◆ to_string()
Returns a human-readable summary of this sketch.
- Returns
- a human-readable summary of this sketch
◆ serialize() [1/2]
void serialize(std::ostream &os) const
This method serializes the sketch into a given stream in a binary form.
- Parameters
-
os | output stream |
◆ serialize() [2/2]
auto serialize(unsigned header_size_bytes = 0) const
This method serializes the sketch as a vector of bytes.
An optional header can be reserved in front of the sketch. It is an uninitialized space of a given size. This header is used in Datasketches PostgreSQL extension.
- Parameters
-
header_size_bytes | space to reserve in front of the sketch |
- Returns
- serialized sketch as a vector of bytes
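The resulting layout is a reserved region of header_size_bytes followed by the serialized sketch. The snippet below models that layout with plain vectors and does not call the library: `with_reserved_header`, `payload_view` and `skip_header` are hypothetical names, and the model zero-fills the header (the real method leaves it uninitialized) so that it is safe to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the layout: header_size_bytes of reserved space,
// then the payload. Zero-filled here; uninitialized in the real method.
inline std::vector<std::uint8_t> with_reserved_header(
        const std::vector<std::uint8_t>& payload, std::size_t header_size_bytes) {
    std::vector<std::uint8_t> out(header_size_bytes, 0);
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Pointer and length of the payload, skipping the reserved header.
struct payload_view {
    const std::uint8_t* data;
    std::size_t size;
};

inline payload_view skip_header(const std::vector<std::uint8_t>& bytes,
                                std::size_t header_size_bytes) {
    assert(header_size_bytes <= bytes.size());
    return {bytes.data() + header_size_bytes, bytes.size() - header_size_bytes};
}
```

A caller that reserved a header would presumably skip it in the same way before handing the remaining pointer and size to deserialize(const void *bytes, size_t size, ...), since the header is not part of the sketch image.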
◆ deserialize() [1/2]
static cpc_sketch_alloc< A > deserialize(std::istream &is, uint64_t seed = DEFAULT_SEED, const A &allocator = A())
This method deserializes a sketch from a given stream.
- Parameters
-
is | input stream |
seed | the seed for the hash function that was used to create the sketch |
allocator | instance of an Allocator |
- Returns
- an instance of a sketch
◆ deserialize() [2/2]
static cpc_sketch_alloc< A > deserialize(const void *bytes, size_t size, uint64_t seed = DEFAULT_SEED, const A &allocator = A())
This method deserializes a sketch from a given array of bytes.
- Parameters
-
bytes | pointer to the array of bytes |
size | the size of the array |
seed | the seed for the hash function that was used to create the sketch |
allocator | instance of an Allocator |
- Returns
- an instance of the sketch
◆ get_max_serialized_size_bytes()
static size_t get_max_serialized_size_bytes(uint8_t lg_k)
The actual size of a compressed CPC sketch has a small random variance, but the following empirically measured size should be large enough for at least 99.9 percent of sketches.
For small values of n the size can be much smaller.
- Parameters
-
lg_k | the given value of lg_k. |
- Returns
- the estimated maximum compressed serialized size of a sketch.