datasketches-cpp
cpc_sketch_alloc< A > Class Template Reference

High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch.

#include <cpc_sketch.hpp>

Public Member Functions

 cpc_sketch_alloc (uint8_t lg_k=cpc_constants::DEFAULT_LG_K, uint64_t seed=DEFAULT_SEED, const A &allocator=A())
 Creates an instance of the sketch given the lg_k parameter and hash seed.

A get_allocator () const

uint8_t get_lg_k () const

bool is_empty () const

double get_estimate () const

double get_lower_bound (unsigned kappa) const
 Returns the approximate lower error bound given a parameter kappa (1, 2 or 3).

double get_upper_bound (unsigned kappa) const
 Returns the approximate upper error bound given a parameter kappa (1, 2 or 3).

void update (const std::string &value)
 Update this sketch with a given string.

void update (uint64_t value)
 Update this sketch with a given unsigned 64-bit integer.

void update (int64_t value)
 Update this sketch with a given signed 64-bit integer.

void update (uint32_t value)
 Update this sketch with a given unsigned 32-bit integer.

void update (int32_t value)
 Update this sketch with a given signed 32-bit integer.

void update (uint16_t value)
 Update this sketch with a given unsigned 16-bit integer.

void update (int16_t value)
 Update this sketch with a given signed 16-bit integer.

void update (uint8_t value)
 Update this sketch with a given unsigned 8-bit integer.

void update (int8_t value)
 Update this sketch with a given signed 8-bit integer.

void update (double value)
 Update this sketch with a given double-precision floating point value.

void update (float value)
 Update this sketch with a given single-precision floating point value.

void update (const void *value, size_t size)
 Update this sketch with given data of any type.

string< A > to_string () const
 Returns a human-readable summary of this sketch.

void serialize (std::ostream &os) const
 This method serializes the sketch into a given stream in a binary form.

vector_bytes serialize (unsigned header_size_bytes=0) const
 This method serializes the sketch as a vector of bytes.

Static Public Member Functions

static cpc_sketch_alloc< A > deserialize (std::istream &is, uint64_t seed=DEFAULT_SEED, const A &allocator=A())
 This method deserializes a sketch from a given stream.

static cpc_sketch_alloc< A > deserialize (const void *bytes, size_t size, uint64_t seed=DEFAULT_SEED, const A &allocator=A())
 This method deserializes a sketch from a given array of bytes.

static size_t get_max_serialized_size_bytes (uint8_t lg_k)
 The actual size of a compressed CPC sketch has a small random variance, but the following empirically measured size should be large enough for at least 99.9 percent of sketches.
 

Detailed Description

template<typename A>
class datasketches::cpc_sketch_alloc< A >

High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch.

This is a very compact (in serialized form) distinct counting sketch. The theory is described in the following paper: https://arxiv.org/abs/1708.06839

Author
Kevin Lang
Alexander Saydakov

Constructor & Destructor Documentation

◆ cpc_sketch_alloc()

cpc_sketch_alloc ( uint8_t  lg_k = cpc_constants::DEFAULT_LG_K,
uint64_t  seed = DEFAULT_SEED,
const A &  allocator = A() 
)
explicit

Creates an instance of the sketch given the lg_k parameter and hash seed.

Parameters
lg_k       base 2 logarithm of the number of bins in the sketch
seed       for hash function
allocator  instance of an allocator
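
Since lg_k is the base 2 logarithm of the number of bins, the bin count itself is a power of two. The sketch below only illustrates that relationship; bins_for_lg_k is a hypothetical helper and not part of the library, and it assumes lg_k is below 64 so that the shift is well defined.

```cpp
#include <cstdint>

// Hypothetical helper (not a datasketches-cpp function):
// converts lg_k into the number of bins k = 2^lg_k.
// Assumes lg_k < 64; larger shifts are undefined behavior.
constexpr std::uint64_t bins_for_lg_k(std::uint8_t lg_k) {
    return std::uint64_t{1} << lg_k;
}
```

Each increment of lg_k doubles the number of bins, so for example lg_k = 11 corresponds to 2048 bins.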

Member Function Documentation

◆ get_allocator()

A get_allocator ( ) const
Returns
allocator

◆ get_lg_k()

uint8_t get_lg_k ( ) const
Returns
configured lg_k of this sketch

◆ is_empty()

bool is_empty ( ) const
Returns
true if this sketch represents an empty set

◆ get_estimate()

double get_estimate ( ) const
Returns
estimate of the distinct count of the input stream

◆ get_lower_bound()

double get_lower_bound ( unsigned  kappa) const

Returns the approximate lower error bound given a parameter kappa (1, 2 or 3).

This parameter is similar to the number of standard deviations of the normal distribution and corresponds to approximately 68%, 95% and 99.7% confidence intervals.

Parameters
kappa  parameter to specify confidence interval (1, 2 or 3)
Returns
the lower bound

◆ get_upper_bound()

double get_upper_bound ( unsigned  kappa) const

Returns the approximate upper error bound given a parameter kappa (1, 2 or 3).

This parameter is similar to the number of standard deviations of the normal distribution and corresponds to approximately 68%, 95% and 99.7% confidence intervals.

Parameters
kappa  parameter to specify confidence interval (1, 2 or 3)
Returns
the upper bound
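
The analogy between kappa and standard deviations of the normal distribution can be checked numerically. The following is a minimal sketch using only the standard library; normal_coverage is a hypothetical helper, and it does not reproduce how the sketch itself computes its bounds, which this page does not describe.

```cpp
#include <cmath>

// Hypothetical helper (not a datasketches-cpp function):
// fraction of a normal distribution that lies within
// kappa standard deviations of the mean (two-sided).
double normal_coverage(unsigned kappa) {
    return std::erf(static_cast<double>(kappa) / std::sqrt(2.0));
}
```

For kappa = 1, 2 and 3 this evaluates to roughly 0.683, 0.954 and 0.997, which is the sense in which a larger kappa gives a wider interval between get_lower_bound(kappa) and get_upper_bound(kappa).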

◆ update() [1/12]

void update ( const std::string &  value)

Update this sketch with a given string.

Parameters
value  string to update the sketch with

◆ update() [2/12]

void update ( uint64_t  value)

Update this sketch with a given unsigned 64-bit integer.

Parameters
value  uint64_t to update the sketch with

◆ update() [3/12]

void update ( int64_t  value)

Update this sketch with a given signed 64-bit integer.

Parameters
value  int64_t to update the sketch with

◆ update() [4/12]

void update ( uint32_t  value)

Update this sketch with a given unsigned 32-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  uint32_t to update the sketch with

◆ update() [5/12]

void update ( int32_t  value)

Update this sketch with a given signed 32-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  int32_t to update the sketch with

◆ update() [6/12]

void update ( uint16_t  value)

Update this sketch with a given unsigned 16-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  uint16_t to update the sketch with

◆ update() [7/12]

void update ( int16_t  value)

Update this sketch with a given signed 16-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  int16_t to update the sketch with

◆ update() [8/12]

void update ( uint8_t  value)

Update this sketch with a given unsigned 8-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  uint8_t to update the sketch with

◆ update() [9/12]

void update ( int8_t  value)

Update this sketch with a given signed 8-bit integer.

Provided for compatibility with the Java implementation.

Parameters
value  int8_t to update the sketch with

◆ update() [10/12]

void update ( double  value)

Update this sketch with a given double-precision floating point value.

Provided for compatibility with the Java implementation.

Parameters
value  double to update the sketch with

◆ update() [11/12]

void update ( float  value)

Update this sketch with a given single-precision floating point value.

Provided for compatibility with the Java implementation.

Parameters
value  float to update the sketch with

◆ update() [12/12]

void update ( const void *  value,
size_t  size 
)

Update this sketch with given data of any type.

This is a "universal" update that covers all the cases above, but it may produce different hashes. Be very careful to hash input values consistently, using the same approach over time, across platforms, and when passing sketches between C++ and Java environments. Otherwise, two sketches that should represent overlapping sets will appear disjoint. For instance, if compatibility with Java is expected, update with signed 32-bit values by calling the update(int32_t) method above, which performs a widening conversion to int64_t.

Parameters
value  pointer to the data
size   of the data in bytes
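
The warning above comes down to which bytes reach the hash function. The sketch below, which uses only the standard library, contrasts the 4 bytes that a raw update(&v, sizeof(v)) call would pass for an int32_t with the 8 bytes that result from the widening conversion to int64_t described above. Both raw_bytes and widened_bytes are hypothetical helpers for illustration; the library's actual hashing is not shown.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: the bytes a raw update(&v, sizeof(v)) call
// would hand to the hash for a 32-bit value (4 bytes).
std::vector<std::uint8_t> raw_bytes(std::int32_t v) {
    std::vector<std::uint8_t> out(sizeof(v));
    std::memcpy(out.data(), &v, sizeof(v));
    return out;
}

// Hypothetical helper: the bytes after widening the same value
// to int64_t first (8 bytes).
std::vector<std::uint8_t> widened_bytes(std::int32_t v) {
    const std::int64_t wide = v;  // sign-extending widening conversion
    std::vector<std::uint8_t> out(sizeof(wide));
    std::memcpy(out.data(), &wide, sizeof(wide));
    return out;
}
```

Because the two byte sequences differ in length, the same logical value fed through both paths would generally hash differently, so a sketch built one way would not recognize items inserted the other way.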

◆ to_string()

string< A > to_string ( ) const

Returns a human-readable summary of this sketch.

Returns
a human-readable summary of this sketch

◆ serialize() [1/2]

void serialize ( std::ostream &  os) const

This method serializes the sketch into a given stream in a binary form.

Parameters
os  output stream

◆ serialize() [2/2]

vector_bytes serialize ( unsigned  header_size_bytes = 0) const

This method serializes the sketch as a vector of bytes.

An optional header can be reserved in front of the sketch. It is an uninitialized space of the given size. This header is used in the DataSketches PostgreSQL extension.

Parameters
header_size_bytes  space to reserve in front of the sketch
Returns
serialized sketch as a vector of bytes
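
When a header is reserved, the sketch bytes start header_size_bytes into the returned vector. The following is a minimal sketch of locating them, assuming the returned vector_bytes can be treated like a std::vector of uint8_t (an assumption, since the exact type is not spelled out here). payload_view and skip_header are hypothetical names, not part of the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper type: a non-owning view of the sketch bytes
// that follow the reserved header.
struct payload_view {
    const std::uint8_t* data;
    std::size_t size;
};

// Hypothetical helper: skips the reserved header at the front of a buffer.
// The header contents are uninitialized and must not be read as sketch data.
payload_view skip_header(const std::vector<std::uint8_t>& buffer,
                         std::size_t header_size_bytes) {
    if (header_size_bytes > buffer.size()) {
        throw std::invalid_argument("header is larger than the buffer");
    }
    return {buffer.data() + header_size_bytes,
            buffer.size() - header_size_bytes};
}
```

The resulting data pointer and size are the kind of arguments that deserialize(const void *bytes, size_t size, ...) expects.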

◆ deserialize() [1/2]

cpc_sketch_alloc< A > deserialize ( std::istream &  is,
uint64_t  seed = DEFAULT_SEED,
const A &  allocator = A() 
)
static

This method deserializes a sketch from a given stream.

Parameters
is         input stream
seed       the seed for the hash function that was used to create the sketch
allocator  instance of an Allocator
Returns
an instance of a sketch

◆ deserialize() [2/2]

cpc_sketch_alloc< A > deserialize ( const void *  bytes,
size_t  size,
uint64_t  seed = DEFAULT_SEED,
const A &  allocator = A() 
)
static

This method deserializes a sketch from a given array of bytes.

Parameters
bytes      pointer to the array of bytes
size       the size of the array
seed       the seed for the hash function that was used to create the sketch
allocator  instance of an Allocator
Returns
an instance of the sketch

◆ get_max_serialized_size_bytes()

size_t get_max_serialized_size_bytes ( uint8_t  lg_k)
static

The actual size of a compressed CPC sketch has a small random variance, but the following empirically measured size should be large enough for at least 99.9 percent of sketches.

For small values of n (the number of distinct items presented to the sketch) the size can be much smaller.

Parameters
lg_k  the given value of lg_k.
Returns
the estimated maximum compressed serialized size of a sketch.

The documentation for this class was generated from the following files: