datasketches-cpp
ebpps_sketch< T, A > Class Template Reference

An implementation of an Exact and Bounded Sampling Proportional to Size sketch.

#include <ebpps_sketch.hpp>

Public Member Functions

 ebpps_sketch (uint32_t k, const A &allocator=A())
 Constructor.

void update (const T &item, double weight=1.0)
 Updates this sketch with the given data item and weight.

void update (T &&item, double weight=1.0)
 Updates this sketch with the given data item and weight.

void merge (const ebpps_sketch< T, A > &sketch)
 Merges the provided sketch into the current one.

void merge (ebpps_sketch< T, A > &&sketch)
 Merges the provided sketch into the current one.

result_type get_result () const
 Returns a copy of the current sample, as a std::vector.

uint32_t get_k () const
 Returns the configured maximum sample size.

uint64_t get_n () const
 Returns the number of items processed by the sketch, regardless of item weight.

double get_cumulative_weight () const
 Returns the cumulative weight of items processed by the sketch.

double get_c () const
 Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator.

bool is_empty () const
 Returns true if the sketch is empty.

A get_allocator () const
 Returns an instance of the allocator for this sketch.

void reset ()
 Resets the sketch to its default, empty state.
 
template<typename SerDe = serde<T>>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
 Computes size needed to serialize the current state of the sketch.

template<typename SerDe = serde<T>>
vector_bytes serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const
 This method serializes the sketch as a vector of bytes.

template<typename SerDe = serde<T>>
void serialize (std::ostream &os, const SerDe &sd=SerDe()) const
 This method serializes the sketch into a given stream in a binary form.

string< A > to_string () const
 Prints a summary of the sketch.

string< A > items_to_string () const
 Prints the raw sketch items to a string.

ebpps_sample< T, A >::const_iterator begin () const
 Iterator pointing to the first item in the sketch.

ebpps_sample< T, A >::const_iterator end () const
 Iterator pointing to the past-the-end item in the sketch.
 

Static Public Member Functions

template<typename SerDe = serde<T>>
static ebpps_sketch deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A())
 This method deserializes a sketch from a given array of bytes.

template<typename SerDe = serde<T>>
static ebpps_sketch deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A())
 This method deserializes a sketch from a given stream.
 

Detailed Description

template<typename T, typename A = std::allocator<T>>
class datasketches::ebpps_sketch< T, A >

An implementation of an Exact and Bounded Sampling Proportional to Size sketch.

From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.

This sketch samples items from a stream with probability proportional to each item's weight. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.

The sample may contain fewer than k items, and its size can include a probabilistic component, so the sample size is not necessarily constant.
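
The proportionality guarantee can be made concrete with a little arithmetic. If inclusion probabilities are proportional to weight and sum to the expected sample size c, then an item with weight w is included with probability c * w / W, where W is the total weight. The sketch below only illustrates that arithmetic; the helper name pps_inclusion_probabilities is hypothetical and not part of the library, and it assumes c is small enough that no probability exceeds 1.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of datasketches-cpp): returns the target
// inclusion probability c * w_i / W for each weight w_i, where W is the sum
// of all weights and c is the expected sample size.
std::vector<double> pps_inclusion_probabilities(const std::vector<double>& weights, double c) {
  const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
  if (total <= 0.0) throw std::invalid_argument("total weight must be positive");
  std::vector<double> probs;
  probs.reserve(weights.size());
  for (const double w : weights) {
    probs.push_back(c * w / total);
  }
  return probs;
}
```

For weights {1, 2, 3, 4} and c = 2, the resulting probabilities are 0.2, 0.4, 0.6 and 0.8, which sum to 2.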

Author
Jon Malkin

Constructor & Destructor Documentation

◆ ebpps_sketch()

ebpps_sketch (uint32_t k, const A &allocator=A())
explicit

Constructor.

Parameters
k          sketch size
allocator  instance of an allocator

Member Function Documentation

◆ update() [1/2]

void update (const T &item, double weight=1.0)

Updates this sketch with the given data item and weight.

This method takes an lvalue.

Parameters
item    an item from a stream of items
weight  the weight of the item

◆ update() [2/2]

void update (T &&item, double weight=1.0)

Updates this sketch with the given data item and weight.

This method takes an rvalue.

Parameters
item    an item from a stream of items
weight  the weight of the item

◆ merge() [1/2]

void merge (const ebpps_sketch< T, A > &sketch)

Merges the provided sketch into the current one.

This method takes an lvalue.

Parameters
sketch  the sketch to merge into the current object

◆ merge() [2/2]

void merge (ebpps_sketch< T, A > &&sketch)

Merges the provided sketch into the current one.

This method takes an rvalue.

Parameters
sketch  the sketch to merge into the current object

◆ get_k()

uint32_t get_k () const
inline

Returns the configured maximum sample size.

Returns
configured maximum sample size

◆ get_n()

uint64_t get_n () const
inline

Returns the number of items processed by the sketch, regardless of item weight.

Returns
count of items processed by the sketch

◆ get_cumulative_weight()

double get_cumulative_weight () const
inline

Returns the cumulative weight of items processed by the sketch.

Returns
cumulative weight of items seen

◆ get_c()

double get_c () const
inline

Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator.

The number is a floating point value, where the fractional portion represents the probability of including a "partial item" from the sample.

The value C should be no larger than the sketch's configured value of k, although limited numerical precision means it may, in certain cases, exceed k by a double-precision floating-point error margin.

Returns
The expected number of samples returned when querying the sketch
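
To illustrate how a fractional expected size translates into an actual sample size, the sketch below draws one size from a given c: the integer part is always included, and one extra "partial item" is added with probability equal to the fractional part. This is an illustration of the description above, not the library's code; draw_sample_size is a hypothetical name, and c is assumed to be non-negative.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical illustration (not library code): draws one sample size for an
// expected size c >= 0. floor(c) items are always present; one more is
// included with probability c - floor(c).
template<typename Rng>
std::size_t draw_sample_size(double c, Rng& rng) {
  const double whole = std::floor(c);
  const double fraction = c - whole;  // lies in [0, 1)
  std::bernoulli_distribution include_partial(fraction);
  return static_cast<std::size_t>(whole) + (include_partial(rng) ? 1 : 0);
}
```

With c = 2.25, for example, each draw yields 2 or 3, and 3 occurs with probability 0.25; with c = 3.0 every draw yields 3.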

◆ is_empty()

bool is_empty () const
inline

Returns true if the sketch is empty.

Returns
empty flag

◆ get_allocator()

A get_allocator () const

Returns an instance of the allocator for this sketch.

Returns
allocator

◆ get_serialized_size_bytes()

template<typename SerDe = serde<T>>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
inline

Computes size needed to serialize the current state of the sketch.

Parameters
sd  instance of a SerDe
Returns
size in bytes needed to serialize this sketch

◆ serialize() [1/2]

template<typename SerDe = serde<T>>
vector_bytes serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const

This method serializes the sketch as a vector of bytes.

An optional header, a blank space of the given size, can be reserved in front of the sketch. This header is used by the DataSketches PostgreSQL extension.

Parameters
header_size_bytes  space to reserve in front of the sketch
sd                 instance of a SerDe

◆ serialize() [2/2]

template<typename SerDe = serde<T>>
void serialize (std::ostream &os, const SerDe &sd=SerDe()) const

This method serializes the sketch into a given stream in a binary form.

Parameters
os  output stream
sd  instance of a SerDe

◆ deserialize() [1/2]

template<typename SerDe = serde<T>>
static ebpps_sketch deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A())
static

This method deserializes a sketch from a given array of bytes.

Parameters
bytes      pointer to the array of bytes
size       the size of the array
sd         instance of a SerDe
allocator  instance of an allocator
Returns
an instance of a sketch

◆ deserialize() [2/2]

template<typename SerDe = serde<T>>
static ebpps_sketch deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A())
static

This method deserializes a sketch from a given stream.

Parameters
is         input stream
sd         instance of a SerDe
allocator  instance of an allocator
Returns
an instance of a sketch

◆ to_string()

string< A > to_string () const

Prints a summary of the sketch.

Returns
the summary as a string

◆ items_to_string()

string< A > items_to_string () const

Prints the raw sketch items to a string.

Only works for type T with a defined std::ostream& operator<<(std::ostream&, const T&) and is kept separate from to_string() to allow compilation even if T does not have such an operator defined.

Returns
a string with the sketch items
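
For a user-defined item type, the required stream operator might look like the sketch below. The type page_view and the helper describe are hypothetical and exist only for illustration; the point is the operator<< signature, which matches the one named above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical item type that a sketch could be instantiated with.
struct page_view {
  std::string url;
  int status;
};

// Stream operator with the signature items_to_string() relies on.
std::ostream& operator<<(std::ostream& os, const page_view& pv) {
  return os << pv.url << " [" << pv.status << "]";
}

// Hypothetical helper: formats a single item using the operator above.
std::string describe(const page_view& pv) {
  std::ostringstream os;
  os << pv;
  return os.str();
}
```

A type without this operator can still be used with to_string(), since only items_to_string() depends on it.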

◆ begin()

ebpps_sample< T, A >::const_iterator begin () const

Iterator pointing to the first item in the sketch.

If the sketch is empty, the returned iterator must not be dereferenced or incremented.

Returns
iterator pointing to the first item in the sketch

◆ end()

ebpps_sample< T, A >::const_iterator end () const

Iterator pointing to the past-the-end item in the sketch.

The past-the-end item is the hypothetical item that would follow the last item. It does not point to any item, and must not be dereferenced or incremented.

Returns
iterator pointing to the past-the-end item in the sketch
