datasketches-cpp
An implementation of an Exact and Bounded Sampling Proportional to Size sketch.
#include <ebpps_sketch.hpp>
Public Member Functions | |
ebpps_sketch (uint32_t k, const A &allocator=A()) | |
Constructor. | |
void | update (const T &item, double weight=1.0) |
Updates this sketch with the given data item with the given weight. | |
void | update (T &&item, double weight=1.0) |
Updates this sketch with the given data item with the given weight. | |
void | merge (const ebpps_sketch< T, A > &sketch) |
Merges the provided sketch into the current one. | |
void | merge (ebpps_sketch< T, A > &&sketch) |
Merges the provided sketch into the current one. | |
result_type | get_result () const |
Returns a copy of the current sample, as a std::vector. | |
uint32_t | get_k () const |
Returns the configured maximum sample size. | |
uint64_t | get_n () const |
Returns the number of items processed by the sketch, regardless of item weight. | |
double | get_cumulative_weight () const |
Returns the cumulative weight of items processed by the sketch. | |
double | get_c () const |
Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator. | |
bool | is_empty () const |
Returns true if the sketch is empty. | |
A | get_allocator () const |
Returns an instance of the allocator for this sketch. | |
void | reset () |
Resets the sketch to its default, empty state. | |
template<typename SerDe = serde<T>> | |
size_t | get_serialized_size_bytes (const SerDe &sd=SerDe()) const |
Computes the size in bytes needed to serialize the current state of the sketch. | |
template<typename SerDe = serde<T>> | |
vector_bytes | serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const |
This method serializes the sketch as a vector of bytes. | |
template<typename SerDe = serde<T>> | |
void | serialize (std::ostream &os, const SerDe &sd=SerDe()) const |
This method serializes the sketch into a given stream in a binary form. | |
string< A > | to_string () const |
Prints a summary of the sketch. | |
string< A > | items_to_string () const |
Prints the raw sketch items to a string. | |
ebpps_sample< T, A >::const_iterator | begin () const |
Iterator pointing to the first item in the sketch. | |
ebpps_sample< T, A >::const_iterator | end () const |
Iterator pointing to the past-the-end item in the sketch. | |
Static Public Member Functions | |
template<typename SerDe = serde<T>> | |
static ebpps_sketch | deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A()) |
This method deserializes a sketch from a given array of bytes. | |
template<typename SerDe = serde<T>> | |
static ebpps_sketch | deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A()) |
This method deserializes a sketch from a given stream. | |
An implementation of an Exact and Bounded Sampling Proportional to Size sketch.
From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.
This sketch samples items from a stream in proportion to each item's weight. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.
The sample may contain fewer than k items, and its size can include a probabilistic component, so the resulting sample size is not always the same.
ebpps_sketch | ( | uint32_t | k, |
const A & | allocator = A() |
||
) |
explicit |
Constructor.
k | sketch size (the maximum sample size) |
allocator | instance of an allocator |
void update | ( | const T & | item, |
double | weight = 1.0 |
||
) |
Updates this sketch with the given data item with the given weight.
This method takes an lvalue.
item | an item from a stream of items |
weight | the weight of the item |
void update | ( | T && | item, |
double | weight = 1.0 |
||
) |
Updates this sketch with the given data item with the given weight.
This method takes an rvalue.
item | an item from a stream of items |
weight | the weight of the item |
void merge | ( | const ebpps_sketch< T, A > & | sketch | ) |
Merges the provided sketch into the current one.
This method takes an lvalue.
sketch | the sketch to merge into the current object |
void merge | ( | ebpps_sketch< T, A > && | sketch | ) |
Merges the provided sketch into the current one.
This method takes an rvalue.
sketch | the sketch to merge into the current object |
uint32_t get_k | ( | ) | const |
inline |
Returns the configured maximum sample size.
uint64_t get_n | ( | ) | const |
inline |
Returns the number of items processed by the sketch, regardless of item weight.
double get_cumulative_weight | ( | ) | const |
inline |
Returns the cumulative weight of items processed by the sketch.
double get_c | ( | ) | const |
inline |
Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator.
The number is a floating point value, where the fractional portion represents the probability of including a "partial item" from the sample.
The value C should not exceed the sketch's configured value of k, although floating-point rounding means it may exceed k by a double-precision error margin in certain cases.
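One way to read the fractional part of C is shown below as a standalone illustration, not library code: the whole part of C is the number of items that are always returned, and one additional item is returned with probability equal to the fractional part. The helper name `draw_sample_size` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

// Illustrative only, not library code.
// Draws a concrete sample size from a fractional expected size c:
// floor(c) items are always kept, and one more is kept with probability
// equal to the fractional part of c. Precondition: c >= 0.
template <typename Rng>
std::size_t draw_sample_size(double c, Rng& rng) {
  const double whole = std::floor(c);
  const double frac = c - whole;  // in [0, 1)
  std::bernoulli_distribution include_partial(frac);
  return static_cast<std::size_t>(whole) + (include_partial(rng) ? 1 : 0);
}
```

With C = 2.25, for example, each draw yields either 2 or 3 items, and the average over many draws approaches 2.25.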
bool is_empty | ( | ) | const |
inline |
Returns true if the sketch is empty.
A get_allocator | ( | ) | const |
Returns an instance of the allocator for this sketch.
size_t get_serialized_size_bytes | ( | const SerDe & | sd = SerDe() | ) | const |
inline |
Computes the size in bytes needed to serialize the current state of the sketch.
sd | instance of a SerDe |
vector_bytes serialize | ( | unsigned | header_size_bytes = 0 , |
const SerDe & | sd = SerDe() |
||
) | const |
This method serializes the sketch as a vector of bytes.
Optionally, space for a header can be reserved in front of the sketch. The header is a blank region of the given size. It is used by the DataSketches PostgreSQL extension.
header_size_bytes | space to reserve in front of the sketch |
sd | instance of a SerDe |
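The resulting layout can be pictured with a small standalone helper. This is only an illustration of the "reserved space, then sketch bytes" idea: `with_reserved_header` is a hypothetical function, not part of the library, and zero-filling the reserved bytes is an assumption of this sketch rather than documented behavior.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only, not library code.
// Produces a buffer laid out as [reserved header][payload]. The header
// bytes are zero-filled here (an assumption) for the caller to overwrite.
inline std::vector<std::uint8_t> with_reserved_header(
    const std::vector<std::uint8_t>& payload, std::size_t header_size_bytes) {
  std::vector<std::uint8_t> out(header_size_bytes + payload.size(), 0);
  std::copy(payload.begin(), payload.end(),
            out.begin() + static_cast<std::ptrdiff_t>(header_size_bytes));
  return out;
}
```

A caller that reserves a header this way must skip the same number of bytes before handing the remainder of the buffer to a deserializer.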
void serialize | ( | std::ostream & | os, |
const SerDe & | sd = SerDe() |
||
) | const |
This method serializes the sketch into a given stream in a binary form.
os | output stream |
sd | instance of a SerDe |
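ebpps_sketch deserialize | ( | const void * | bytes, |
size_t | size, |
const SerDe & | sd = SerDe(), |
const A & | allocator = A() |
||
) |
static |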
This method deserializes a sketch from a given array of bytes.
bytes | pointer to the array of bytes |
size | the size of the array |
sd | instance of a SerDe |
allocator | instance of an allocator |
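ebpps_sketch deserialize | ( | std::istream & | is, |
const SerDe & | sd = SerDe(), |
const A & | allocator = A() |
||
) |
static |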
This method deserializes a sketch from a given stream.
is | input stream |
sd | instance of a SerDe |
allocator | instance of an allocator |
string< A > to_string | ( | ) | const |
Prints a summary of the sketch.
string< A > items_to_string | ( | ) | const |
Prints the raw sketch items to a string.
Works only for a type T that has a defined std::ostream& operator<<(std::ostream&, const T&). It is kept separate from to_string() so that code still compiles when T has no such operator.
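The reason this separation helps can be shown with a standalone example. Member functions of a class template are only instantiated when they are used, so a method that streams T does no harm for a T without operator<< as long as it is never called. The class `toy_sample` and the types `opaque` and `point` below are hypothetical stand-ins, not library types.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Illustrative only: a hypothetical stand-in container, not the library's class.
template <typename T>
class toy_sample {
public:
  void add(const T& item) { items_.push_back(item); }

  // Needs nothing from T, so it compiles for any item type.
  std::string to_string() const {
    std::ostringstream os;
    os << "items: " << items_.size();
    return os.str();
  }

  // Requires operator<<(std::ostream&, const T&). It is only instantiated
  // if some code actually calls it.
  std::string items_to_string() const {
    std::ostringstream os;
    for (const T& item : items_) os << item << '\n';
    return os.str();
  }

private:
  std::vector<T> items_;
};

struct opaque {};  // no operator<< defined: only to_string() is usable

struct point {
  int x;
  int y;
};

inline std::ostream& operator<<(std::ostream& os, const point& p) {
  return os << '(' << p.x << ", " << p.y << ')';
}
```

Here `toy_sample<opaque>` can be created and summarized with to_string(), while calling items_to_string() on it would fail to compile; `toy_sample<point>` supports both methods.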
ebpps_sample< T, A >::const_iterator begin | ( | ) | const |
Iterator pointing to the first item in the sketch.
If the sketch is empty, the returned iterator must not be dereferenced or incremented.
ebpps_sample< T, A >::const_iterator end | ( | ) | const |
Iterator pointing to the past-the-end item in the sketch.
The past-the-end item is the hypothetical item that would follow the last item. It does not point to any item, and must not be dereferenced or incremented.