An implementation of an Exact and Bounded Sampling Proportional to Size sketch.
|
| ebpps_sketch (uint32_t k, const A &allocator=A()) |
| Constructor.
|
|
void | update (const T &item, double weight=1.0) |
| Updates this sketch with the given data item with the given weight.
|
|
void | update (T &&item, double weight=1.0) |
| Updates this sketch with the given data item with the given weight.
|
|
void | merge (const ebpps_sketch< T, A > &sketch) |
| Merges the provided sketch into the current one.
|
|
void | merge (ebpps_sketch< T, A > &&sketch) |
| Merges the provided sketch into the current one.
|
|
result_type | get_result () const |
| Returns a copy of the current sample, as a std::vector.
|
|
uint32_t | get_k () const |
| Returns the configured maximum sample size.
|
|
uint64_t | get_n () const |
| Returns the number of items processed by the sketch, regardless of item weight.
|
|
double | get_cumulative_weight () const |
| Returns the cumulative weight of items processed by the sketch.
|
|
double | get_c () const |
| Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator.
|
|
bool | is_empty () const |
| Returns true if the sketch is empty.
|
|
A | get_allocator () const |
| Returns an instance of the allocator for this sketch.
|
|
void | reset () |
| Resets the sketch to its default, empty state.
|
|
template<typename SerDe = serde<T>> |
size_t | get_serialized_size_bytes (const SerDe &sd=SerDe()) const |
| Computes the size needed to serialize the current state of the sketch.
|
|
template<typename SerDe = serde<T>> |
vector_bytes | serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const |
| This method serializes the sketch as a vector of bytes.
|
|
template<typename SerDe = serde<T>> |
void | serialize (std::ostream &os, const SerDe &sd=SerDe()) const |
| This method serializes the sketch into a given stream in a binary form.
|
|
string< A > | to_string () const |
| Prints a summary of the sketch.
|
|
string< A > | items_to_string () const |
| Prints the raw sketch items to a string.
|
|
ebpps_sample< T, A >::const_iterator | begin () const |
| Iterator pointing to the first item in the sketch.
|
|
ebpps_sample< T, A >::const_iterator | end () const |
| Iterator pointing to the past-the-end item in the sketch.
|
|
template<typename T, typename A = std::allocator<T>>
class datasketches::ebpps_sketch< T, A >
An implementation of an Exact and Bounded Sampling Proportional to Size sketch.
From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.
This sketch samples items from a stream with probability proportional to each item's weight. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.
The sample may contain fewer than k items. Its size can also include a probabilistic component, so repeated queries of the same sketch may return samples of different sizes.
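The relationship between weights, inclusion probabilities, and sample size can be illustrated with a small standalone sketch. It does not use the library. `pps_plan` and `plan_pps` are hypothetical names, and the scaling rule rho = min(k / W, 1 / w_max) is an assumption taken from the cited paper's approach, not a statement about this class's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper types; not part of the DataSketches API.
struct pps_plan {
  double rho;                     // common scaling factor applied to every weight
  double expected_size;           // rho * total weight, never more than k
  std::vector<double> inclusion;  // per-item inclusion probability, rho * weight
};

// Computes inclusion probabilities that are exactly proportional to weight
// while keeping the expected sample size at or below k.
// Assumption: all weights are strictly positive.
pps_plan plan_pps(const std::vector<double>& weights, std::uint32_t k) {
  assert(k > 0 && !weights.empty());
  const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
  const double w_max = *std::max_element(weights.begin(), weights.end());
  assert(w_max > 0.0);

  // Assumed rule: k / total bounds the expected size by k, and 1 / w_max
  // keeps the heaviest item's inclusion probability from exceeding 1.
  const double rho = std::min(static_cast<double>(k) / total, 1.0 / w_max);

  pps_plan plan{rho, rho * total, {}};
  plan.inclusion.reserve(weights.size());
  for (double w : weights) plan.inclusion.push_back(rho * w);
  return plan;
}
```

Under this assumed rule, weights {10, 1, 1} with k = 2 give an expected sample size of about 1.2 rather than 2. The heavy item is always included, and shrinking the sample is the only way to keep the other inclusion probabilities proportional to weight.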
- Author
- Jon Malkin
Returns the expected number of samples returned upon a call to get_result() or the creation of an iterator.
The number is a floating point value, where the fractional portion represents the probability of including a "partial item" from the sample.
The value C should be no larger than the sketch's configured value of k. Because of limited numerical precision, it may in certain cases exceed k by an amount on the order of double-precision floating-point rounding error.
- Returns
- The expected number of samples returned when querying the sketch
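One way to read a fractional value of C is sketched below. This is an illustration of the description above, not the library's code. `draw_sample_size` is a hypothetical helper, and the choice of `std::bernoulli_distribution` is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper; not part of the DataSketches API.
// Draws a concrete sample size from an expected size c: the integer part is
// always returned, and one extra "partial item" is included with probability
// equal to the fractional part of c.
template <typename Rng>
std::uint32_t draw_sample_size(double c, Rng& rng) {
  assert(c >= 0.0);
  const double whole = std::floor(c);
  const double frac = c - whole;  // probability of including the partial item
  std::bernoulli_distribution include_partial(frac);
  return static_cast<std::uint32_t>(whole) + (include_partial(rng) ? 1u : 0u);
}
```

For example, with C = 2.25 this helper returns 2 about three quarters of the time and 3 otherwise, so the average size over many draws approaches 2.25. An integer value of C always yields exactly that many items.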