datasketches-cpp
This sketch samples data from a stream of items.
#include <var_opt_sketch.hpp>
Public Member Functions
var_opt_sketch (uint32_t k, resize_factor rf=var_opt_constants::DEFAULT_RESIZE_FACTOR, const A &allocator=A())
Constructor.
var_opt_sketch (const var_opt_sketch &other)
Copy constructor.
var_opt_sketch (var_opt_sketch &&other) noexcept
Move constructor.
var_opt_sketch & operator= (const var_opt_sketch &other)
Copy assignment.
var_opt_sketch & operator= (var_opt_sketch &&other)
Move assignment.
void update (const T &item, double weight=1.0)
Updates this sketch with the given data item with the given weight.
void update (T &&item, double weight=1.0)
Updates this sketch with the given data item with the given weight.
uint32_t get_k () const
Returns the configured maximum sample size.
uint64_t get_n () const
Returns the length of the input stream.
uint32_t get_num_samples () const
Returns the number of samples currently in the sketch.
template<typename P >
subset_summary estimate_subset_sum (P predicate) const
Computes an estimated subset sum from the entire stream for objects matching a given predicate.
bool is_empty () const
Returns true if the sketch is empty.
void reset ()
Resets the sketch to its default, empty state.
template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if< std::is_arithmetic< TT >::value, int >::type = 0>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
Computes size needed to serialize the current state of the sketch.
template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<!std::is_arithmetic< TT >::value, int >::type = 0>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
Computes size needed to serialize the current state of the sketch.
template<typename SerDe = serde<T>>
vector_bytes serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const
This method serializes the sketch as a vector of bytes.
template<typename SerDe = serde<T>>
void serialize (std::ostream &os, const SerDe &sd=SerDe()) const
This method serializes the sketch into a given stream in a binary form.
string< A > to_string () const
Prints a summary of the sketch.
string< A > items_to_string () const
Prints the raw sketch items to a string.
const_iterator begin () const
Iterator pointing to the first item in the sketch.
const_iterator end () const
Iterator pointing to the past-the-end item in the sketch.
Static Public Member Functions
template<typename SerDe = serde<T>>
static var_opt_sketch deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A())
This method deserializes a sketch from a given stream.
template<typename SerDe = serde<T>>
static var_opt_sketch deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A())
This method deserializes a sketch from a given array of bytes.
This sketch samples data from a stream of items.
It is designed for optimal (minimum) variance when the sketch is queried to estimate subset sums of items matching a provided predicate. Variance-optimal (varopt) sampling is related to reservoir sampling, with improved error bounds for subset sum estimation.
Authors: Kevin Lang, Jon Malkin
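var_opt_sketch (uint32_t k, resize_factor rf=var_opt_constants::DEFAULT_RESIZE_FACTOR, const A &allocator=A())
explicit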
Constructor.
k | sketch size |
rf | resize factor |
allocator | instance of an allocator |
var_opt_sketch (const var_opt_sketch< T, A > &other)
Copy constructor.
other | sketch to be copied |
var_opt_sketch (var_opt_sketch< T, A > &&other)
noexcept
Move constructor.
other | sketch to be moved |
var_opt_sketch< T, A > & operator= (const var_opt_sketch< T, A > &other)
Copy assignment.
other | sketch to be copied |
var_opt_sketch< T, A > & operator= (var_opt_sketch< T, A > &&other)
Move assignment.
other | sketch to be moved |
void update (const T &item, double weight=1.0)
Updates this sketch with the given data item with the given weight.
This method takes an lvalue.
item | an item from a stream of items |
weight | the weight of the item |
void update (T &&item, double weight=1.0)
Updates this sketch with the given data item with the given weight.
This method takes an rvalue.
item | an item from a stream of items |
weight | the weight of the item |
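The two update() overloads differ only in how the item is received. The following is a minimal, self-contained sketch of that pattern. `toy_sampler` and `tracked` are hypothetical stand-ins written for this illustration and are not part of datasketches-cpp; the sketch only shows which overload is selected and whether the item ends up copied or moved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical item type that counts how its instances were constructed.
struct tracked {
  int value;
  static int copies;
  static int moves;
  explicit tracked(int v) : value(v) {}
  tracked(const tracked& other) : value(other.value) { ++copies; }
  tracked(tracked&& other) noexcept : value(other.value) { ++moves; }
};
int tracked::copies = 0;
int tracked::moves = 0;

// Hypothetical container with the same pair of update() overloads.
class toy_sampler {
public:
  // Capacity is reserved up front so that no reallocation disturbs the counters.
  explicit toy_sampler(std::size_t capacity) {
    items_.reserve(capacity);
    weights_.reserve(capacity);
  }

  // lvalue overload: the caller keeps its object, so the item is copied.
  void update(const tracked& item, double weight = 1.0) {
    items_.push_back(item);
    weights_.push_back(weight);
  }

  // rvalue overload: the caller gives the object up, so the item is moved.
  void update(tracked&& item, double weight = 1.0) {
    items_.push_back(std::move(item));
    weights_.push_back(weight);
  }

  std::size_t size() const { return items_.size(); }

private:
  std::vector<tracked> items_;
  std::vector<double> weights_;
};
```

Passing a named object selects the lvalue overload; passing a temporary or wrapping the argument in std::move() selects the rvalue overload. The rvalue form is worth using when items are expensive to copy and the caller no longer needs them.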
uint32_t get_k () const
inline
Returns the configured maximum sample size.
uint64_t get_n () const
inline
Returns the length of the input stream.
uint32_t get_num_samples () const
inline
Returns the number of samples currently in the sketch.
subset_summary estimate_subset_sum (P predicate) const
Computes an estimated subset sum from the entire stream for objects matching a given predicate.
Provides a lower bound, an estimate, and an upper bound, with the bounds targeting 2 standard deviations. The bounds are computed heuristically and are intended to err on the conservative side.
predicate | a predicate function |
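The general idea behind a sample-based subset sum can be sketched without the library: each retained sample carries an adjusted weight, and the point estimate is the sum of the adjusted weights of the samples that match the predicate. The code below is only an illustration of that idea. `weighted_sample` and `estimate_from_samples` are hypothetical names, and the bound calculation described above is not reproduced here.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one retained sample: (item, adjusted weight).
using weighted_sample = std::pair<std::string, double>;

// Sums the adjusted weights of all samples accepted by the predicate.
// The predicate is any callable taking a const std::string& and returning bool.
template<typename P>
double estimate_from_samples(const std::vector<weighted_sample>& samples, P predicate) {
  double sum = 0.0;
  for (const auto& sample : samples) {
    if (predicate(sample.first)) {
      sum += sample.second;
    }
  }
  return sum;
}
```

A predicate is typically a small lambda, for example `[](const std::string& s) { return !s.empty() && s[0] == 'a'; }` to restrict the sum to items beginning with the letter a.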
bool is_empty () const
inline
Returns true if the sketch is empty.
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
inline
Computes size needed to serialize the current state of the sketch.
This version is for fixed-size arithmetic types (integral and floating point).
sd | instance of a SerDe |
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
inline
Computes size needed to serialize the current state of the sketch.
This version is for all other types and can be expensive since every item needs to be looked at.
sd | instance of a SerDe |
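The two overloads are selected at compile time with std::enable_if on std::is_arithmetic. The sketch below illustrates the same dispatch on a hypothetical free function, `toy_items_size_bytes`, which is not part of the library. The 4-byte length prefix used for non-arithmetic items is an assumption made for this example only and says nothing about the actual serialized format.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Arithmetic T: every item has the same size, so the total is one multiplication.
template<typename T,
         typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
std::size_t toy_items_size_bytes(const std::vector<T>& items) {
  return items.size() * sizeof(T);
}

// Any other T: sizes vary per item, so every item has to be visited.
// Assumed layout (illustration only): a 4-byte length prefix, then the characters.
template<typename T,
         typename std::enable_if<!std::is_arithmetic<T>::value, int>::type = 0>
std::size_t toy_items_size_bytes(const std::vector<T>& items) {
  std::size_t total = 0;
  for (const auto& item : items) {
    total += sizeof(std::uint32_t) + item.size();
  }
  return total;
}
```

The first version runs in constant time, while the second is linear in the number of items, which is why the non-arithmetic case is described as potentially expensive.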
vector_bytes serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const
This method serializes the sketch as a vector of bytes.
An optional header can be reserved in front of the sketch. It is a blank space of the given size. This header is used by the DataSketches PostgreSQL extension.
header_size_bytes | space to reserve in front of the sketch |
sd | instance of a SerDe |
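The effect of reserving a header can be pictured as a buffer whose first header_size_bytes bytes are left for the caller, followed by the serialized sketch. The sketch below reproduces only that layout. `toy_serialize` and `toy_payload` are hypothetical helpers, and filling the header with zeros is a choice made here for determinism, not a documented property of the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using byte_vector = std::vector<std::uint8_t>;

// Returns a buffer holding header_size_bytes reserved bytes followed by the payload.
// The reserved bytes are zero-filled (an assumption of this sketch).
byte_vector toy_serialize(const byte_vector& payload, unsigned header_size_bytes = 0) {
  byte_vector bytes(header_size_bytes + payload.size(), 0);
  std::copy(payload.begin(), payload.end(), bytes.begin() + header_size_bytes);
  return bytes;
}

// Recovers the payload by skipping the reserved header.
byte_vector toy_payload(const byte_vector& bytes, unsigned header_size_bytes) {
  return byte_vector(bytes.begin() + header_size_bytes, bytes.end());
}
```

With this layout, a reader has to skip the header before interpreting the remaining bytes as a sketch. That is an inference from the description above rather than something this page states.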
void serialize (std::ostream &os, const SerDe &sd=SerDe()) const
This method serializes the sketch into a given stream in a binary form.
os | output stream |
sd | instance of a SerDe |
static var_opt_sketch deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A())
static
This method deserializes a sketch from a given stream.
is | input stream |
sd | instance of a SerDe |
allocator | instance of an allocator |
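static var_opt_sketch deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A())
static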
This method deserializes a sketch from a given array of bytes.
bytes | pointer to the array of bytes |
size | the size of the array |
sd | instance of a SerDe |
allocator | instance of an allocator |
string< A > to_string () const
Prints a summary of the sketch.
string< A > items_to_string () const
Prints the raw sketch items to a string.
Calls items_to_stream() internally. This method only works for a type T that has a defined std::ostream& operator<<(std::ostream&, const T&). It is kept separate from to_string() so that code still compiles when T has no such operator.
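For a user-defined item type, the required stream operator can be supplied as a free function. The sketch below shows one way to do that. `point` and `toy_items_to_string` are hypothetical names used only for illustration; the helper mimics the requirement on T and is not the library's implementation.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical item type.
struct point {
  int x;
  int y;
};

// The operator that makes point printable to any std::ostream.
std::ostream& operator<<(std::ostream& os, const point& p) {
  return os << "(" << p.x << ", " << p.y << ")";
}

// Hypothetical helper: writes one item per line, relying on operator<< for T.
template<typename T>
std::string toy_items_to_string(const std::vector<T>& items) {
  std::ostringstream os;
  for (const auto& item : items) {
    os << item << "\n";
  }
  return os.str();
}
```

Because the helper is a template, a type without operator<< causes a compile error only where the helper is actually called. The same reasoning explains why a separate printing method lets the rest of a class template compile for such types.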
var_opt_sketch< T, A >::const_iterator begin () const
Iterator pointing to the first item in the sketch.
If the sketch is empty, the returned iterator must not be dereferenced or incremented.
var_opt_sketch< T, A >::const_iterator end () const
Iterator pointing to the past-the-end item in the sketch.
The past-the-end item is the hypothetical item that would follow the last item. It does not point to any item, and must not be dereferenced or incremented.