datasketches-cpp

var_opt_sketch< T, A > Class Template Reference
This sketch samples data from a stream of items.
#include <var_opt_sketch.hpp>
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | var_opt_sketch(uint32_t k, resize_factor rf = var_opt_constants::DEFAULT_RESIZE_FACTOR, const A& allocator = A()) | Constructor. |
| | var_opt_sketch(const var_opt_sketch& other) | Copy constructor. |
| | var_opt_sketch(var_opt_sketch&& other) noexcept | Move constructor. |
| var_opt_sketch& | operator=(const var_opt_sketch& other) | Copy assignment. |
| var_opt_sketch& | operator=(var_opt_sketch&& other) | Move assignment. |
| void | update(const T& item, double weight = 1.0) | Updates this sketch with the given data item and weight (lvalue overload). |
| void | update(T&& item, double weight = 1.0) | Updates this sketch with the given data item and weight (rvalue overload). |
| uint32_t | get_k() const | Returns the configured maximum sample size. |
| uint64_t | get_n() const | Returns the length of the input stream. |
| uint32_t | get_num_samples() const | Returns the number of samples currently in the sketch. |
| template<typename P> subset_summary | estimate_subset_sum(P predicate) const | Computes an estimated subset sum from the entire stream for items matching a given predicate. |
| bool | is_empty() const | Returns true if the sketch is empty. |
| void | reset() | Resets the sketch to its default, empty state. |
| template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<std::is_arithmetic<TT>::value, int>::type = 0> size_t | get_serialized_size_bytes(const SerDe& sd = SerDe()) const | Computes the size needed to serialize the current state of the sketch (arithmetic T). |
| template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<!std::is_arithmetic<TT>::value, int>::type = 0> size_t | get_serialized_size_bytes(const SerDe& sd = SerDe()) const | Computes the size needed to serialize the current state of the sketch (all other T). |
| template<typename SerDe = serde<T>> vector_bytes | serialize(unsigned header_size_bytes = 0, const SerDe& sd = SerDe()) const | Serializes the sketch as a vector of bytes. |
| template<typename SerDe = serde<T>> void | serialize(std::ostream& os, const SerDe& sd = SerDe()) const | Serializes the sketch into a given stream in binary form. |
| string<A> | to_string() const | Returns a summary of the sketch. |
| string<A> | items_to_string() const | Returns the raw sketch items as a string. |
| const_iterator | begin() const | Iterator pointing to the first item in the sketch. |
| const_iterator | end() const | Iterator pointing to the past-the-end item in the sketch. |
Static Public Member Functions

| Return type | Member | Description |
|---|---|---|
| template<typename SerDe = serde<T>> static var_opt_sketch | deserialize(std::istream& is, const SerDe& sd = SerDe(), const A& allocator = A()) | Deserializes a sketch from a given stream. |
| template<typename SerDe = serde<T>> static var_opt_sketch | deserialize(const void* bytes, size_t size, const SerDe& sd = SerDe(), const A& allocator = A()) | Deserializes a sketch from a given array of bytes. |
Detailed Description

This sketch samples data from a stream of items.

It is designed for optimal (minimum) variance when the sketch is queried to estimate subset sums of items matching a provided predicate. Variance optimal (varopt) sampling is related to reservoir sampling, with improved error bounds for subset sum estimation.

Authors: Kevin Lang, Jon Malkin
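Since varopt sampling is described as related to reservoir sampling, a minimal sketch of the classic unweighted case (Algorithm R) may help build intuition. This is not the varopt algorithm: it ignores weights and simply keeps each of the n items with equal probability k/n. The class reservoir_sampler and its fixed default seed are hypothetical; the accessor names only mirror get_k(), get_n(), and get_num_samples() for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical illustration: classic uniform reservoir sampling (Algorithm R).
// NOT the varopt algorithm; weights are ignored entirely.
template <typename T>
class reservoir_sampler {
 public:
  explicit reservoir_sampler(uint32_t k, uint64_t seed = 42)
      : k_(k), n_(0), rng_(seed) {
    assert(k > 0);
    samples_.reserve(k);
  }

  void update(const T& item) {
    ++n_;
    if (samples_.size() < k_) {
      samples_.push_back(item);  // fill the reservoir with the first k items
      return;
    }
    // Pick a slot in [0, n); the new item replaces a sample with probability k / n.
    std::uniform_int_distribution<uint64_t> dist(0, n_ - 1);
    const uint64_t slot = dist(rng_);
    if (slot < k_) samples_[slot] = item;
  }

  uint32_t get_k() const { return k_; }
  uint64_t get_n() const { return n_; }
  uint32_t get_num_samples() const { return static_cast<uint32_t>(samples_.size()); }
  const std::vector<T>& samples() const { return samples_; }

 private:
  uint32_t k_;
  uint64_t n_;
  std::mt19937_64 rng_;
  std::vector<T> samples_;
};
```

Until k items have been seen every item is kept; after that the sample size stays at k while n keeps growing.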
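Constructor & Destructor Documentation

explicit var_opt_sketch(uint32_t k, resize_factor rf = var_opt_constants::DEFAULT_RESIZE_FACTOR, const A& allocator = A())

Constructor.

| Parameter | Description |
|---|---|
| k | sketch size, i.e. the maximum sample size |
| rf | resize factor |
| allocator | instance of an allocator |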
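The rf argument is a resize factor. A common reading, assumed here rather than stated on this page, is that the sketch starts with a small buffer and multiplies its capacity by the resize factor each time it fills, capping at k, so that sketches fed only a few items do not allocate all k slots up front. The function growth_schedule and the starting capacity passed to it are hypothetical; the library's actual initial size and growth policy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of geometric buffer growth controlled by a resize factor.
// Assumption: storage starts at `initial` slots (or k, if smaller) and is
// multiplied by `factor` on each resize, never exceeding k. A factor of 1 is
// treated here as "allocate all k slots immediately".
std::vector<uint32_t> growth_schedule(uint32_t k, uint32_t factor, uint32_t initial) {
  assert(k > 0 && initial > 0);
  std::vector<uint32_t> caps;
  uint32_t cap = (factor <= 1) ? k : std::min(initial, k);
  caps.push_back(cap);
  while (cap < k) {
    // 64-bit arithmetic so the multiplication cannot overflow.
    const uint64_t next = static_cast<uint64_t>(cap) * factor;
    cap = static_cast<uint32_t>(std::min<uint64_t>(next, k));
    caps.push_back(cap);
  }
  return caps;
}
```

For example, growth_schedule(100, 2, 16) yields 16, 32, 64, 100: a larger factor means fewer reallocations at the cost of more potentially unused memory.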
var_opt_sketch(const var_opt_sketch<T, A>& other)

Copy constructor.

| Parameter | Description |
|---|---|
| other | sketch to be copied |
var_opt_sketch(var_opt_sketch<T, A>&& other) noexcept

Move constructor.

| Parameter | Description |
|---|---|
| other | sketch to be moved |
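The move constructor is declared noexcept. One practical consequence in C++ is that standard containers such as std::vector move (rather than copy) their elements during reallocation only when the move constructor cannot throw. The sketch below demonstrates this with a hypothetical copy-counting type, tracked; it illustrates the language rule, not var_opt_sketch internals.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical type that counts how often it is copied.
struct tracked {
  static int copies;
  tracked() = default;
  tracked(const tracked&) { ++copies; }
  tracked(tracked&&) noexcept {}  // noexcept lets std::vector move on reallocation
  tracked& operator=(const tracked&) { ++copies; return *this; }
  tracked& operator=(tracked&&) noexcept { return *this; }
};
int tracked::copies = 0;

static_assert(std::is_nothrow_move_constructible<tracked>::value,
              "move constructor must be noexcept");

// Grows a vector one element at a time and reports how many copies occurred.
int copies_during_growth(int count) {
  tracked::copies = 0;
  std::vector<tracked> v;
  for (int i = 0; i < count; ++i) v.emplace_back();
  return tracked::copies;
}
```

Removing noexcept from the move constructor would make std::vector fall back to copying existing elements whenever it reallocates, since a copy is needed to preserve the strong exception guarantee.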
Member Function Documentation

var_opt_sketch<T, A>& operator=(const var_opt_sketch<T, A>& other)

Copy assignment.

| Parameter | Description |
|---|---|
| other | sketch to be copied |
var_opt_sketch<T, A>& operator=(var_opt_sketch<T, A>&& other)

Move assignment.

| Parameter | Description |
|---|---|
| other | sketch to be moved |
void update(const T& item, double weight = 1.0)

Updates this sketch with the given data item and weight.

This overload takes an lvalue.

| Parameter | Description |
|---|---|
| item | an item from a stream of items |
| weight | the weight of the item |
void update(T&& item, double weight = 1.0)

Updates this sketch with the given data item and weight.

This overload takes an rvalue, which allows the item to be moved into the sketch instead of copied.

| Parameter | Description |
|---|---|
| item | an item from a stream of items |
| weight | the weight of the item |
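The two update() overloads follow the usual C++ pattern of pairing a const lvalue reference with an rvalue reference. The hypothetical item_store class below, specialized to std::string for brevity, shows which overload a call resolves to: named objects bind to the lvalue overload, while temporaries and std::move() results bind to the rvalue overload, which can move the item instead of copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container mirroring the pair of update() overloads.
class item_store {
 public:
  // Lvalue overload: the caller keeps its object, so the item is copied.
  void update(const std::string& item, double weight = 1.0) {
    ++lvalue_calls_;
    items_.emplace_back(item, weight);
  }
  // Rvalue overload: the caller gives the object away, so it can be moved.
  void update(std::string&& item, double weight = 1.0) {
    ++rvalue_calls_;
    items_.emplace_back(std::move(item), weight);
  }
  int lvalue_calls() const { return lvalue_calls_; }
  int rvalue_calls() const { return rvalue_calls_; }
  std::size_t size() const { return items_.size(); }

 private:
  int lvalue_calls_ = 0;
  int rvalue_calls_ = 0;
  std::vector<std::pair<std::string, double>> items_;
};
```

The same overload-resolution rule applies to the sketch's own signatures: a call such as update(std::move(s), 2.0) selects update(T&&, double).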
uint32_t get_k() const [inline]

Returns the configured maximum sample size.
uint64_t get_n() const [inline]

Returns the length of the input stream.
uint32_t get_num_samples() const [inline]

Returns the number of samples currently in the sketch.
template<typename P>
subset_summary estimate_subset_sum(P predicate) const

Computes an estimated subset sum from the entire stream for items matching a given predicate.

Returns a lower bound, an estimate, and an upper bound, with the bounds targeting 2 standard deviations. The bounds are computed heuristically and are intended to err on the conservative side.

| Parameter | Description |
|---|---|
| predicate | a predicate function applied to the sampled items; items for which it returns true belong to the subset |
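In varopt sampling each retained item carries an adjusted weight, and the point estimate of a subset sum is the sum of the adjusted weights of the retained items that satisfy the predicate. The sketch below computes only that point estimate over a hypothetical vector of (item, adjusted weight) pairs; it does not reproduce the library's 2-standard-deviation bounds, and the names estimate_subset and subset_estimate are illustrative.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; only the point estimate and the total sample
// weight are computed here, not the lower and upper bounds.
struct subset_estimate {
  double estimate;
  double total_weight;
};

// Sums the adjusted weights of sampled items that satisfy the predicate.
// Each entry is (item, adjusted weight) as retained by a weighted sampler.
template <typename T, typename P>
subset_estimate estimate_subset(const std::vector<std::pair<T, double>>& samples, P predicate) {
  subset_estimate result{0.0, 0.0};
  for (const auto& entry : samples) {
    result.total_weight += entry.second;
    if (predicate(entry.first)) result.estimate += entry.second;
  }
  return result;
}
```

The predicate for the real sketch is presumably supplied the same way, as a callable such as a lambda that accepts a const T& and returns bool.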
bool is_empty() const [inline]

Returns true if the sketch is empty.
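template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<std::is_arithmetic<TT>::value, int>::type = 0>
size_t get_serialized_size_bytes(const SerDe& sd = SerDe()) const [inline]

Computes the size needed to serialize the current state of the sketch.

This version is for fixed-size arithmetic types (integral and floating point).

| Parameter | Description |
|---|---|
| sd | instance of a SerDe |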
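template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<!std::is_arithmetic<TT>::value, int>::type = 0>
size_t get_serialized_size_bytes(const SerDe& sd = SerDe()) const [inline]

Computes the size needed to serialize the current state of the sketch.

This version is for all other types and can be expensive, since every item must be examined to determine its serialized size.

| Parameter | Description |
|---|---|
| sd | instance of a SerDe |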
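The two get_serialized_size_bytes() overloads are selected at compile time with std::enable_if on std::is_arithmetic, as their template signatures show. The same dispatch pattern is sketched below for a hypothetical items_size_bytes function: the arithmetic version multiplies the item count by sizeof(T), while the fallback visits every item. The 4-byte length prefix used for non-arithmetic items is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Fixed-size arithmetic types: the size is known without inspecting the items.
template <typename T, typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
std::size_t items_size_bytes(const std::vector<T>& items) {
  return items.size() * sizeof(T);
}

// All other types: every item must be examined.
// Hypothetical encoding: a 4-byte length prefix followed by the item's bytes;
// this sketch assumes T has a size() member, as std::string does.
template <typename T, typename std::enable_if<!std::is_arithmetic<T>::value, int>::type = 0>
std::size_t items_size_bytes(const std::vector<T>& items) {
  std::size_t total = 0;
  for (const auto& item : items) total += sizeof(uint32_t) + item.size();
  return total;
}
```

The first overload is O(1) regardless of how many items are stored, which is why the fixed-size path is cheap and the general path is not.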
template<typename SerDe = serde<T>>
vector_bytes serialize(unsigned header_size_bytes = 0, const SerDe& sd = SerDe()) const

Serializes the sketch as a vector of bytes.

An optional header can be reserved in front of the sketch: a blank space of the given size. This header is used by the DataSketches PostgreSQL extension.

| Parameter | Description |
|---|---|
| header_size_bytes | space to reserve in front of the sketch |
| sd | instance of a SerDe |
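The header_size_bytes parameter reserves blank space in front of the serialized sketch. The sketch below shows the idea with a hypothetical format (a uint32_t count followed by uint32_t values, in native byte order); serialize_values and deserialize_values are illustrative names, not library functions. It also shows the reading side: the reserved header is not part of the sketch image, so a reader skips it by offsetting the pointer and shrinking the size, which is presumably how deserialize(const void* bytes, size_t size) would be called on a buffer produced with a non-zero header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical format: uint32_t count, then the uint32_t values (native byte order).
// `header_size_bytes` zeroed bytes are reserved in front for a caller-defined header.
std::vector<uint8_t> serialize_values(const std::vector<uint32_t>& values,
                                      unsigned header_size_bytes = 0) {
  const uint32_t count = static_cast<uint32_t>(values.size());
  std::vector<uint8_t> bytes(header_size_bytes + sizeof(count) + count * sizeof(uint32_t), 0);
  uint8_t* ptr = bytes.data() + header_size_bytes;  // payload starts after the header
  std::memcpy(ptr, &count, sizeof(count));
  ptr += sizeof(count);
  if (count > 0) std::memcpy(ptr, values.data(), count * sizeof(uint32_t));
  return bytes;
}

// Reads the payload back; the caller is responsible for skipping any header.
std::vector<uint32_t> deserialize_values(const void* bytes, std::size_t size) {
  uint32_t count = 0;
  if (size < sizeof(count)) throw std::runtime_error("buffer too small");
  const uint8_t* ptr = static_cast<const uint8_t*>(bytes);
  std::memcpy(&count, ptr, sizeof(count));
  if (size - sizeof(count) < static_cast<std::size_t>(count) * sizeof(uint32_t))
    throw std::runtime_error("buffer too small");
  std::vector<uint32_t> values(count);
  if (count > 0) std::memcpy(values.data(), ptr + sizeof(count), count * sizeof(uint32_t));
  return values;
}
```

Usage: with an 8-byte header, pass bytes.data() + 8 and bytes.size() - 8 to the reader.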
template<typename SerDe = serde<T>>
void serialize(std::ostream& os, const SerDe& sd = SerDe()) const

Serializes the sketch into a given stream in binary form.

| Parameter | Description |
|---|---|
| os | output stream |
| sd | instance of a SerDe |
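template<typename SerDe = serde<T>>
static var_opt_sketch deserialize(std::istream& is, const SerDe& sd = SerDe(), const A& allocator = A())

Deserializes a sketch from a given stream.

| Parameter | Description |
|---|---|
| is | input stream |
| sd | instance of a SerDe |
| allocator | instance of an allocator |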
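serialize(std::ostream&) and deserialize(std::istream&) operate on binary streams. The round trip below uses a hypothetical format of a uint32_t count followed by doubles, written with the unformatted write() and read() calls; write_values and read_values are illustrative names, not library functions. When the stream is a file, it should be opened with std::ios::binary so that no newline translation alters the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical binary format: uint32_t count, then the doubles (native byte order).
void write_values(std::ostream& os, const std::vector<double>& values) {
  const uint32_t count = static_cast<uint32_t>(values.size());
  os.write(reinterpret_cast<const char*>(&count), sizeof(count));
  if (count > 0)
    os.write(reinterpret_cast<const char*>(values.data()),
             static_cast<std::streamsize>(count * sizeof(double)));
}

std::vector<double> read_values(std::istream& is) {
  uint32_t count = 0;
  is.read(reinterpret_cast<char*>(&count), sizeof(count));
  if (!is) throw std::runtime_error("failed to read count");
  std::vector<double> values(count);
  if (count > 0) {
    is.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(double)));
    if (!is) throw std::runtime_error("failed to read values");
  }
  return values;
}
```

A std::stringstream opened with std::ios::in | std::ios::out | std::ios::binary is convenient for testing such round trips in memory.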
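template<typename SerDe = serde<T>>
static var_opt_sketch deserialize(const void* bytes, size_t size, const SerDe& sd = SerDe(), const A& allocator = A())

Deserializes a sketch from a given array of bytes.

| Parameter | Description |
|---|---|
| bytes | pointer to the array of bytes |
| size | the size of the array |
| sd | instance of a SerDe |
| allocator | instance of an allocator |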
string<A> to_string() const

Returns a summary of the sketch as a string.
string<A> items_to_string() const

Returns the raw sketch items as a string.

Calls items_to_stream() internally. This method works only for types T for which std::ostream& operator<<(std::ostream&, const T&) is defined. It is kept separate from to_string() so that the sketch still compiles for types T that lack such an operator.
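Keeping items_to_string() separate works because of a C++ rule: a member function of a class template is instantiated only when it is used. The hypothetical holder template below has an items_to_string() that requires operator<<, yet holder<no_stream_op> still compiles because that member is never called for it.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container; not the sketch's implementation.
template <typename T>
class holder {
 public:
  void add(const T& item) { items_.push_back(item); }

  // Works for any T.
  std::string to_string() const {
    return "holder with " + std::to_string(items_.size()) + " items";
  }

  // Requires std::ostream& operator<<(std::ostream&, const T&); it is only
  // instantiated, and therefore only type-checked for T, if it is called.
  std::string items_to_string() const {
    std::ostringstream os;
    for (const auto& item : items_) os << item << ' ';
    return os.str();
  }

 private:
  std::vector<T> items_;
};

// A type without operator<<; holder<no_stream_op> still compiles
// as long as items_to_string() is never called on it.
struct no_stream_op { int value; };
```

An explicit instantiation such as template class holder<no_stream_op>; would instantiate every member and therefore fail to compile.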
var_opt_sketch<T, A>::const_iterator begin() const

Iterator pointing to the first item in the sketch.

If the sketch is empty, the returned iterator must not be dereferenced or incremented.
var_opt_sketch<T, A>::const_iterator end() const

Iterator pointing to the past-the-end item in the sketch.

The past-the-end item is the hypothetical item that would follow the last item. It does not point to any item, and must not be dereferenced or incremented.
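Because begin() and end() are provided, the sketch can be traversed with a range-based for loop or an explicit iterator loop over the half-open range [begin, end). The exact value_type of const_iterator is not documented on this page, so the sketch below uses a hypothetical vector of (item, weight) pairs as a stand-in; it shows the discipline the two descriptions require: compare against end() before dereferencing, and never dereference end().

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sketch: a vector of (item, weight) pairs.
using weighted_samples = std::vector<std::pair<std::string, double>>;

// Sums all weights; an empty range yields 0.0 because the loop body never runs.
double total_weight(const weighted_samples& samples) {
  double total = 0.0;
  for (auto it = samples.begin(); it != samples.end(); ++it) total += it->second;
  return total;
}

// Peeks at the first item only after checking that the range is non-empty.
const std::string* first_item(const weighted_samples& samples) {
  return samples.begin() != samples.end() ? &samples.begin()->first : nullptr;
}
```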