datasketches-cpp
var_opt_sketch< T, A > Class Template Reference

This sketch samples data from a stream of items.

#include <var_opt_sketch.hpp>

Public Member Functions

 var_opt_sketch (uint32_t k, resize_factor rf=var_opt_constants::DEFAULT_RESIZE_FACTOR, const A &allocator=A())
 Constructor.

 var_opt_sketch (const var_opt_sketch &other)
 Copy constructor.

 var_opt_sketch (var_opt_sketch &&other) noexcept
 Move constructor.

var_opt_sketch & operator= (const var_opt_sketch &other)
 Copy assignment.

var_opt_sketch & operator= (var_opt_sketch &&other)
 Move assignment.
 
void update (const T &item, double weight=1.0)
 Updates this sketch with the given data item and its weight.

void update (T &&item, double weight=1.0)
 Updates this sketch with the given data item and its weight.

uint32_t get_k () const
 Returns the configured maximum sample size.

uint64_t get_n () const
 Returns the length of the input stream.

uint32_t get_num_samples () const
 Returns the number of samples currently in the sketch.

template<typename P >
subset_summary estimate_subset_sum (P predicate) const
 Computes an estimated subset sum from the entire stream for objects matching a given predicate.

bool is_empty () const
 Returns true if the sketch is empty.
 
void reset ()
 Resets the sketch to its default, empty state.
 
template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if< std::is_arithmetic< TT >::value, int >::type = 0>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
 Computes size needed to serialize the current state of the sketch.

template<typename TT = T, typename SerDe = serde<T>, typename std::enable_if<!std::is_arithmetic< TT >::value, int >::type = 0>
size_t get_serialized_size_bytes (const SerDe &sd=SerDe()) const
 Computes size needed to serialize the current state of the sketch.

template<typename SerDe = serde<T>>
vector_bytes serialize (unsigned header_size_bytes=0, const SerDe &sd=SerDe()) const
 This method serializes the sketch as a vector of bytes.

template<typename SerDe = serde<T>>
void serialize (std::ostream &os, const SerDe &sd=SerDe()) const
 This method serializes the sketch into a given stream in a binary form.

string< A > to_string () const
 Prints a summary of the sketch.

string< A > items_to_string () const
 Prints the raw sketch items to a string.

const_iterator begin () const
 Iterator pointing to the first item in the sketch.

const_iterator end () const
 Iterator pointing to the past-the-end item in the sketch.
 

Static Public Member Functions

template<typename SerDe = serde<T>>
static var_opt_sketch deserialize (std::istream &is, const SerDe &sd=SerDe(), const A &allocator=A())
 This method deserializes a sketch from a given stream.

template<typename SerDe = serde<T>>
static var_opt_sketch deserialize (const void *bytes, size_t size, const SerDe &sd=SerDe(), const A &allocator=A())
 This method deserializes a sketch from a given array of bytes.
 

Detailed Description

template<typename T, typename A = std::allocator<T>>
class datasketches::var_opt_sketch< T, A >

This sketch samples data from a stream of items.

Designed for optimal (minimum) variance when querying the sketch to estimate subset sums of items matching a provided predicate. Variance optimal (varopt) sampling is related to reservoir sampling, with improved error bounds for subset sum estimation.
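The following is a deliberately naive, self-contained sketch of the classic VarOpt_k update rule. It is written for illustration only. It is not the datasketches implementation, which avoids re-sorting on every update. The class `naive_varopt` and its members are hypothetical.

Once the sketch is over capacity, each update works on the k+1 candidates in three steps:

- It finds a threshold tau such that sum min(1, w_i / tau) = k.
- It drops exactly one item, choosing item i with probability 1 - min(1, w_i / tau).
- It raises every surviving item lighter than tau to weight tau.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

// Illustrative only: a naive VarOpt_k sampler that re-sorts all k+1 candidates on
// every update past capacity. This is NOT the datasketches implementation.
template<typename T>
class naive_varopt {
 public:
  using sample = std::pair<T, double>;  // (item, adjusted weight)

  explicit naive_varopt(std::size_t k, unsigned seed = 42) : k_(k), rng_(seed) {
    if (k_ == 0) throw std::invalid_argument("k must be at least 1");
  }

  void update(const T& item, double weight = 1.0) {
    if (!(weight > 0.0)) throw std::invalid_argument("weight must be positive");
    samples_.emplace_back(item, weight);
    if (samples_.size() <= k_) return;  // under capacity: every item kept exactly

    // Sort the k+1 candidates by ascending (adjusted) weight.
    std::sort(samples_.begin(), samples_.end(),
              [](const sample& a, const sample& b) { return a.second < b.second; });

    // Find tau with sum_i min(1, w_i / tau) == k. If the m lightest items are
    // the ones not above tau, then tau = (sum of those m weights) / (m - 1).
    std::vector<double> prefix(samples_.size() + 1, 0.0);
    for (std::size_t i = 0; i < samples_.size(); ++i) {
      prefix[i + 1] = prefix[i] + samples_[i].second;
    }
    std::size_t m = samples_.size();
    double tau = prefix[m] / static_cast<double>(m - 1);
    while (samples_[m - 1].second > tau) {  // too heavy to be "light": exclude it
      --m;
      tau = prefix[m] / static_cast<double>(m - 1);
    }  // stops by m == 2 at the latest, since w_0 + w_1 >= w_1

    // Drop exactly one light item: item i with probability 1 - w_i / tau.
    // These probabilities sum to 1, and heavy items are never dropped.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    double r = unif(rng_);
    std::size_t drop = 0;  // round-off fallback: the lightest item always has p > 0
    for (std::size_t i = 0; i < m; ++i) {
      r -= 1.0 - samples_[i].second / tau;
      if (r < 0.0) { drop = i; break; }
    }
    samples_.erase(samples_.begin() + static_cast<std::ptrdiff_t>(drop));

    // The m - 1 surviving light items now occupy indices [0, m - 1). Each one
    // represents weight tau, which keeps the total weight unchanged.
    for (std::size_t i = 0; i + 1 < m; ++i) samples_[i].second = tau;
  }

  std::size_t num_samples() const { return samples_.size(); }
  const std::vector<sample>& samples() const { return samples_; }

  // Sum of adjusted weights; equals the total input weight up to round-off.
  double total_weight() const {
    double total = 0.0;
    for (const sample& s : samples_) total += s.second;
    return total;
  }

 private:
  std::size_t k_;
  std::mt19937_64 rng_;
  std::vector<sample> samples_;
};
```

In this sketch, heavy items (weight at least tau) are never dropped and keep their exact weight. Light items all carry the common weight tau. When every weight is equal, the drop probability is uniform, which reduces to reservoir-style sampling.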

Authors
Kevin Lang, Jon Malkin

Constructor & Destructor Documentation

◆ var_opt_sketch() [1/3]

var_opt_sketch ( uint32_t  k,
resize_factor  rf = var_opt_constants::DEFAULT_RESIZE_FACTOR,
const A &  allocator = A() 
)
explicit

Constructor.

Parameters
k: sketch size (the maximum number of samples retained)
rf: resize factor
allocator: instance of an allocator

◆ var_opt_sketch() [2/3]

var_opt_sketch ( const var_opt_sketch< T, A > &  other)

Copy constructor.

Parameters
other: sketch to be copied

◆ var_opt_sketch() [3/3]

var_opt_sketch ( var_opt_sketch< T, A > &&  other)
noexcept

Move constructor.

Parameters
other: sketch to be moved

Member Function Documentation

◆ operator=() [1/2]

var_opt_sketch< T, A > & operator= ( const var_opt_sketch< T, A > &  other)

Copy assignment.

Parameters
other: sketch to be copied
Returns
reference to this sketch

◆ operator=() [2/2]

var_opt_sketch< T, A > & operator= ( var_opt_sketch< T, A > &&  other)

Move assignment.

Parameters
other: sketch to be moved
Returns
reference to this sketch

◆ update() [1/2]

void update ( const T &  item,
double  weight = 1.0 
)

Updates this sketch with the given data item and its weight.

This method takes an lvalue.

Parameters
item: an item from a stream of items
weight: the weight of the item

◆ update() [2/2]

void update ( T &&  item,
double  weight = 1.0 
)

Updates this sketch with the given data item and its weight.

This method takes an rvalue.

Parameters
item: an item from a stream of items
weight: the weight of the item
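This pair of overloads lets a caller avoid a copy when it passes a temporary or `std::move`s an item in. The following minimal sketch uses hypothetical `tracked` and `store` types (they are not library types) that count copies and moves, which makes the difference visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical item type that counts how often it is copied vs. moved.
struct tracked {
  static int copies;
  static int moves;
  std::string payload;
  explicit tracked(std::string p) : payload(std::move(p)) {}
  tracked(const tracked& other) : payload(other.payload) { ++copies; }
  tracked(tracked&& other) noexcept : payload(std::move(other.payload)) { ++moves; }
};
int tracked::copies = 0;
int tracked::moves = 0;

// Hypothetical container with the same pair of update() overloads.
template<typename T>
class store {
 public:
  store() { items_.reserve(16); }  // no reallocation, so counts reflect update() only

  // lvalue overload: the caller keeps its object, so the item is copied.
  void update(const T& item, double weight = 1.0) { items_.emplace_back(item, weight); }

  // rvalue overload: the caller gives the object away, so it is moved.
  void update(T&& item, double weight = 1.0) { items_.emplace_back(std::move(item), weight); }

  std::size_t size() const { return items_.size(); }

 private:
  std::vector<std::pair<T, double>> items_;
};
```

With a named object `a`, `s.update(a)` selects the copying overload. Both `s.update(tracked("x"))` and `s.update(std::move(a))` select the moving one.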

◆ get_k()

uint32_t get_k () const
inline

Returns the configured maximum sample size.

Returns
configured maximum sample size

◆ get_n()

uint64_t get_n () const
inline

Returns the length of the input stream.

Returns
stream length

◆ get_num_samples()

uint32_t get_num_samples () const
inline

Returns the number of samples currently in the sketch.

Returns
number of samples currently in the sketch

◆ estimate_subset_sum()

subset_summary estimate_subset_sum ( P predicate) const

Computes an estimated subset sum from the entire stream for objects matching a given predicate.

Provides a lower bound, estimate, and upper bound using a target of 2 standard deviations. This is technically a heuristic method and tries to err on the conservative side.

Parameters
predicate: a predicate function
Returns
a subset_summary item with estimate, upper and lower bounds, and total sketch weight
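The following minimal sketch shows what a predicate-driven subset-sum estimate looks like over a weighted sample. It assumes that each retained item carries an adjusted weight, as in varopt sampling, so the estimate is the sum of the adjusted weights of matching items. The function `subset_sum_estimate` and the struct `simple_summary` are hypothetical. The sketch computes only the point estimate and the total weight, not the library's 2-standard-deviation bounds.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical result type; the library's subset_summary also carries bounds.
struct simple_summary {
  double estimate;
  double total_weight;
};

// Sum the adjusted weights of retained samples whose item satisfies predicate.
// P can be any callable taking const T& and returning something convertible to bool.
template<typename T, typename P>
simple_summary subset_sum_estimate(const std::vector<std::pair<T, double>>& samples, P predicate) {
  simple_summary s{0.0, 0.0};
  for (const auto& sample : samples) {
    s.total_weight += sample.second;
    if (predicate(sample.first)) s.estimate += sample.second;
  }
  return s;
}
```

Because the predicate is a template parameter, a lambda, a function pointer, or a functor all work here. For example, `[](int x) { return x % 2 == 0; }` selects even items.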

◆ is_empty()

bool is_empty () const
inline

Returns true if the sketch is empty.

Returns
empty flag

◆ get_serialized_size_bytes() [1/2]

size_t get_serialized_size_bytes ( const SerDe &  sd = SerDe()) const
inline

Computes size needed to serialize the current state of the sketch.

This version is for fixed-size arithmetic types (integral and floating point).

Parameters
sd: instance of a SerDe
Returns
size in bytes needed to serialize this sketch

◆ get_serialized_size_bytes() [2/2]

size_t get_serialized_size_bytes ( const SerDe &  sd = SerDe()) const
inline

Computes size needed to serialize the current state of the sketch.

This version is for all other types and can be expensive, since every item must be examined.

Parameters
sd: instance of a SerDe
Returns
size in bytes needed to serialize this sketch
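These two overloads are selected at compile time from the item type. The following minimal sketch shows that dispatch pattern. The `sample_holder` class and the `item_size` function for strings are hypothetical, and the string encoding (a 4-byte length prefix plus the characters) is an assumption, not the library's SerDe format. The defaulted `TT = T` parameter makes the `std::enable_if` condition depend on the member template's own parameter. That lets SFINAE discard the non-matching overload instead of producing a hard error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical per-item encoding size for strings: a 4-byte length prefix plus
// the characters. (An assumption for this sketch, not the library's format.)
inline std::size_t item_size(const std::string& s) { return sizeof(std::uint32_t) + s.size(); }

template<typename T>
class sample_holder {
 public:
  void add(T item) { items_.push_back(std::move(item)); }

  // Arithmetic items have a fixed size, so no item needs to be visited: O(1).
  template<typename TT = T, typename std::enable_if<std::is_arithmetic<TT>::value, int>::type = 0>
  std::size_t size_bytes() const { return items_.size() * sizeof(TT); }

  // Any other type: every item is visited, so this is O(n).
  template<typename TT = T, typename std::enable_if<!std::is_arithmetic<TT>::value, int>::type = 0>
  std::size_t size_bytes() const {
    std::size_t total = 0;
    for (const TT& item : items_) total += item_size(item);
    return total;
  }

 private:
  std::vector<T> items_;
};
```

Calling `size_bytes()` on a `sample_holder<int>` picks the first overload. On a `sample_holder<std::string>` it picks the second.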

◆ serialize() [1/2]

vector_bytes serialize ( unsigned  header_size_bytes = 0,
const SerDe &  sd = SerDe() 
) const

This method serializes the sketch as a vector of bytes.

An optional header can be reserved in front of the sketch; it is a blank space of the given size. This header is used by the DataSketches PostgreSQL extension.

Parameters
header_size_bytes: space to reserve in front of the sketch
sd: instance of a SerDe
Returns
the serialized sketch as a vector of bytes
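The following minimal sketch shows the resulting byte layout, assuming an already-encoded payload. Both helpers, `with_reserved_header` and `skip_header`, are hypothetical and not library functions. The byte-array `deserialize()` overload takes no header size, so a reader presumably has to skip the reserved bytes before passing the remainder in. In this sketch the header bytes are zero only because `std::vector` value-initializes; this reference does not specify their contents in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: place an already-encoded payload after header_size_bytes
// of reserved space. std::vector value-initializes, so the header bytes are zero here.
inline std::vector<std::uint8_t> with_reserved_header(const std::vector<std::uint8_t>& payload,
                                                      unsigned header_size_bytes) {
  std::vector<std::uint8_t> out(header_size_bytes + payload.size());
  if (!payload.empty()) {
    std::memcpy(out.data() + header_size_bytes, payload.data(), payload.size());
  }
  return out;
}

// Hypothetical read-back: skip the reserved header before handing the rest to a decoder.
inline std::pair<const std::uint8_t*, std::size_t> skip_header(const std::vector<std::uint8_t>& buf,
                                                               unsigned header_size_bytes) {
  if (buf.size() < header_size_bytes) throw std::out_of_range("buffer shorter than header");
  return {buf.data() + header_size_bytes, buf.size() - header_size_bytes};
}
```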

◆ serialize() [2/2]

void serialize ( std::ostream &  os,
const SerDe &  sd = SerDe() 
) const

This method serializes the sketch into a given stream in a binary form.

Parameters
os: output stream
sd: instance of a SerDe

◆ deserialize() [1/2]

static var_opt_sketch deserialize ( std::istream &  is,
const SerDe &  sd = SerDe(),
const A &  allocator = A() 
)
static

This method deserializes a sketch from a given stream.

Parameters
is: input stream
sd: instance of a SerDe
allocator: instance of an allocator
Returns
an instance of a sketch

◆ deserialize() [2/2]

static var_opt_sketch deserialize ( const void *  bytes,
size_t  size,
const SerDe &  sd = SerDe(),
const A &  allocator = A() 
)
static

This method deserializes a sketch from a given array of bytes.

Parameters
bytes: pointer to the array of bytes
size: the size of the array
sd: instance of a SerDe
allocator: instance of an allocator
Returns
an instance of a sketch

◆ to_string()

string< A > to_string () const

Prints a summary of the sketch.

Returns
the summary as a string

◆ items_to_string()

string< A > items_to_string () const

Prints the raw sketch items to a string.

Calls items_to_stream() internally. This works only for types T that define std::ostream& operator<<(std::ostream&, const T&). It is kept separate from to_string() so that the sketch still compiles when T has no such operator.

Returns
a string with the sketch items
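Keeping this method separate works because of a general C++ rule: a member function of a class template is instantiated only when it is called. The following minimal sketch uses a hypothetical `box` class to show the effect. `box<opaque>` compiles and its `to_string()` works even though `opaque` has no `operator<<`, while `items_to_string()` is usable only for streamable types.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container illustrating why items_to_string() is a separate method.
template<typename T>
class box {
 public:
  void add(const T& item) { items_.push_back(item); }

  // Never streams a T, so it compiles for any T.
  std::string to_string() const {
    return "box with " + std::to_string(items_.size()) + " items";
  }

  // Streams each T. This body is instantiated only if the method is called,
  // so operator<< for T is required only in that case.
  std::string items_to_string() const {
    std::ostringstream os;
    for (const T& item : items_) os << item << ' ';
    return os.str();
  }

 private:
  std::vector<T> items_;
};

// A type with no operator<< defined.
struct opaque { int id; };
```

This relies on implicit instantiation. An explicit instantiation such as `template class box<opaque>;` would instantiate every member and fail to compile.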

◆ begin()

var_opt_sketch< T, A >::const_iterator begin () const

Iterator pointing to the first item in the sketch.

If the sketch is empty, the returned iterator must not be dereferenced or incremented.

Returns
iterator pointing to the first item in the sketch

◆ end()

var_opt_sketch< T, A >::const_iterator end () const

Iterator pointing to the past-the-end item in the sketch.

The past-the-end item is the hypothetical item that would follow the last item. It does not point to any item, and must not be dereferenced or incremented.

Returns
iterator pointing to the past-the-end item in the sketch
