datasketches-cpp
Namespaces | Classes | Typedefs | Enumerations | Functions
datasketches Namespace Reference

DataSketches namespace. More...

Namespaces

 cpc_constants
 CPC constants.
 
 ebpps_constants
 EBPPS sketch constants.
 
 kll_constants
 KLL sketch constants.
 
 quantiles_constants
 Constants for Quantiles sketch.
 
 req_constants
 REQ sketch constants.
 
 theta_constants
 Theta constants.
 
 var_opt_constants
 VarOpt sketch constants.
 

Classes

class  bounds_binomial_proportions
 Confidence intervals for binomial proportions. More...
 
class  kolmogorov_smirnov
 Kolmogorov-Smirnov test for KLL or Quantiles sketches. More...
 
class  quantiles_sorted_view
 Sorted view for quantiles sketches (REQ, KLL and Quantiles) More...
 
struct  serde
 Interface for serializing and deserializing items. More...
 
struct  serde< T, typename std::enable_if< std::is_arithmetic< T >::value >::type >
 serde for all fixed-size arithmetic types (int and float of different sizes). More...
 
struct  serde< std::string >
 serde for std::string items. More...
 
class  cpc_sketch_alloc
 High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch. More...
 
class  cpc_union_alloc
 High performance C++ implementation of Compressed Probabilistic Counting (CPC) Union. More...
 
class  hll_union_alloc
 This performs union operations for HLL sketches. More...
 
class  HllSketchImpl
 This is a high performance implementation of Phillipe Flajolet's HLL sketch but with significantly improved error behavior. More...
 
class  kll_sketch
 Implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained item. More...
 
class  bounds_on_ratios_in_sampled_sets
 Bounds on ratios in sampled sets. More...
 
class  bounds_on_ratios_in_theta_sketched_sets
 Bounds on ratios in Theta sketched sets. More...
 
class  theta_a_not_b_alloc
 Theta A-not-B (set difference). More...
 
class  theta_intersection_alloc
 Theta intersection. More...
 
class  jaccard_similarity_base
 Base class for Jaccard similarity. More...
 
class  theta_sketch_alloc
 Base class for the Theta Sketch, a generalization of the Kth Minimum Value (KMV) sketch. More...
 
class  update_theta_sketch_alloc
 Update Theta sketch. More...
 
class  compact_theta_sketch_alloc
 Compact Theta sketch. More...
 
class  wrapped_compact_theta_sketch_alloc
 Wrapped Compact Theta sketch. More...
 
class  base_theta_sketch_alloc
 Abstract base class for Theta sketch. More...
 
class  theta_union_alloc
 Theta Union. More...
 
class  theta_base_builder
 Theta base builder. More...
 
class  array_tuple_a_not_b
 array tuple A-not-B More...
 
class  array_tuple_intersection
 array tuple intersection More...
 
class  default_array_tuple_update_policy
 default array tuple update policy More...
 
class  compact_array_tuple_sketch
 Compact array tuple sketch. More...
 
class  update_array_tuple_sketch
 Update array tuple sketch. More...
 
struct  default_array_tuple_union_policy
 default array tuple union policy More...
 
class  array_tuple_union
 array tuple union More...
 
class  tuple_a_not_b
 tuple A-not-B More...
 
class  tuple_intersection
 Tuple intersection. More...
 
class  tuple_sketch
 Base class for Tuple sketch. More...
 
class  update_tuple_sketch
 Update Tuple sketch. More...
 
class  compact_tuple_sketch
 Compact Tuple sketch. More...
 
class  tuple_base_builder
 Tuple base builder. More...
 
class  tuple_union
 Tuple Union. More...
 
class  quantiles_sketch
 This is a stochastic streaming sketch that enables near-real time analysis of the approximate distribution from a very large stream in a single pass. More...
 
class  ebpps_sketch
 An implementation of an Exact and Bounded Sampling Proportional to Size sketch. More...
 
class  var_opt_union
 Provides a unioning operation over var_opt_sketch objects. More...
 
class  var_opt_sketch
 This sketch samples data from a stream of items. More...
 
class  density_sketch
 Density sketch. More...
 
class  frequent_items_sketch
 Frequent Items sketch. More...
 
class  count_min_sketch
 C++ implementation of the CountMin sketch data structure of Cormode and Muthukrishnan. More...
 
class  req_sketch
 Relative Error Quantiles Sketch. More...
 

Typedefs

using cpc_sketch = cpc_sketch_alloc< std::allocator< uint8_t > >
 CPC sketch alias with default allocator.
 
using cpc_union = cpc_union_alloc< std::allocator< uint8_t > >
 CPC union alias with default allocator.
 
using hll_sketch = hll_sketch_alloc< std::allocator< uint8_t > >
 HLL sketch alias with default allocator.
 
using hll_union = hll_union_alloc< std::allocator< uint8_t > >
 HLL union alias with default allocator.
 
template<typename Allocator = std::allocator<uint64_t>>
using theta_jaccard_similarity_alloc = jaccard_similarity_base< theta_union_alloc< Allocator >, theta_intersection_alloc< Allocator >, trivial_extract_key >
 Theta Jaccard similarity alias.
 
using theta_jaccard_similarity = theta_jaccard_similarity_alloc< std::allocator< uint64_t > >
 Theta Jaccard similarity alias with default allocator.
 
using theta_sketch = theta_sketch_alloc< std::allocator< uint64_t > >
 Theta sketch alias with default allocator.
 
using update_theta_sketch = update_theta_sketch_alloc< std::allocator< uint64_t > >
 Update Theta sketch alias with default allocator.
 
using compact_theta_sketch = compact_theta_sketch_alloc< std::allocator< uint64_t > >
 Compact Theta sketch alias with default allocator.
 
using wrapped_compact_theta_sketch = wrapped_compact_theta_sketch_alloc< std::allocator< uint64_t > >
 Wrapped Compact Theta sketch alias with default allocator.
 
using default_array_of_doubles_update_policy = default_array_tuple_update_policy< array< double > >
 convenience alias with default allocator, default policy for update_array_of_doubles_sketch
 
using update_array_of_doubles_sketch = update_array_tuple_sketch< array< double > >
 convenience alias with default allocator, equivalent to ArrayOfDoublesUpdatableSketch in Java
 
using compact_array_of_doubles_sketch = compact_array_tuple_sketch< array< double > >
 convenience alias with default allocator, equivalent to ArrayOfDoublesCompactSketch in Java
 
using default_array_of_doubles_union_policy = default_array_tuple_union_policy< array< double > >
 convenience alias, default policy for array_of_doubles_union
 
using array_of_doubles_union = array_tuple_union< array< double > >
 convenience alias with default allocator, equivalent to ArrayOfDoublesUnion in Java
 
template<typename Policy >
using array_of_doubles_intersection = array_tuple_intersection< array< double >, Policy >
 convenience alias with default allocator, equivalent to ArrayOfDoublesIntersection in Java no default policy since it is not clear in general
 
using array_of_doubles_a_not_b = array_tuple_a_not_b< array< double > >
 convenience alias with default allocator, equivalent to ArrayOfDoublesAnotB in Java
 
template<typename Summary , typename IntersectionPolicy , typename UnionPolicy = default_tuple_union_policy<Summary>, typename Allocator = std::allocator<Summary>>
using tuple_jaccard_similarity = jaccard_similarity_base< tuple_union< Summary, UnionPolicy, Allocator >, tuple_intersection< Summary, IntersectionPolicy, Allocator >, pair_extract_key< uint64_t, Summary > >
 Tuple Jaccard similarity alias.
 

Enumerations

enum  target_hll_type { HLL_4 , HLL_6 , HLL_8 }
 Specifies the target type of HLL sketch to be created. More...
 
enum  frequent_items_error_type { NO_FALSE_POSITIVES , NO_FALSE_NEGATIVES }
 Frequent items error type. More...
 

Functions

template<typename A >
void cpc_init ()
 Allocation and initialization of global decompression (decoding) tables. More...
 

Detailed Description

DataSketches namespace.

Enumeration Type Documentation

◆ target_hll_type

Specifies the target type of HLL sketch to be created.

It is a target in that the actual allocation of the HLL array is deferred until sufficient number of items have been received by the warm-up phases.

These three target types are isomorphic representations of the same underlying HLL algorithm. Thus, given the same value of lg_config_k and the same input, all three HLL target types will produce identical estimates and have identical error distributions.

The memory (and also the serialization) of the sketch during this early warmup phase starts out very small (8 bytes, when empty) and then grows in increments of 4 bytes as required until the full HLL array is allocated. This transition point occurs at about 10% of K for sketches where lg_config_k is > 8.

  • HLL_8 This uses an 8-bit byte per HLL bucket. It is generally the fastest in terms of update time, but has the largest storage footprint of about K bytes.

  • HLL_6 This uses a 6-bit field per HLL bucket. It is the generally the next fastest in terms of update time with a storage footprint of about 3/4 * K bytes.

  • HLL_4 This uses a 4-bit field per HLL bucket and for large counts may require the use of a small internal auxiliary array for storing statistical exceptions, which are rare. For the values of lg_config_k > 13 (K = 8192), this additional array adds about 3% to the overall storage. It is generally the slowest in terms of update time, but has the smallest storage footprint of about K/2 * 1.03 bytes.
Enumerator
HLL_4 

4 bits per entry (most compact, size may vary)

HLL_6 

6 bits per entry (fixed size)

HLL_8 

8 bits per entry (fastest, fixed size)

◆ frequent_items_error_type

Frequent items error type.

Enumerator
NO_FALSE_POSITIVES 

include an item in the result list if get_lower_bound(item) > threshold

NO_FALSE_NEGATIVES 

include an item in the result list if get_upper_bound(item) > threshold

Function Documentation

◆ cpc_init()

void cpc_init ( )

Allocation and initialization of global decompression (decoding) tables.

Call this before anything else if you want to control the initialization time. For instance, to have this happen outside of a transaction context. Otherwise initialization happens on the first use (serialization or deserialization). It is safe to call more than once assuming no race conditions. This is not thread safe! Neither is the rest of the library.