datasketches-cpp
|
DataSketches namespace. More...
Namespaces | |
cpc_constants | |
CPC constants. | |
ebpps_constants | |
EBPPS sketch constants. | |
kll_constants | |
KLL sketch constants. | |
quantiles_constants | |
Constants for Quantiles sketch. | |
req_constants | |
REQ sketch constants. | |
theta_constants | |
Theta constants. | |
var_opt_constants | |
VarOpt sketch constants. | |
Classes | |
class | bounds_binomial_proportions |
Confidence intervals for binomial proportions. More... | |
class | kolmogorov_smirnov |
Kolmogorov-Smirnov test for KLL or Quantiles sketches. More... | |
class | quantiles_sorted_view |
Sorted view for quantiles sketches (REQ, KLL and Quantiles) More... | |
struct | serde |
Interface for serializing and deserializing items. More... | |
struct | serde< T, typename std::enable_if< std::is_arithmetic< T >::value >::type > |
serde for all fixed-size arithmetic types (int and float of different sizes). More... | |
struct | serde< std::string > |
serde for std::string items. More... | |
class | cpc_sketch_alloc |
High performance C++ implementation of Compressed Probabilistic Counting (CPC) Sketch. More... | |
class | cpc_union_alloc |
High performance C++ implementation of Compressed Probabilistic Counting (CPC) Union. More... | |
class | hll_union_alloc |
This performs union operations for HLL sketches. More... | |
class | HllSketchImpl |
This is a high performance implementation of Phillipe Flajolet's HLL sketch but with significantly improved error behavior. More... | |
class | kll_sketch |
Implementation of a very compact quantiles sketch with lazy compaction scheme and nearly optimal accuracy per retained item. More... | |
class | bounds_on_ratios_in_sampled_sets |
Bounds on ratios in sampled sets. More... | |
class | bounds_on_ratios_in_theta_sketched_sets |
Bounds on ratios in Theta sketched sets. More... | |
class | theta_a_not_b_alloc |
Theta A-not-B (set difference). More... | |
class | theta_intersection_alloc |
Theta intersection. More... | |
class | jaccard_similarity_base |
Base class for Jaccard similarity. More... | |
class | theta_sketch_alloc |
Base class for the Theta Sketch, a generalization of the Kth Minimum Value (KMV) sketch. More... | |
class | update_theta_sketch_alloc |
Update Theta sketch. More... | |
class | compact_theta_sketch_alloc |
Compact Theta sketch. More... | |
class | wrapped_compact_theta_sketch_alloc |
Wrapped Compact Theta sketch. More... | |
class | base_theta_sketch_alloc |
Abstract base class for Theta sketch. More... | |
class | theta_union_alloc |
Theta Union. More... | |
class | theta_base_builder |
Theta base builder. More... | |
class | array_tuple_a_not_b |
array tuple A-not-B More... | |
class | array_tuple_intersection |
array tuple intersection More... | |
class | default_array_tuple_update_policy |
default array tuple update policy More... | |
class | compact_array_tuple_sketch |
Compact array tuple sketch. More... | |
class | update_array_tuple_sketch |
Update array tuple sketch. More... | |
struct | default_array_tuple_union_policy |
default array tuple union policy More... | |
class | array_tuple_union |
array tuple union More... | |
class | tuple_a_not_b |
tuple A-not-B More... | |
class | tuple_intersection |
Tuple intersection. More... | |
class | tuple_sketch |
Base class for Tuple sketch. More... | |
class | update_tuple_sketch |
Update Tuple sketch. More... | |
class | compact_tuple_sketch |
Compact Tuple sketch. More... | |
class | tuple_base_builder |
Tuple base builder. More... | |
class | tuple_union |
Tuple Union. More... | |
class | quantiles_sketch |
This is a stochastic streaming sketch that enables near-real time analysis of the approximate distribution from a very large stream in a single pass. More... | |
class | ebpps_sketch |
An implementation of an Exact and Bounded Sampling Proportional to Size sketch. More... | |
class | var_opt_union |
Provides a unioning operation over var_opt_sketch objects. More... | |
class | var_opt_sketch |
This sketch samples data from a stream of items. More... | |
class | density_sketch |
Density sketch. More... | |
class | frequent_items_sketch |
Frequent Items sketch. More... | |
class | count_min_sketch |
C++ implementation of the CountMin sketch data structure of Cormode and Muthukrishnan. More... | |
class | req_sketch |
Relative Error Quantiles Sketch. More... | |
Typedefs | |
using | cpc_sketch = cpc_sketch_alloc< std::allocator< uint8_t > > |
CPC sketch alias with default allocator. | |
using | cpc_union = cpc_union_alloc< std::allocator< uint8_t > > |
CPC union alias with default allocator. | |
using | hll_sketch = hll_sketch_alloc< std::allocator< uint8_t > > |
HLL sketch alias with default allocator. | |
using | hll_union = hll_union_alloc< std::allocator< uint8_t > > |
HLL union alias with default allocator. | |
template<typename Allocator = std::allocator<uint64_t>> | |
using | theta_jaccard_similarity_alloc = jaccard_similarity_base< theta_union_alloc< Allocator >, theta_intersection_alloc< Allocator >, trivial_extract_key > |
Theta Jaccard similarity alias. | |
using | theta_jaccard_similarity = theta_jaccard_similarity_alloc< std::allocator< uint64_t > > |
Theta Jaccard similarity alias with default allocator. | |
using | theta_sketch = theta_sketch_alloc< std::allocator< uint64_t > > |
Theta sketch alias with default allocator. | |
using | update_theta_sketch = update_theta_sketch_alloc< std::allocator< uint64_t > > |
Update Theta sketch alias with default allocator. | |
using | compact_theta_sketch = compact_theta_sketch_alloc< std::allocator< uint64_t > > |
Compact Theta sketch alias with default allocator. | |
using | wrapped_compact_theta_sketch = wrapped_compact_theta_sketch_alloc< std::allocator< uint64_t > > |
Wrapped Compact Theta sketch alias with default allocator. | |
using | default_array_of_doubles_update_policy = default_array_tuple_update_policy< array< double > > |
convenience alias with default allocator, default policy for update_array_of_doubles_sketch | |
using | update_array_of_doubles_sketch = update_array_tuple_sketch< array< double > > |
convenience alias with default allocator, equivalent to ArrayOfDoublesUpdatableSketch in Java | |
using | compact_array_of_doubles_sketch = compact_array_tuple_sketch< array< double > > |
convenience alias with default allocator, equivalent to ArrayOfDoublesCompactSketch in Java | |
using | default_array_of_doubles_union_policy = default_array_tuple_union_policy< array< double > > |
convenience alias, default policy for array_of_doubles_union | |
using | array_of_doubles_union = array_tuple_union< array< double > > |
convenience alias with default allocator, equivalent to ArrayOfDoublesUnion in Java | |
template<typename Policy > | |
using | array_of_doubles_intersection = array_tuple_intersection< array< double >, Policy > |
convenience alias with default allocator, equivalent to ArrayOfDoublesIntersection in Java no default policy since it is not clear in general | |
using | array_of_doubles_a_not_b = array_tuple_a_not_b< array< double > > |
convenience alias with default allocator, equivalent to ArrayOfDoublesAnotB in Java | |
template<typename Summary , typename IntersectionPolicy , typename UnionPolicy = default_tuple_union_policy<Summary>, typename Allocator = std::allocator<Summary>> | |
using | tuple_jaccard_similarity = jaccard_similarity_base< tuple_union< Summary, UnionPolicy, Allocator >, tuple_intersection< Summary, IntersectionPolicy, Allocator >, pair_extract_key< uint64_t, Summary > > |
Tuple Jaccard similarity alias. | |
Enumerations | |
enum | target_hll_type { HLL_4 , HLL_6 , HLL_8 } |
Specifies the target type of HLL sketch to be created. More... | |
enum | frequent_items_error_type { NO_FALSE_POSITIVES , NO_FALSE_NEGATIVES } |
Frequent items error type. More... | |
Functions | |
template<typename A > | |
void | cpc_init () |
Allocation and initialization of global decompression (decoding) tables. More... | |
DataSketches namespace.
enum target_hll_type |
Specifies the target type of HLL sketch to be created.
It is a target in that the actual allocation of the HLL array is deferred until sufficient number of items have been received by the warm-up phases.
These three target types are isomorphic representations of the same underlying HLL algorithm. Thus, given the same value of lg_config_k and the same input, all three HLL target types will produce identical estimates and have identical error distributions.
The memory (and also the serialization) of the sketch during this early warmup phase starts out very small (8 bytes, when empty) and then grows in increments of 4 bytes as required until the full HLL array is allocated. This transition point occurs at about 10% of K for sketches where lg_config_k is > 8.
HLL_8 This uses an 8-bit byte per HLL bucket. It is generally the fastest in terms of update time, but has the largest storage footprint of about K bytes.
HLL_6 This uses a 6-bit field per HLL bucket. It is the generally the next fastest in terms of update time with a storage footprint of about 3/4 * K bytes.
Enumerator | |
---|---|
HLL_4 | 4 bits per entry (most compact, size may vary) |
HLL_6 | 6 bits per entry (fixed size) |
HLL_8 | 8 bits per entry (fastest, fixed size) |
void cpc_init | ( | ) |
Allocation and initialization of global decompression (decoding) tables.
Call this before anything else if you want to control the initialization time. For instance, to have this happen outside of a transaction context. Otherwise initialization happens on the first use (serialization or deserialization). It is safe to call more than once assuming no race conditions. This is not thread safe! Neither is the rest of the library.