Variance Optimal Sampling (VarOpt)
A VarOpt sketch samples data from a stream of items. The sketch is designed for optimal (minimum) variance when it is queried to estimate subset sums of items matching a provided predicate. The sketch produces a sample of size k (or smaller if fewer items have been presented), with the probability of including an item roughly corresponding to the item’s weight relative to the total weight of all items presented to the sketch.
VarOpt sampling is related to reservoir sampling, with improved error bounds for subset sum estimation. Feeding the sketch items with a uniform weight value will produce a sample equivalent to reservoir sampling.
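For comparison, plain reservoir sampling can be sketched in a few lines of standard-library Python. This is the classic Algorithm R, shown only to illustrate the uniform-weight case; it is not the library's implementation, and `reservoir_sample` is a hypothetical helper name.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < k:
            # Fill phase: the first k items are always kept.
            sample.append(item)
        else:
            # Item number n replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                sample[j] = item
    return sample


n, k = 1000, 10
sample = reservoir_sample(range(n), k, rng=random.Random(42))

# Every item ends up in the sample with probability k / n, so scaling the
# number of matching samples by n / k estimates the matching stream count.
estimated_even = sum(1 for x in sample if x % 2 == 0) * n / k
```

With uniform weights the subset sum is just a count, which is why the scaled count above plays the role of the subset sum estimate.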
Note
Serializing and deserializing this sketch requires the use of a PyObjectSerDe.
- class var_opt_sketch(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes, serde: _datasketches.PyObjectSerDe) _datasketches.var_opt_sketch
Reads a bytes object and returns the corresponding var opt sketch
Non-static Methods:
- __init__(self, k: int) None
Creates a new Var Opt sketch instance
- Parameters:
k (int) – Maximum number of samples in the sketch
- estimate_subset_sum
Applies a provided predicate to the sketch and returns the estimated total weight matching the predicate, as well as upper and lower bounds on the estimate and the total weight processed by the sketch
- get_serialized_size_bytes
Computes the size in bytes needed to serialize the current sketch
- is_empty
Returns True if the sketch is empty, otherwise False
- property k
Returns the sketch’s maximum configured sample size
- property n
Returns the total stream length
- property num_samples
Returns the number of samples currently in the sketch
- serialize
Serializes the sketch into a bytes object with the provided serde
- to_string
Produces a string summary of the sketch and optionally prints the items
- update
Updates the sketch with the given value and weight
- class var_opt_union(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes, serde: _datasketches.PyObjectSerDe) _datasketches.var_opt_union
Constructs a var opt union from the given bytes using the provided serde
Non-static Methods:
- __init__(self, max_k: int) None
- get_result
Returns a sketch corresponding to the union result
- get_serialized_size_bytes
Computes the size in bytes needed to serialize the current union
- reset
Resets the union to the empty state
- serialize
Serializes the union into a bytes object with the provided serde
- to_string
Produces a string summary of the union
- update
Updates the union with the given sketch