Variance Optimal Sampling (VarOpt)

A VarOpt sketch samples data from a stream of items. The sketch is designed for optimal (minimum) variance when querying the sketch to estimate subset sums of items matching a provided predicate. The sketch will produce a sample of size k (or smaller if fewer items have been presented), with the probability of including an item roughly corresponding to the item’s weight relative to the total weight of all items presented to the sketch.
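
As a rough illustration of how inclusion probability can track weight, the sketch below computes probabilities proportional to weight, capped at 1 and scaled so that they sum to k. It is a simplified model written for this page, not the library's internal code; the helper name inclusion_probabilities and the example weights are hypothetical, and all weights are assumed to be positive.

```python
def inclusion_probabilities(weights, k):
    """Return p_i = min(1, w_i / tau), with tau chosen so the p_i sum to k.

    Hypothetical helper for illustration only; assumes positive weights.
    """
    if len(weights) <= k:
        # Fewer items than the sample size: everything is kept.
        return [1.0] * len(weights)

    ordered = sorted(weights, reverse=True)
    remaining = sum(ordered)
    tau = remaining / k
    # Peel off "heavy" items (kept with probability 1) until the next
    # largest weight no longer exceeds the threshold tau.
    for heavy in range(k):
        tau = remaining / (k - heavy)
        if ordered[heavy] <= tau:
            break
        remaining -= ordered[heavy]
    return [min(1.0, w / tau) for w in weights]


# One heavy item and eight light ones, sample size 3.
weights = [10.0] + [1.0] * 8
probs = inclusion_probabilities(weights, k=3)
print(probs[0], probs[1])  # 1.0 0.25
```

In this toy case the heavy item is always kept, while each light item is kept a quarter of the time, so the expected sample size is 1 + 8 * 0.25 = 3.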

VarOpt sampling is related to reservoir sampling, with improved error bounds for subset sum estimation. Feeding the sketch items with a uniform weight value will produce a sample equivalent to reservoir sampling.
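
For comparison, a minimal sketch of classic uniform reservoir sampling (often called Algorithm R) is shown here. It is standalone standard-library code, not part of the sketch library; the function name reservoir_sample and its rng parameter are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable."""
    rng = rng or random.Random()
    sample = []
    for n, item in enumerate(stream, start=1):
        if len(sample) < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # The n-th item replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                sample[j] = item
    return sample


sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
print(len(sample))  # 10
```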


Serializing and deserializing this sketch requires the use of a PyObjectSerDe.
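
To make the role of a serde concrete, here is a standalone sketch of an object that converts string items to and from bytes. The class StringSerDe is hypothetical and does not subclass PyObjectSerDe; the method names get_size, to_bytes, and from_bytes, and the length-prefixed layout, are assumptions about what such an interface might look like, so check the PyObjectSerDe reference for the actual contract.

```python
import struct


class StringSerDe:
    """Hypothetical stand-in showing a length-prefixed string encoding."""

    def get_size(self, item):
        # 4-byte length prefix plus the UTF-8 payload.
        return 4 + len(item.encode("utf-8"))

    def to_bytes(self, item):
        data = item.encode("utf-8")
        return struct.pack("<i", len(data)) + data

    def from_bytes(self, data, offset):
        # Returns the decoded item and the number of bytes consumed.
        (length,) = struct.unpack_from("<i", data, offset)
        start = offset + 4
        text = data[start:start + length].decode("utf-8")
        return text, 4 + length


serde = StringSerDe()
blob = serde.to_bytes("apple") + serde.to_bytes("kiwi")
first, used = serde.from_bytes(blob, 0)
second, _ = serde.from_bytes(blob, used)
print(first, second)  # apple kiwi
```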

class var_opt_sketch(*args, **kwargs)

Static Methods:

deserialize(bytes: bytes, serde: _datasketches.PyObjectSerDe) → _datasketches.var_opt_sketch

Reads a bytes object and returns the corresponding var opt sketch

Non-static Methods:

__init__(self, k: int) → None

Creates a new Var Opt sketch instance


k (int) – Maximum number of samples in the sketch


Applies a provided predicate to the sketch and returns the estimated total weight matching the predicate, as well as upper and lower bounds on the estimate and the total weight processed by the sketch
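
The core of a subset-sum estimate can be sketched as follows, assuming the sample is available as (item, adjusted weight) pairs: the estimate is the sum of the adjusted weights of the sampled items that match the predicate. This is a simplified illustration, not the library's method; the helper name subset_sum_from_sample, the returned keys, and the sample data are hypothetical, and the upper and lower bounds are deliberately omitted.

```python
def subset_sum_from_sample(samples, predicate):
    """Estimate the matching weight from (item, adjusted_weight) pairs."""
    total = sum(weight for _, weight in samples)
    estimate = sum(weight for item, weight in samples if predicate(item))
    return {"estimate": estimate, "total_weight": total}


# Hypothetical sample of size 3 drawn from one item of weight 10 and
# eight items of weight 1: each retained light item stands in for 4 units.
samples = [("big", 10.0), ("small-3", 4.0), ("small-7", 4.0)]
result = subset_sum_from_sample(samples, lambda item: item.startswith("small"))
print(result["estimate"], result["total_weight"])  # 8.0 18.0
```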


Computes the size in bytes needed to serialize the current sketch


Returns True if the sketch is empty, otherwise False

property k

Returns the sketch’s maximum configured sample size

property n

Returns the total stream length

property num_samples

Returns the number of samples currently in the sketch


Serializes the sketch into a bytes object


Produces a string summary of the sketch and optionally prints the items


Updates the sketch with the given value and weight

class var_opt_union(*args, **kwargs)

Static Methods:

deserialize(bytes: bytes, serde: _datasketches.PyObjectSerDe) → _datasketches.var_opt_union

Constructs a var opt union from the given bytes using the provided serde

Non-static Methods:

__init__(self, max_k: int) → None

Returns a sketch corresponding to the union result


Computes the size in bytes needed to serialize the current union


Resets the union to the empty state


Serializes the union into a bytes object with the provided serde


Produces a string summary of the sketch


Updates the union with the given sketch