Density Sketch
Builds a coreset from a stream of input points and provides an estimate of the density at a given point.
Based on the following paper: Zohar Karnin and Edo Liberty, “Discrepancy, Coresets, and Sketches in Machine Learning”, https://proceedings.mlr.press/v99/karnin19a/karnin19a.pdf
Inspired by the following implementation: https://github.com/edoliberty/streaming-quantiles/blob/f688c8161a25582457b0a09deb4630a81406293b/gde.py
Requires a KernelFunction, which computes a similarity score between two vectors (larger for points that are closer together).
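A kernel maps two vectors to a similarity that is largest when they coincide and decays as they move apart. The following is a minimal sketch of such a kernel, assuming a kernel is a callable object that takes two equal-length vectors and returns a float. The class name `ToyGaussianKernel` and its `bandwidth` parameter are hypothetical illustrations, not part of the library; it implements the Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).

```python
import math
from typing import Sequence


class ToyGaussianKernel:
    """Hypothetical stand-in for a KernelFunction: a callable that takes two
    vectors and returns their similarity as a float."""

    def __init__(self, bandwidth: float = 1.0) -> None:
        if bandwidth <= 0:
            raise ValueError("bandwidth must be positive")
        self.bandwidth = bandwidth

    def __call__(self, a: Sequence[float], b: Sequence[float]) -> float:
        if len(a) != len(b):
            raise ValueError("vectors must have the same dimension")
        # Squared Euclidean distance between a and b
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        # 1.0 for identical points, decaying toward 0 as they move apart
        return math.exp(-sq_dist / (2 * self.bandwidth ** 2))


kernel = ToyGaussianKernel(bandwidth=1.0)
print(kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0
print(kernel([0.0, 0.0], [3.0, 4.0]))  # exp(-12.5), roughly 3.7e-06
```

A larger bandwidth makes distant points look more similar and therefore smooths the resulting density estimate.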
- class density_sketch(*args, **kwargs)
Static Methods:
- deserialize(bytes: bytes, kernel: _datasketches.KernelFunction) → _datasketches.density_sketch
Reads a bytes object produced by serialize and returns the corresponding density_sketch, using the provided kernel
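Because deserialize takes a kernel alongside the bytes, the serialized form presumably carries only the sketch's data (its retained points and what they represent) and not the kernel, which can be an arbitrary user-defined object. The toy round-trip below illustrates that split. `toy_serialize`, `toy_deserialize`, the explicit per-point weights and the byte layout are all hypothetical and are not the library's format.

```python
import struct
from typing import Callable, List, Sequence, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]
Item = Tuple[float, List[float]]  # (weight, point)

HEADER = struct.Struct("<II")  # hypothetical header: dim, item count


def toy_serialize(dim: int, items: List[Item]) -> bytes:
    """Pack (weight, point) pairs as little-endian doubles; no kernel stored."""
    record = struct.Struct(f"<d{dim}d")
    out = bytearray(HEADER.pack(dim, len(items)))
    for weight, point in items:
        out += record.pack(weight, *point)
    return bytes(out)


def toy_deserialize(data: bytes, kernel: Kernel) -> Tuple[Kernel, int, List[Item]]:
    """Rebuild the items; the caller has to provide the kernel again."""
    dim, count = HEADER.unpack_from(data, 0)
    record = struct.Struct(f"<d{dim}d")
    items: List[Item] = []
    offset = HEADER.size
    for _ in range(count):
        weight, *point = record.unpack_from(data, offset)
        items.append((weight, list(point)))
        offset += record.size
    return kernel, dim, items


blob = toy_serialize(2, [(1.0, [0.5, 1.5]), (2.0, [3.0, -1.0])])
kernel, dim, items = toy_deserialize(blob, kernel=lambda a, b: 1.0)
```

Passing a different kernel to deserialize than the one used to build the sketch would silently change every later estimate, so the same kernel should be reused.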
Non-static Methods:
- __init__(self, k: int, dim: int, kernel: _datasketches.KernelFunction) → None
Creates a new density sketch
- Parameters:
k (int) – controls the size and accuracy of the sketch; larger values retain more points and give lower error
dim (int) – dimension of the input vectors
kernel (KernelFunction) – kernel used to compute the similarity between two vectors
- property dim
The configured parameter dim
- get_estimate
Returns an approximate density at the given point
- is_empty
Returns True if the sketch is empty, otherwise False
- is_estimation_mode
Returns True if the sketch is in estimation mode (it has compacted its input, so retained points stand for more than one input point each), otherwise False
- property k
The configured parameter k
- merge
Merges the provided sketch into this one
- property n
The length of the input stream
- property num_retained
The number of retained items (samples) in the sketch
- serialize
Serializes the sketch into a bytes object
- to_string
Produces a string summary of the sketch
- update
Updates the sketch with the given vector
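The methods above fit together roughly as follows, in the spirit of the compactor scheme from the cited paper: points enter at level 0, a level that reaches k points is halved and its survivors move up one level with doubled weight, and the density estimate is a weighted kernel average over the retained points, estimate(x) = (1/n) * sum over levels h and points p at level h of 2^h * K(x, p). The pure-Python sketch below is a simplified illustration under those assumptions, not the library's code. `ToyDensitySketch`, its default `gaussian` kernel and the greedy halving rule in `_halve` are hypothetical, and the library's exact compaction rule may differ. Method and property names mirror the documented ones only for readability.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def gaussian(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical default kernel: exp(-||a - b||^2 / 2)."""
    return math.exp(-0.5 * sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyDensitySketch:
    """Simplified compactor-based density sketch (illustration only)."""

    def __init__(self, k: int, dim: int, kernel: Kernel = gaussian) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k, self.dim, self.kernel = k, dim, kernel
        self.n = 0  # number of input points seen, including merged ones
        # levels[h] holds retained points that each stand for 2**h inputs
        self.levels: List[List[List[float]]] = [[]]

    def is_empty(self) -> bool:
        return self.n == 0

    def is_estimation_mode(self) -> bool:
        # True once any compaction has happened (a second level exists)
        return len(self.levels) > 1

    @property
    def num_retained(self) -> int:
        return sum(len(level) for level in self.levels)

    def update(self, point: Sequence[float]) -> None:
        if len(point) != self.dim:
            raise ValueError("point has the wrong dimension")
        self.levels[0].append(list(point))
        self.n += 1
        self._compact()

    def merge(self, other: "ToyDensitySketch") -> None:
        if other.dim != self.dim:
            raise ValueError("sketches have different dimensions")
        # Points on the same level carry the same weight, so levels concatenate
        for h, level in enumerate(other.levels):
            if h == len(self.levels):
                self.levels.append([])
            self.levels[h].extend(list(p) for p in level)
        self.n += other.n
        self._compact()

    def get_estimate(self, point: Sequence[float]) -> float:
        if self.is_empty():
            raise RuntimeError("estimate is undefined for an empty sketch")
        total = 0.0
        for h, level in enumerate(self.levels):
            for p in level:
                total += (2 ** h) * self.kernel(point, p)
        # Weighted kernel average: exact while no compaction has happened
        return total / self.n

    def _compact(self) -> None:
        h = 0
        while h < len(self.levels):  # the list may grow inside the loop
            if len(self.levels[h]) >= self.k:
                if h + 1 == len(self.levels):
                    self.levels.append([])
                self.levels[h + 1].extend(self._halve(self.levels[h]))
                self.levels[h] = []
            h += 1

    def _halve(self, points: List[List[float]]) -> List[List[float]]:
        # Greedy signing: each point gets the sign that offsets the kernel
        # discrepancy accumulated so far (ties go to the smaller side);
        # the "+1" points survive and move up one level with doubled weight.
        # Assumes a non-negative kernel, so at least one point is dropped.
        signs: List[int] = []
        for p in points:
            delta = sum(s * self.kernel(p, q) for q, s in zip(points, signs))
            if delta < 0 or (delta == 0 and sum(signs) <= 0):
                signs.append(1)
            else:
                signs.append(-1)
        return [p for p, s in zip(points, signs) if s == 1]


sketch = ToyDensitySketch(k=8, dim=1)
for i in range(100):
    sketch.update([i / 10])
print(sketch.n, sketch.num_retained, sketch.is_estimation_mode())
print(sketch.get_estimate([5.0]))
```

While no compaction has happened, every point sits at level 0 with weight 1 and the estimate is the exact average kernel value (1/n) * sum of K(x, x_i). After compaction, each level in this toy keeps fewer than k points, so memory grows with the number of levels rather than with n, at the cost of approximation error.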