Density Sketch

Builds a coreset from the given set of input points and provides a density estimate at a given point.

Based on the following paper: Zohar Karnin and Edo Liberty, “Discrepancy, Coresets, and Sketches in Machine Learning”, https://proceedings.mlr.press/v99/karnin19a/karnin19a.pdf

Inspired by the following implementation: https://github.com/edoliberty/streaming-quantiles/blob/f688c8161a25582457b0a09deb4630a81406293b/gde.py

Requires the use of a KernelFunction to compute the distance between two vectors.
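To make the role of the kernel concrete, the following pure-Python sketch computes the exact quantity that a density sketch approximates: the mean kernel value between a query point and every input point. It does not use the datasketches library. The names gaussian_kernel and exact_density and the bandwidth parameterization are illustrative assumptions, not part of the library's API.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel: 1.0 for identical vectors, decaying toward 0 with distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def exact_density(points, query, kernel=gaussian_kernel):
    """Brute-force kernel density: mean kernel value between query and all points."""
    if not points:
        return 0.0
    return sum(kernel(p, query) for p in points) / len(points)


# Three points clustered around the origin and one outlier.
points = [(0.0, 0.0), (0.1, -0.1), (-0.2, 0.1), (5.0, 5.0)]

near = exact_density(points, (0.0, 0.0))    # query inside the cluster
far = exact_density(points, (10.0, 10.0))   # query far from every point
print(near > far)
```

The brute-force version touches every input point for each query. The sketch instead keeps a small weighted subset (the coreset) and evaluates the kernel only against that subset, trading some accuracy for bounded memory.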

class density_sketch(*args, **kwargs)

Static Methods:

deserialize(bytes: bytes, kernel: _datasketches.KernelFunction) → _datasketches.density_sketch

Reads a bytes object and returns the corresponding density_sketch

Non-static Methods:

__init__(self, k: int, dim: int, kernel: _datasketches.KernelFunction) → None

Creates a new density sketch

Parameters:
  • k (int) – controls the size and error of the sketch

  • dim (int) – dimension of the input data

  • kernel (KernelFunction) – instance of a kernel

property dim

The configured parameter dim

get_estimate

Returns an approximate density at the given point

is_empty

Returns True if the sketch is empty, otherwise False

is_estimation_mode

Returns True if the sketch is in estimation mode, otherwise False

property k

The configured parameter k

merge

Merges the provided sketch into this one

property n

The length of the input stream

property num_retained

The number of retained items (samples) in the sketch

serialize

Serializes the sketch into a bytes object

to_string

Produces a string summary of the sketch

update

Updates the sketch with the given vector