org.apache.datasketches.count (datasketches-java 9.0.0 API)

package org.apache.datasketches.count

This package in intended for implementations of the the Count Sketch and the Count-min Sketch both of which can be used to estimate frequency-moments of a stream of distinct elements. They are different from the unique counting sketches (HLL, Theta, CPC, etc.) and different from the Frequent-Items Sketch.

A Count-Min sketch is a probabilistic data structure that estimates the frequency of items in a large data stream using a fixed amount of memory. It uses a 2D array of counters and multiple hash functions to map each item to a position in every row. To estimate an item's frequency, it takes the minimum count from all the positions it hashes to, which provides an overestimate that is guaranteed to be greater than or equal to the true count.

A Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms. It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[4] (these calculations require counting of the number of occurrences for the distinct elements of the stream).

See Also:

Related Packages

Package

Description

org.apache.datasketches

This package is the parent package for all sketch families and common code areas.
Classes

Class

Description

CountMinSketch

Java implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.

Package org.apache.datasketches.count