Package org.apache.datasketches.count


package org.apache.datasketches.count
This package in intended for implementations of the the Count Sketch and the Count-min Sketch both of which can be used to estimate frequency-moments of a stream of distinct elements. They are different from the unique counting sketches (HLL, Theta, CPC, etc.) and different from the Frequent-Items Sketch.

A Count-Min sketch is a probabilistic data structure that estimates the frequency of items in a large data stream using a fixed amount of memory. It uses a 2D array of counters and multiple hash functions to map each item to a position in every row. To estimate an item's frequency, it takes the minimum count from all the positions it hashes to, which provides an overestimate that is guaranteed to be greater than or equal to the true count.

A Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms. It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[4] (these calculations require counting of the number of occurrences for the distinct elements of the stream).

See Also:
  • Classes
    Class
    Description
    Java implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.