Package org.apache.datasketches.count
A Count-Min sketch is a probabilistic data structure that estimates the frequency of items in a large data stream using a fixed amount of memory. It uses a 2D array of counters and multiple hash functions to map each item to a position in every row. To estimate an item's frequency, it takes the minimum count from all the positions it hashes to, which provides an overestimate that is guaranteed to be greater than or equal to the true count.
A Count sketch is a type of dimensionality reduction that is particularly efficient in statistics, machine learning and algorithms. It was invented by Moses Charikar, Kevin Chen and Martin Farach-Colton in an effort to speed up the AMS Sketch by Alon, Matias and Szegedy for approximating the frequency moments of streams[4] (these calculations require counting of the number of occurrences for the distinct elements of the stream).
- See Also:
-
ClassesClassDescriptionJava implementation of the CountMin sketch data structure of Cormode and Muthukrishnan.