Package org.apache.datasketches.frequencies
This package is dedicated to streaming algorithms that estimate how frequently
items occur in a weighted multiset stream of items.
If the frequency distribution of items is sufficiently skewed, these algorithms are very
useful in identifying the "Heavy Hitters" that occurred most frequently in the stream.
The frequency estimate for an item has well-understood error bounds that can be
returned by the sketch.
These algorithms are sometimes referred to as "TopN" algorithms.
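
To make the idea of a frequency estimate with error bounds concrete, here is a minimal, self-contained sketch of a classic counter-based summary (a weighted Misra-Gries variant). It is not this package's implementation and does not use its classes. The class name MisraGriesSketch, its method names, and the maxCounters parameter are hypothetical and chosen only for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a simplified weighted Misra-Gries summary,
// NOT the algorithm or API of org.apache.datasketches.frequencies.
final class MisraGriesSketch {
    private final int maxCounters;                       // hypothetical capacity parameter
    private final Map<Long, Long> counters = new HashMap<>();
    private long offset = 0;                             // total weight subtracted from every counter

    MisraGriesSketch(int maxCounters) {
        if (maxCounters < 1) {
            throw new IllegalArgumentException("maxCounters must be at least 1");
        }
        this.maxCounters = maxCounters;
    }

    // Add an item with a positive weight (the "count" of a weighted stream).
    void update(long item, long count) {
        if (count <= 0) {
            return;                                      // ignore non-positive weights in this sketch
        }
        counters.merge(item, count, Long::sum);
        if (counters.size() > maxCounters) {
            // Too many counters: subtract the smallest count from all of them
            // and drop those that reach zero. The subtracted amount is remembered
            // in 'offset' so it can be reported as the error bound.
            long min = Collections.min(counters.values());
            offset += min;
            counters.replaceAll((k, v) -> v - min);
            counters.values().removeIf(v -> v == 0);
        }
    }

    // The true frequency is never below the retained counter value.
    long getLowerBound(long item) {
        return counters.getOrDefault(item, 0L);
    }

    // The true frequency is never above the counter value plus the total subtracted weight.
    long getUpperBound(long item) {
        return getLowerBound(item) + offset;
    }

    // Width of the interval [lower bound, upper bound] for every item.
    long getMaximumError() {
        return offset;
    }

    public static void main(String[] args) {
        MisraGriesSketch sketch = new MisraGriesSketch(2);
        sketch.update(1L, 10L);                          // one heavy item
        sketch.update(2L, 1L);
        sketch.update(3L, 1L);
        sketch.update(4L, 1L);
        System.out.println("item 1: [" + sketch.getLowerBound(1L) + ", "
            + sketch.getUpperBound(1L) + "], max error " + sketch.getMaximumError());
    }
}
```

With a skewed stream like the one in main, the heavy item keeps a tight interval around its true weight of 10 while the light items are evicted, which is the behavior that makes such summaries useful for finding heavy hitters.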
Class Summary

Class                 Description
ItemsSketch<T>        This sketch is useful for tracking approximate frequencies of items of type <T> with optional associated counts (<T> item, long count) that are members of a multiset of such items.
ItemsSketch.Row<T>    Row class that defines the return values from a getFrequentItems query.
LongsSketch           This sketch is useful for tracking approximate frequencies of long items with optional associated counts (long item, long count) that are members of a multiset of such items.
LongsSketch.Row       Row class that defines the return values from a getFrequentItems query.

Enum Summary

Enum                  Description
ErrorType             Specifies one of two types of error regions of the statistical classification Confusion Matrix that can be excluded from a returned sample of Frequent Items.
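
The effect of excluding one error region can be illustrated with a small, self-contained sketch. It assumes that each candidate item carries a lower and an upper bound on its frequency, and that a filter compares one of those bounds with a threshold. The local ErrorType enum, the Row class, and the frequentItems helper below are stand-ins written for this example, and the selection rule is an assumption rather than a statement of how the library decides.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: local stand-ins, not the classes of this package.
final class ErrorTypeDemo {
    // Stand-in enum mirroring the two excludable error regions.
    enum ErrorType { NO_FALSE_POSITIVES, NO_FALSE_NEGATIVES }

    // Hypothetical row: an item with bounds on its true frequency.
    static final class Row {
        final long item;
        final long lowerBound;
        final long upperBound;

        Row(long item, long lowerBound, long upperBound) {
            this.item = item;
            this.lowerBound = lowerBound;
            this.upperBound = upperBound;
        }
    }

    // Assumed rule for this sketch:
    //   NO_FALSE_POSITIVES keeps an item only if even its lower bound exceeds the threshold,
    //   so nothing below the threshold can slip in (some true heavy hitters may be missed).
    //   NO_FALSE_NEGATIVES keeps an item if its upper bound exceeds the threshold,
    //   so no true heavy hitter is missed (some lighter items may slip in).
    static List<Long> frequentItems(List<Row> rows, long threshold, ErrorType type) {
        List<Long> result = new ArrayList<>();
        for (Row r : rows) {
            long bound = (type == ErrorType.NO_FALSE_POSITIVES) ? r.lowerBound : r.upperBound;
            if (bound > threshold) {
                result.add(r.item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(1L, 9L, 10L));
        rows.add(new Row(4L, 1L, 2L));
        rows.add(new Row(7L, 0L, 1L));
        System.out.println(frequentItems(rows, 1L, ErrorType.NO_FALSE_POSITIVES));
        System.out.println(frequentItems(rows, 1L, ErrorType.NO_FALSE_NEGATIVES));
    }
}
```

The two modes trade precision for recall: the stricter filter returns a shorter list that contains only certain heavy hitters, while the looser one returns a longer list that is guaranteed to contain all of them.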