All Classes and Interfaces

Class
Description
Computes a set difference, A-AND-NOT-B, of two theta sketches.
AnotB<S extends Summary>
Computes a set difference, A-AND-NOT-B, of two generic tuple sketches.
Methods of serializing and deserializing arrays of Boolean as a bit array.
Computes a set difference of two tuple sketches of type ArrayOfDoubles
Computes a set difference, A-AND-NOT-B, of two ArrayOfDoublesSketches.
Combines two arrays of double values for use with ArrayOfDoubles tuple sketches
Top level compact tuple sketch of type ArrayOfDoubles.
Computes the intersection of two or more tuple sketches of type ArrayOfDoubles.
Methods of serializing and deserializing arrays of Double.
Builds set operations object for tuple sketches of type ArrayOfDoubles.
The base class for the tuple sketch of type ArrayOfDoubles, where an array of double values is associated with each key.
Convenient static methods to instantiate tuple sketches of type ArrayOfDoubles.
Interface for iterating over tuple sketches of type ArrayOfDoubles
The base class for unions of tuple sketches of type ArrayOfDoubles.
The top level for updatable tuple sketches of type ArrayOfDoubles.
For building a new ArrayOfDoublesUpdatableSketch
Base class for serializing and deserializing custom types.
Methods of serializing and deserializing arrays of Long.
Methods of serializing and deserializing arrays of Number, the object versions of the numeric primitive types.
Methods of serializing and deserializing arrays of String.
 
 
 
 
 
Methods of serializing and deserializing arrays of String.
Contains common equality binary search algorithms.
Algorithms with logarithmic complexity for searching in an array.
This class enables the estimation of error bounds given a sample set size, the sampling probability theta, the number of standard deviations and a simple noDataSeen flag.
Used as part of Theta compression.
A Bloom filter is a data structure that can be used for probabilistic set membership.
This class provides methods to help estimate the correct parameters when creating a Bloom filter, and methods to create the filter using those values.
Confidence intervals for binomial proportions.
This class is used to compute the bounds on the estimate of the ratio |B| / |A|, where: |A| is the unknown size of a set A of unique identifiers; |B| is the unknown size of a subset B of A; a = |S_A| is the observed size of a sample of A that was obtained by Bernoulli sampling with a known inclusion probability f; and b = |S_A ∩ B| is the observed size of a subset of S_A.
This class is used to compute the bounds on the estimate of the ratio B / A, where: A is a Theta Sketch of population PopA. B is a Theta Sketch of population PopB that is a subset of A, obtained by an intersection of A with some other Theta Sketch C, which acts like a predicate or selection clause. The estimate of the ratio PopB/PopA is BoundsOnRatiosInThetaSketchedSets.getEstimateOfBoverA(A, B). The Upper Bound estimate on the ratio PopB/PopA is BoundsOnRatiosInThetaSketchedSets.getUpperBoundForBoverA(A, B). The Lower Bound estimate on the ratio PopB/PopA is BoundsOnRatiosInThetaSketchedSets.getLowerBoundForBoverA(A, B). Note: Because B is obtained by intersecting A with C, the theta of B cannot be greater than the theta of A.
This class is used to compute the bounds on the estimate of the ratio B / A, where: A is a Tuple Sketch of population PopA. B is a Tuple or Theta Sketch of population PopB that is a subset of A, obtained by an intersection of A with some other Tuple or Theta Sketch C, which acts like a predicate or selection clause. The estimate of the ratio PopB/PopA is BoundsOnRatiosInTupleSketchedSets.getEstimateOfBoverA(A, B). The Upper Bound estimate on the ratio PopB/PopA is BoundsOnRatiosInTupleSketchedSets.getUpperBoundForBoverA(A, B). The Lower Bound estimate on the ratio PopB/PopA is BoundsOnRatiosInTupleSketchedSets.getLowerBoundForBoverA(A, B). Note: Because B is obtained by intersecting A with C, the theta of B cannot be greater than the theta of A.
This instructs the user about which of the upper and lower bounds of a partition definition row should be included with the returned data.
Useful methods for byte arrays.
Type-independent utilities for the classic quantiles sketches.
Compact sketches are inherently read only.
The parent class of all the CompactSketches.
CompactSketches are never created directly.
This code is used both by unit tests, for short running tests, and by the characterization repository for longer running, more exhaustive testing.
This is a unique-counting sketch that implements the Compressed Probabilistic Counting (CPC, a.k.a. FM85) algorithms developed by Kevin Lang in his paper "Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm".
The union (merge) operation for the CPC sketches.
This provides a read-only view of a serialized image of a CpcSketch, which can be either a Memory object (on-heap or off-heap) or an on-heap byte array.
Returns an object and its size in bytes as a result of a deserialize operation.
This class can maintain the BitArray object off-heap.
 
This is an implementation of the Low Discrepancy Mergeable Quantiles Sketch, using doubles, described in section 3.2 of the journal version of the paper "Mergeable Summaries" by Agarwal, Cormode, Huang, Phillips, Wei, and Yi.
For building a new quantiles DoublesSketch.
Iterator over DoublesSketch.
The SortedView of the Quantiles Classic DoublesSketch and the KllDoublesSketch.
The Sorted View for quantile sketches of primitive type double.
Iterator over quantile sketches of primitive type double.
Summary for generic tuple sketches of type Double.
The aggregation modes for this Summary
 
Factory for DoubleSummary.
Methods for defining how unions and intersections of two objects of type DoubleSummary are performed.
The API for Union operations for quantiles DoublesSketches
For building a new DoublesSketch Union operation.
An implementation of an Exact and Bounded Sampling Proportional to Size sketch.
Specifies one of two types of error regions of the statistical classification Confusion Matrix that can be excluded from a returned sample of Frequent Items.
Defines the various families of sketch and set operation classes.
A Frequent Distinct Tuples sketch.
Filter<T extends Summary>
Class for filtering entries from a Sketch given a Summary
The SortedView for the KllFloatsSketch and the ReqSketch.
The Sorted View for quantiles of primitive type float.
Iterator over quantile sketches of primitive type float.
This provides efficient, unique and unambiguous binary searching for inequality comparison criteria for ordered arrays of values that may include duplicate values.
The enumerator of inequalities
This defines the returned results of the getPartitionBoundaries() function and includes the basic methods needed to construct actual partitions.
The Sorted View for quantiles of generic type.
Iterator over quantile sketches of generic type.
Defines a Group from a Frequent Distinct Tuple query.
This is used to iterate over the retained hash values of the Theta sketch.
Helper class for the common hash table methods.
The HllSketch is actually a collection of compact implementations of Philippe Flajolet’s HyperLogLog (HLL) sketch but with significantly improved error behavior and excellent speed performance.
This class reinserts the min and max values into the sorted view arrays as required.
A simple structure to hold a pair of arrays
A simple structure to hold a pair of arrays
A simple structure to hold a pair of arrays
A simple structure to hold a pair of arrays
This provides efficient, unique and unambiguous binary searching for inequality comparison criteria for ordered arrays of values that may include duplicate values.
 
Summary for generic tuple sketches of type Integer.
The aggregation modes for this Summary
 
Factory for IntegerSummary.
Methods for defining how unions and intersections of two objects of type IntegerSummary are performed.
The API for intersection operations
Computes an intersection of two or more generic tuple sketches or generic tuple sketches combined with theta sketches.
This sketch is useful for tracking approximate frequencies of items of type <T> with optional associated counts (<T> item, long count) that are members of a multiset of such items.
This is an implementation of the Low Discrepancy Mergeable Quantiles Sketch, using generic items, described in section 3.2 of the journal version of the paper "Mergeable Summaries" by Agarwal, Cormode, Huang, Phillips, Wei, and Yi.
Row class that defines the return values from a getFrequentItems query.
Iterator over ItemsSketch.
The SortedView for the KllItemsSketch and the classic ItemsSketch.
The API for Union operations for generic ItemsSketches
Jaccard similarity of two Theta Sketches.
Jaccard similarity of two Tuple Sketches, or alternatively, of a Tuple and Theta Sketch.
This variation of the KllSketch implements primitive doubles.
Iterator over KllDoublesSketch.
This variation of the KllSketch implements primitive floats.
Iterator over KllFloatsSketch.
This variation of the KllSketch implements generic data types.
Iterator over KllItemsSketch.
This variation of the KllSketch implements primitive longs.
Iterator over KllLongsSketch.
This class is the root of the KLL sketch class hierarchy.
Used primarily to define the structure of the serialized sketch.
Used to define the variable type of the current instance of this class.
The base implementation for the KLL sketch iterator hierarchy used for viewing the non-ordered quantiles retained by a sketch.
Kolmogorov-Smirnov Test.
This sketch is useful for tracking approximate frequencies of long items with optional associated counts (long item, long count) that are members of a multiset of such items.
Row class that defines the return values from a getFrequentItems query.
The SortedView of the KllLongsSketch.
The Sorted View for quantile sketches of primitive type long.
Iterator over quantile sketches of primitive type long.
Methods for inquiring the status of a backing Memory object.
This code is used both by unit tests, for short running tests, and by the characterization repository for longer running, more exhaustive testing.
The MurmurHash3 is a fast, non-cryptographic, 128-bit hash function that has excellent avalanche and 2-way bit independence properties.
A general purpose wrapper for the MurmurHash3.
A partitioning process that can partition very large data sets into thousands of partitions of approximately the same size.
Defines a row for List of PartitionBounds.
Holds data for a Stack element
This enables the special functions for performing efficient partitioning of massive data.
This processes the contents of an FDT sketch to extract the primary keys with the most frequent unique combinations of the non-primary dimensions.
This is a stochastic streaming sketch that enables near-real-time analysis of the approximate distribution of items from a very large stream in a single pass, requiring only that the items are comparable.
The Quantiles API for item type double.
The quantiles sketch iterator for primitive type double.
These search criteria are used by the KLL, REQ and Classic Quantiles sketches in the DataSketches library.
The Quantiles API for item type float.
The quantiles sketch iterator for primitive type float.
The Quantiles API for item type generic.
The quantiles sketch iterator for generic types.
The Quantiles API for item type long.
The quantiles sketch iterator for primitive type long.
This is the base interface for the SketchIterator hierarchy used for viewing the non-ordered quantiles retained by a sketch.
Utilities for the quantiles sketches.
This code is used both by unit tests, for short running tests, and by the characterization repository for longer running, more exhaustive testing.
QuickSelect algorithm improved from Sedgewick.
The signaling interface that allows comprehensive analysis of the ReqSketch and ReqCompactor while eliminating code clutter in the main classes.
This Relative Error Quantiles Sketch is the Java implementation based on the paper "Relative Error Streaming Quantiles" by Graham Cormode, Zohar Karnin, Edo Liberty, Justin Thaler, and Pavel Veselý, and is loosely derived from a Python prototype written by Pavel Veselý.
For building a new ReqSketch
Iterator over all retained items of the ReqSketch.
This sketch provides a reservoir sample over an input stream of items.
Class to union reservoir samples of generic items.
This sketch provides a reservoir sample over an input stream of longs.
Class to union reservoir samples of longs.
For the Families that accept this configuration parameter, it controls the size multiple that affects how fast the internal cache grows when more space is required.
A simple object to capture the results of a subset sum query on a sampling sketch.
Multipurpose serializer-deserializer for a collection of sketches defined by the enum.
Defines the sketch classes that this SerializerDeserializer can handle.
The parent API for all Set Operations
For building a new SetOperation.
Simplifies and speeds up set operations by resolving specific corner cases.
A not B actions
List of corner cases
Intersection actions
List of union actions
The top-level class for all theta sketches.
Sketch<S extends Summary>
This is equivalent to org.apache.datasketches.theta.Sketch with the addition of a user-defined Summary object associated with every unique entry in the sketch.
This class brings together the common sketch and set operation creation methods and the public static methods into one place.
Convenient static methods to instantiate generic tuple sketches.
Illegal Arguments Exception class for the library
Exception class for the library
Write operation attempted on a read-only class.
Illegal State Exception class for the library
This is a callback request to the data source to fill a quantiles sketch, which is returned to the caller.
This defines the methods required to compute the partition limits.
Specialized sorting algorithm that can sort one array and permute another array the same way.
This is the base interface for the Sorted View interface hierarchy and defines the methods that are type independent.
This is the base interface for the SortedViewIterator hierarchy used with a SortedView obtained from a quantile-type sketch.
This code is used both by unit tests, for short running tests, and by the characterization repository for longer running, more exhaustive testing.
Interface for user-defined Summary, which is associated with every hash in a tuple sketch
Interface for deserializing user-defined Summary
Interface for user-defined SummaryFactory
This is to provide methods of producing unions and intersections of two Summary objects.
Used to suppress SpotBugs warnings.
t-Digest for estimating quantiles and ranks.
 
Specifies the target type of HLL sketch to be created.
Utility methods for the Theta Family of sketches
Iterator over a generic tuple sketch
This performs union operations for all HllSketches.
Compute the union of two or more theta sketches.
Union<S extends Summary>
Compute the union of two or more generic tuple sketches or generic tuple sketches combined with theta sketches.
This is a real-time, key-value HLL mapping sketch that tracks approximate unique counts of identifiers (the values) associated with each key.
An extension of QuickSelectSketch<S>, which can be updated with many types of keys.
For building a new generic tuple UpdatableSketch
Interface for updating user-defined Summary
 
The parent class for the Update Sketch families, such as QuickSelect and Alpha.
For building a new UpdateSketch.
Common utility functions.
Common utility functions for Tuples
This class provides access to the samples contained in a VarOptItemsSketch.
This sketch provides a variance optimal sample over an input stream of weighted items.
Provides a unioning operation over varopt sketches.
The XxHash is a fast, non-cryptographic, 64-bit hash function that has excellent avalanche and 2-way bit independence properties.
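
Example (illustrative only). The Bloom filter entries in this index mention methods that help estimate the correct parameters when creating a filter. As a rough idea of what such an estimate involves, the sketch below applies the standard textbook sizing formulas: m = ceil(-n ln p / (ln 2)^2) bits for n expected distinct items at a target false-positive probability p, and k = round((m / n) ln 2) hash functions. The class name BloomSizingSketch and both method names are hypothetical and chosen for this example; they are not the library's API, and the library may round or bound its results differently.

```java
// Illustrative only: textbook Bloom filter sizing, NOT the library's implementation.
// All names here (BloomSizingSketch, suggestNumBits, suggestNumHashes) are hypothetical.
final class BloomSizingSketch {

  private BloomSizingSketch() { }

  /** Returns m = ceil(-n * ln(p) / (ln 2)^2), the suggested number of bits. */
  static long suggestNumBits(final long maxDistinctItems, final double targetFalsePositiveProb) {
    if (maxDistinctItems <= 0) {
      throw new IllegalArgumentException("maxDistinctItems must be positive");
    }
    if (!(targetFalsePositiveProb > 0.0 && targetFalsePositiveProb < 1.0)) {
      throw new IllegalArgumentException("targetFalsePositiveProb must be in (0, 1)");
    }
    final double ln2 = Math.log(2.0);
    return (long) Math.ceil(-maxDistinctItems * Math.log(targetFalsePositiveProb) / (ln2 * ln2));
  }

  /** Returns k = max(1, round((m / n) * ln 2)), the suggested number of hash functions. */
  static int suggestNumHashes(final long numBits, final long maxDistinctItems) {
    if (numBits <= 0 || maxDistinctItems <= 0) {
      throw new IllegalArgumentException("numBits and maxDistinctItems must be positive");
    }
    final double k = ((double) numBits / maxDistinctItems) * Math.log(2.0);
    return (int) Math.max(1L, Math.round(k));
  }

  public static void main(final String[] args) {
    final long n = 1_000L;   // expected number of distinct items
    final double p = 0.01;   // target false-positive probability
    final long m = suggestNumBits(n, p);
    final int k = suggestNumHashes(m, n);
    System.out.println("bits=" + m + ", hashes=" + k);
  }
}
```

For 1,000 items at a 1% target false-positive probability, these formulas give a filter of roughly 9.6 kilobits and 7 hash functions. For real use, prefer the builder class listed in this index over hand-rolled arithmetic.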