datasketches-java 6.2.0 API

Sketching Core Library

Overview

The Sketching Core Library provides a range of stochastic streaming algorithms and closely related Java technologies that are particularly useful when integrating these algorithms into systems that must deal with massive data.

This library is divided into packages that constitute distinct groups of functionality:

Note: In general, if the requirements or promises of any method's contract are not fulfilled (that is, if there is a bug in either the method or its caller), then an unchecked exception will be thrown. The precise type of such an unchecked exception does not form part of any method's contract.
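The note above implies that callers should not branch on the concrete exception class when a contract is violated. The following is a minimal sketch of that calling pattern using only the JDK. The class ContractViolationDemo and the method log2OfPowerOfTwo are hypothetical stand-ins invented for illustration and are not part of this library.

```java
final class ContractViolationDemo {

  // Hypothetical method with a precondition (illustration only, not a library method).
  // Its contract: k must be a positive power of 2.
  static int log2OfPowerOfTwo(int k) {
    if (k <= 0 || Integer.bitCount(k) != 1) {
      // The concrete exception type is an implementation detail, not part of the contract.
      throw new IllegalArgumentException("k must be a positive power of 2: " + k);
    }
    return Integer.numberOfTrailingZeros(k);
  }

  // Reports whether the call failed with any unchecked exception,
  // without depending on the precise subtype that was thrown.
  static boolean violatesContract(Runnable call) {
    try {
      call.run();
      return false;
    } catch (RuntimeException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(violatesContract(() -> log2OfPowerOfTwo(1024)));
    System.out.println(violatesContract(() -> log2OfPowerOfTwo(1000)));
  }
}
```

Catching RuntimeException rather than a specific subtype keeps the caller valid even if the thrown type changes between releases.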
Packages 
Package Description
org.apache.datasketches
This package is the parent package for all sketch families and common code areas.
org.apache.datasketches.common
This package is for common classes that may be used across all the sketch families.
org.apache.datasketches.cpc
Compressed Probabilistic Counting sketch family
org.apache.datasketches.fdt
Frequent Distinct Tuples Sketch
org.apache.datasketches.filters
The filters package contains data structures used to determine approximate set-membership.
org.apache.datasketches.filters.bloomfilter  
org.apache.datasketches.frequencies
This package is dedicated to streaming algorithms that enable estimation of the frequency of occurrence of items in a weighted multiset stream of items.
org.apache.datasketches.hash
The hash package contains high-performing, extended Java implementations of Austin Appleby's 128-bit MurmurHash3 hash function, originally coded in C.
org.apache.datasketches.hll
The DataSketches™ HLL sketch family package
org.apache.datasketches.hllmap
The hllmap package contains a space-efficient HLL mapping sketch that maps keys to the approximate unique count of identifiers.
org.apache.datasketches.kll
This package is for the implementations of the sketch algorithm developed by Zohar Karnin, Kevin Lang, and Edo Liberty that is commonly referred to as the "KLL" sketch after the authors' last names.
org.apache.datasketches.partitions  
org.apache.datasketches.quantiles
The quantiles package contains stochastic streaming algorithms that enable single-pass analysis of the distribution of a stream of items in terms of its quantiles.
org.apache.datasketches.quantilescommon
This package contains common tools and methods for the quantiles, kll and req packages.
org.apache.datasketches.req
This package is for the implementation of the Relative Error Quantiles sketch algorithm.
org.apache.datasketches.sampling
This package is dedicated to streaming algorithms that enable fixed-size, uniform sampling of weighted and unweighted items from a stream.
org.apache.datasketches.tdigest
t-Digest for estimating quantiles and ranks.
org.apache.datasketches.theta
The theta package contains the basic sketch classes that are members of the Theta Sketch Framework.
org.apache.datasketches.thetacommon
This package contains common tools and methods for the theta, tuple, tuple/* and fdt packages.
org.apache.datasketches.tuple
The tuple package contains a number of sketches that are based on the same fundamental algorithms as the Theta Sketch Framework and extend these concepts to whole new families of sketches.
org.apache.datasketches.tuple.adouble
This package is for a generic implementation of the Tuple sketch for a single Double value.
org.apache.datasketches.tuple.aninteger
This package is for a generic implementation of the Tuple sketch for a single Integer value.
org.apache.datasketches.tuple.arrayofdoubles
This package is for a concrete implementation of the Tuple sketch for an array of double values.
org.apache.datasketches.tuple.strings
This package is for a generic implementation of the Tuple sketch for a single String value.
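
To make the phrase "approximate set-membership" in the description of the filters package more concrete, the following is a minimal, JDK-only sketch of a Bloom filter. It is an illustration of the general technique under simplifying assumptions (String items, double hashing derived from String.hashCode) and is not the implementation or the API of org.apache.datasketches.filters.bloomfilter. The class name TinyBloomFilter and all of its methods are hypothetical.

```java
import java.util.BitSet;

final class TinyBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  TinyBloomFilter(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  // Scrambles the bits of h to derive a second hash; forced odd so it is never zero.
  private static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h | 1;
  }

  // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
  private int index(String item, int i) {
    int h1 = item.hashCode();
    int h2 = mix(h1);
    return Math.floorMod(h1 + i * h2, numBits);
  }

  void add(String item) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(index(item, i));
    }
  }

  // false means definitely absent; true means possibly present (false positives can occur).
  boolean mightContain(String item) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(index(item, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    TinyBloomFilter filter = new TinyBloomFilter(1024, 3);
    filter.add("apple");
    filter.add("banana");
    System.out.println(filter.mightContain("apple"));
    System.out.println(filter.mightContain("banana"));
  }
}
```

The key property is one-sided error: an item that was added is always reported as possibly present, while an item that was never added may occasionally be reported as present as well.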