Package org.apache.datasketches.sampling
Class VarOptItemsSketch<T>
java.lang.Object
org.apache.datasketches.sampling.VarOptItemsSketch<T>
- Type Parameters:
T
- The type of object held in the sketch.
This sketch provides a variance optimal sample over an input stream of weighted items. The
sketch can be used to compute subset sums over predicates, producing estimates with optimal
variance for a given sketch size.
Using this sketch with a constant weight for every item (e.g. 1.0) will produce a standard reservoir sample over the stream.
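To illustrate the constant-weight case described above, the following is a minimal, self-contained sketch of classic reservoir sampling (Algorithm R) using only the Java standard library. It is not the DataSketches implementation; the class name UniformReservoirDemo and all of its members are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative reservoir sampler (Algorithm R); hypothetical, not part of DataSketches. */
final class UniformReservoirDemo<T> {
  private final int k;                              // maximum number of samples retained
  private long n = 0;                               // number of stream items seen so far
  private final List<T> samples = new ArrayList<>();
  private final Random rng;

  UniformReservoirDemo(int k, long seed) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be at least 1");
    }
    this.k = k;
    this.rng = new Random(seed);
  }

  /** Offers one item; every item seen so far stays in the sample with probability k/n. */
  void update(T item) {
    n++;
    if (samples.size() < k) {
      samples.add(item);                            // still filling: keep everything
    } else {
      long j = (long) (rng.nextDouble() * n);       // uniform index in [0, n)
      if (j < k) {
        samples.set((int) j, item);                 // replace slot j with probability k/n
      }
    }
  }

  long getN() { return n; }

  int getNumSamples() { return samples.size(); }

  List<T> getSamples() { return new ArrayList<>(samples); }

  public static void main(String[] args) {
    UniformReservoirDemo<Integer> reservoir = new UniformReservoirDemo<>(5, 42L);
    for (int i = 0; i < 1000; i++) {
      reservoir.update(i);
    }
    System.out.println("n=" + reservoir.getN() + ", samples=" + reservoir.getSamples());
  }
}
```

With weighted input the selection rule is more involved, since heavier items must be more likely to be retained; the example above covers only the uniform case.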
- Author:
- Jon Malkin, Kevin Lang
-
Method Summary
Modifier and Type | Method | Description
SampleSubsetSummary | estimateSubsetSum(Predicate<T> predicate) | Computes an estimated subset sum from the entire stream for objects matching a given predicate.
int | getK() | Returns the sketch's value of k, the maximum number of samples stored in the sketch.
long | getN() | Returns the number of items processed from the input stream.
int | getNumSamples() | Returns the current number of items in the sketch, which may be smaller than the sketch capacity.
VarOptItemsSamples<T> | getSketchSamples() | Gets a result iterator object.
static <T> VarOptItemsSketch<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe) | Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
static <T> VarOptItemsSketch<T> | newInstance(int k) | Construct a varopt sampling sketch with up to k samples using the default resize factor (8).
static <T> VarOptItemsSketch<T> | newInstance(int k, ResizeFactor rf) | Construct a varopt sampling sketch with up to k samples using the specified resize factor.
void | reset() | Resets this sketch to the empty state, but retains the original value of k.
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe) | Returns a byte array representation of this sketch.
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz) | Returns a byte array representation of this sketch.
String | toString() | Returns a human-readable summary of the sketch.
static String | toString(byte[] byteArr) | Returns a human readable string of the preamble of a byte array image of a VarOptItemsSketch.
static String | toString(org.apache.datasketches.memory.Memory mem) | Returns a human readable string of the preamble of a Memory image of a VarOptItemsSketch.
void | update(T item, double weight) | Randomly decide whether or not to include an item in the sample set.
-
Method Details
-
newInstance
public static <T> VarOptItemsSketch<T> newInstance(int k)
Construct a varopt sampling sketch with up to k samples using the default resize factor (8).
- Type Parameters:
T
- The type of object held in the sketch.
- Parameters:
k
- Maximum size of sampling. Allocated size may be smaller until the sketch fills. Unlike many sketches in this package, this value does not need to be a power of 2.
- Returns:
- A VarOptItemsSketch initialized with maximum size k and the default resize factor.
-
newInstance
public static <T> VarOptItemsSketch<T> newInstance(int k, ResizeFactor rf)
Construct a varopt sampling sketch with up to k samples using the specified resize factor.
- Type Parameters:
T
- The type of object held in the sketch.
- Parameters:
k
- Maximum size of sampling. Allocated size may be smaller until the sketch fills. Unlike many sketches in this package, this value does not need to be a power of 2. The maximum size is Integer.MAX_VALUE-1.
rf
- See Resize Factor
- Returns:
- A VarOptItemsSketch initialized with maximum size k and resize factor rf.
-
heapify
public static <T> VarOptItemsSketch<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
- Type Parameters:
T
- The type of item this sketch contains
- Parameters:
srcMem
- a Memory representation of a sketch of this class. See Memory
serDe
- An instance of ArrayOfItemsSerDe
- Returns:
- a sketch instance of this class
-
getK
public int getK()
Returns the sketch's value of k, the maximum number of samples stored in the sketch. The current number of items in the sketch may be lower.
- Returns:
- k, the maximum number of samples in the sketch
-
getN
public long getN()
Returns the number of items processed from the input stream.
- Returns:
- n, the number of stream items the sketch has seen
-
getNumSamples
public int getNumSamples()
Returns the current number of items in the sketch, which may be smaller than the sketch capacity.
- Returns:
- the number of items currently in the sketch
-
getSketchSamples
public VarOptItemsSamples<T> getSketchSamples()
Gets a result iterator object.
- Returns:
- An object with an iterator over the results
-
update
public void update(T item, double weight)
Randomly decide whether or not to include an item in the sample set.
- Parameters:
item
- an item of the set being sampled from
weight
- a strictly positive weight associated with the item
-
reset
public void reset()
Resets this sketch to the empty state, but retains the original value of k.
-
toString
public String toString()
Returns a human-readable summary of the sketch.
-
toString
public static String toString(byte[] byteArr)
Returns a human readable string of the preamble of a byte array image of a VarOptItemsSketch.
- Parameters:
byteArr
- the given byte array
- Returns:
- a human readable string of the preamble of a byte array image of a VarOptItemsSketch.
-
toString
public static String toString(org.apache.datasketches.memory.Memory mem)
Returns a human readable string of the preamble of a Memory image of a VarOptItemsSketch.
- Parameters:
mem
- the given Memory
- Returns:
- a human readable string of the preamble of a Memory image of a VarOptItemsSketch.
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe)
Returns a byte array representation of this sketch. May fail for polymorphic item types.
- Parameters:
serDe
- An instance of ArrayOfItemsSerDe
- Returns:
- a byte array representation of this sketch
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
Returns a byte array representation of this sketch. Copies contents into an array of the specified class for serialization to allow for polymorphic types.
- Parameters:
serDe
- An instance of ArrayOfItemsSerDe
clazz
- The class represented by <T>
- Returns:
- a byte array representation of this sketch
-
estimateSubsetSum
public SampleSubsetSummary estimateSubsetSum(Predicate<T> predicate)
Computes an estimated subset sum from the entire stream for objects matching a given predicate. Provides a lower bound, estimate, and upper bound using a target of 2 standard deviations.
This is technically a heuristic method, and tries to err on the conservative side.
- Parameters:
predicate
- A predicate to use when identifying items.
- Returns:
- A summary object containing the estimate, upper and lower bounds, and the total sketch weight.
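As a rough illustration of the point estimate only, the following self-contained sketch sums the adjusted weights of the retained samples that match a predicate. It assumes each retained sample carries an adjusted weight; the class SubsetSumDemo, its nested WeightedItem type, and the sample values are hypothetical and are not part of the library. It does not compute the lower and upper bounds described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative subset-sum point estimate over a weighted sample; hypothetical names. */
final class SubsetSumDemo {

  /** A retained item together with its adjusted weight (illustrative type). */
  static final class WeightedItem<T> {
    final T item;
    final double weight;

    WeightedItem(T item, double weight) {
      this.item = item;
      this.weight = weight;
    }
  }

  /** Sums the adjusted weights of all sampled items that satisfy the predicate. */
  static <T> double estimateSubsetSum(List<WeightedItem<T>> samples, Predicate<T> predicate) {
    double sum = 0.0;
    for (WeightedItem<T> sample : samples) {
      if (predicate.test(sample.item)) {
        sum += sample.weight;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Made-up sample contents, for illustration only.
    List<WeightedItem<String>> samples = Arrays.asList(
        new WeightedItem<String>("apple", 10.0),
        new WeightedItem<String>("avocado", 2.5),
        new WeightedItem<String>("banana", 2.5));

    double estimate = estimateSubsetSum(samples, s -> s.startsWith("a"));
    System.out.println(estimate); // prints 12.5
  }
}
```

Because the estimate is built only from the retained samples, a predicate that matches few or no retained items yields a coarse estimate, which is why the bounds in the returned summary object are worth consulting alongside it.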
-