Class ReservoirItemsUnion<T>
java.lang.Object
    org.apache.datasketches.sampling.ReservoirItemsUnion<T>
- Type Parameters:
T
- The specific Java type for this sketch
public final class ReservoirItemsUnion<T> extends Object
Class to union reservoir samples of generic items.
For efficiency reasons, the unioning process picks one of the two sketches to use as the base. As a result, we provide only a stateful union. Using the same approach for a merge would result in unpredictable side effects on the underlying sketches.
A union object is created with a maximum value of k, represented using the ReservoirSize class. The unioning process may cause the actual number of samples to fall below that maximum value, but never to exceed it. The result of a union will be a reservoir where each item from the global input has a uniform probability of selection, but there are no claims about higher order statistics. For instance, in general the possible permutations of the global input are not all equally likely.
When taking the union of two reservoirs of different sizes, the output sample will contain no more than MIN(k_1, k_2) samples.
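The size bound and the uniform-selection property described above can be illustrated with a minimal, self-contained sketch in plain Java. This is not the DataSketches implementation (which, as noted, picks one of the two sketches as the base); it swaps in a simpler draw-without-replacement merge. The class name ReservoirMergeSketch, the merge method, and all of its parameters are hypothetical, and the sketch assumes each input list holds exactly min(n, k) uniformly sampled items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Illustration only: merges two uniform reservoir samples. Not the library's algorithm. */
final class ReservoirMergeSketch {

  /**
   * Hypothetical helper.
   * a, b   : samples of each reservoir (assumed to hold min(n, k) items)
   * nA, nB : total items seen by each reservoir
   * kA, kB : capacity of each reservoir
   * Returns at most min(kA, kB) items; the inputs are copied, not modified.
   */
  static <T> List<T> merge(List<T> a, long nA, int kA,
                           List<T> b, long nB, int kB, Random rnd) {
    if (a.size() != Math.min(nA, kA) || b.size() != Math.min(nB, kB)) {
      throw new IllegalArgumentException("each input must hold min(n, k) samples");
    }
    List<T> poolA = new ArrayList<>(a);
    List<T> poolB = new ArrayList<>(b);
    long remA = nA; // items of stream A not yet represented in the output
    long remB = nB; // items of stream B not yet represented in the output
    int target = Math.min(kA, kB);
    List<T> out = new ArrayList<>(target);
    while (out.size() < target && remA + remB > 0) {
      // Choose a side with probability proportional to its remaining population,
      // then take a random remaining sample from that side.
      boolean fromA = remB == 0
          || (remA > 0 && rnd.nextDouble() * (remA + remB) < remA);
      if (fromA) {
        out.add(poolA.remove(rnd.nextInt(poolA.size())));
        remA--;
      } else {
        out.add(poolB.remove(rnd.nextInt(poolB.size())));
        remB--;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> a = Arrays.asList("a1", "a2", "a3", "a4"); // k_1 = 4
    List<String> b = Arrays.asList("b1", "b2", "b3");       // k_2 = 3
    List<String> merged = merge(a, 1000L, 4, b, 250L, 3, new Random(7L));
    System.out.println(merged.size() + " samples: " + merged);
  }
}
```

With reservoirs of capacity 4 and 3, the merged sample never holds more than 3 items, matching the MIN(k_1, k_2) bound. Weighting each draw by the remaining population, rather than by the sample counts, is what keeps every input item equally likely to be selected.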
- Author:
- Jon Malkin, Kevin Lang
-
Method Summary
Modifier and Type | Method | Description
int | getMaxK() | Returns the maximum allowed reservoir capacity in this union.
ReservoirItemsSketch<T> | getResult() | Returns a sketch representing the current state of the union.
static <T> ReservoirItemsUnion<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe) | Instantiates a Union from Memory.
static <T> ReservoirItemsUnion<T> | newInstance(int maxK) | Creates an empty Union with a maximum reservoir capacity of size k.
byte[] | toByteArray(ArrayOfItemsSerDe<T> serDe) | Returns a byte array representation of this union.
byte[] | toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz) | Returns a byte array representation of this union.
String | toString() | Returns a human-readable summary of the sketch, without items.
void | update(long n, int k, ArrayList<T> input) | Present this union with raw elements of a sketch.
void | update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe) | Union the given Memory image of the sketch.
void | update(ReservoirItemsSketch<T> sketchIn) | Union the given sketch.
void | update(T datum) | Present this union with a single item to be added to the union.
-
Method Detail
-
newInstance
public static <T> ReservoirItemsUnion<T> newInstance(int maxK)
Creates an empty Union with a maximum reservoir capacity of size k.
- Type Parameters:
T
- The type of item this sketch contains
- Parameters:
maxK
- The maximum allowed reservoir capacity for any sketches in the union
- Returns:
- A new ReservoirItemsUnion
-
heapify
public static <T> ReservoirItemsUnion<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
Instantiates a Union from Memory.
- Type Parameters:
T
- The type of item this sketch contains
- Parameters:
srcMem
- Memory object containing a serialized union
serDe
- An instance of ArrayOfItemsSerDe
- Returns:
- A ReservoirItemsUnion created from the provided Memory
-
getMaxK
public int getMaxK()
Returns the maximum allowed reservoir capacity in this union. The current reservoir capacity may be lower.
- Returns:
- The maximum allowed reservoir capacity in this union.
-
update
public void update(ReservoirItemsSketch<T> sketchIn)
Union the given sketch. This method can be called repeatedly. If the given sketch is null, it is interpreted as an empty sketch.
- Parameters:
sketchIn
- The incoming sketch.
-
update
public void update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe)
Union the given Memory image of the sketch. This method can be called repeatedly. If the given sketch is null, it is interpreted as an empty sketch.
- Parameters:
mem
- Memory image of sketch to be merged
serDe
- An instance of ArrayOfItemsSerDe
-
update
public void update(T datum)
Present this union with a single item to be added to the union.
- Parameters:
datum
- The given datum of type T.
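The page does not describe how a single item is folded into the reservoir. As a rough illustration only, the classic reservoir sampling step (often called Algorithm R) is sketched below in plain Java. It is an assumption that the library behaves similarly; the class ReservoirUpdateSketch and its members are hypothetical names, not part of the DataSketches API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustration only: classic reservoir sampling of single items. */
final class ReservoirUpdateSketch<T> {
  private final int k;            // reservoir capacity
  private final List<T> samples;  // current samples, at most k of them
  private final Random rnd;
  private long n;                 // total items presented so far

  ReservoirUpdateSketch(int k, Random rnd) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    this.k = k;
    this.rnd = rnd;
    this.samples = new ArrayList<>(k);
  }

  /** Presents one item; after n items each has probability k/n of being retained. */
  void update(T datum) {
    n++;
    if (samples.size() < k) {
      samples.add(datum);                  // still filling the reservoir
    } else if (rnd.nextDouble() * n < k) { // accept with probability k/n
      samples.set(rnd.nextInt(k), datum);  // evict a uniformly chosen slot
    }
  }

  long getN() {
    return n;
  }

  List<T> getSamples() {
    return Collections.unmodifiableList(samples);
  }
}
```

Feeding more than k items into such a structure keeps the sample count pinned at k while n keeps growing, which is the same relationship between "total items seen" and "reservoir size" used elsewhere on this page.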
-
update
public void update(long n, int k, ArrayList<T> input)
Present this union with raw elements of a sketch. Useful when operating in a distributed environment like Pig Latin scripts, where an explicit SerDe may be overly complicated but keeping raw values is simple. Values are not copied and the input array may be modified.
- Parameters:
n
- Total items seen
k
- Reservoir size
input
- Reservoir samples
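Because the values are not copied and the input may be modified, a caller that still needs its original list may want to hand over a copy. The plain-Java sketch below shows that pattern under stated assumptions: consumeInPlace is a hypothetical stand-in for any method that mutates its argument (here it simply reverses the list), and DefensiveCopySketch is not part of the DataSketches API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: protecting a caller's list from in-place modification. */
final class DefensiveCopySketch {

  /** Hypothetical stand-in for a consumer that keeps and mutates its input. */
  static <T> void consumeInPlace(ArrayList<T> input) {
    Collections.reverse(input);
  }

  /** Makes a shallow copy that the consumer is free to modify. */
  static <T> ArrayList<T> copyOf(List<T> original) {
    return new ArrayList<>(original);
  }

  public static void main(String[] args) {
    List<String> original = Arrays.asList("a", "b", "c");
    ArrayList<String> handedOver = copyOf(original);
    consumeInPlace(handedOver);
    System.out.println("original:    " + original);
    System.out.println("handed over: " + handedOver);
  }
}
```

The copy is shallow, so it protects the list's order and membership but not the state of mutable items inside it.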
-
getResult
public ReservoirItemsSketch<T> getResult()
Returns a sketch representing the current state of the union.
- Returns:
- The result of any unions already processed.
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe)
Returns a byte array representation of this union.
- Parameters:
serDe
- An instance of ArrayOfItemsSerDe
- Returns:
- a byte array representation of this union
-
toString
public String toString()
Returns a human-readable summary of the sketch, without items.
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz)
Returns a byte array representation of this union. This method should be used when the array elements are subclasses of a common base class.
- Parameters:
serDe
- An instance of ArrayOfItemsSerDe
clazz
- A class to which the items are cast before serialization
- Returns:
- a byte array representation of this union
-