Class ReservoirItemsUnion<T>
- Type Parameters:
T - The specific Java type for this sketch
For efficiency reasons, the unioning process picks one of the two sketches to use as the base. As a result, we provide only a stateful union. Using the same approach for a merge would result in unpredictable side effects on the underlying sketches.
A union object is created with a maximum value of k, represented using the ReservoirSize class. The unioning process may cause the actual number of samples to fall below that maximum value, but never to exceed it. The result of a union will be a reservoir where each item from the global input has a uniform probability of selection, but there are no claims about higher order statistics. For instance, in general, not all possible permutations of the global input are equally likely.
If taking the union of two reservoirs of different sizes, the output sample will contain no more than MIN(k_1, k_2) samples.
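The size bound and the uniform-selection property can be illustrated with a toy merge in plain Java. This is only a sketch under stated assumptions, not the library's algorithm: the class `ReservoirUnionDemo` and its `union` method are hypothetical names, each input is assumed to be a uniform sample without replacement holding min(k, n) items, and the method returns a new list instead of reusing one input as the base (so, unlike this class, it is stateless).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Illustrative only: a toy merge of two uniform samples, not the DataSketches implementation. */
final class ReservoirUnionDemo {

  /**
   * Draws a uniform sample of size min(k1, k2, n1 + n2) from the combined input.
   * Assumes s1 and s2 are uniform samples (without replacement) of streams of
   * n1 and n2 items, each holding min(k, n) items.
   */
  static <T> List<T> union(List<T> s1, long n1, int k1,
                           List<T> s2, long n2, int k2, Random rnd) {
    int outK = Math.min(k1, k2);      // output never exceeds MIN(k_1, k_2)
    List<T> a = new ArrayList<>(s1);  // work on copies so the inputs are untouched
    List<T> b = new ArrayList<>(s2);
    long remA = n1;                   // items of stream 1 not yet represented in the output
    long remB = n2;                   // items of stream 2 not yet represented in the output
    List<T> out = new ArrayList<>(outK);
    while (out.size() < outK && remA + remB > 0) {
      // Choose the source stream with probability proportional to its remaining population.
      boolean fromA = remB == 0 || (remA > 0 && rnd.nextDouble() * (remA + remB) < remA);
      List<T> src = fromA ? a : b;
      out.add(src.remove(rnd.nextInt(src.size())));
      if (fromA) {
        remA--;
      } else {
        remB--;
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> s1 = List.of(1, 2, 3, 4, 5);  // 5 samples from a stream of 50 items
    List<Integer> s2 = List.of(101, 102, 103);  // 3 samples from a stream of 30 items
    List<Integer> merged = union(s1, 50, 5, s2, 30, 3, new Random());
    System.out.println(merged.size() + " samples: " + merged);
  }
}
```

With reservoirs of capacity 5 and 3, the merged sample holds 3 items, and items from the larger stream are proportionally more likely to appear.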
- Author:
- Jon Malkin, Kevin Lang
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `int` | `getMaxK()` | Returns the maximum allowed reservoir capacity in this union. |
| `ReservoirItemsSketch<T>` | `getResult()` | Returns a sketch representing the current state of the union. |
| `static <T> ReservoirItemsUnion<T>` | `heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)` | Instantiates a Union from Memory. |
| `static <T> ReservoirItemsUnion<T>` | `newInstance(int maxK)` | Creates an empty Union with a maximum reservoir capacity of size k. |
| `byte[]` | `toByteArray(ArrayOfItemsSerDe<T> serDe)` | Returns a byte array representation of this union. |
| `byte[]` | `toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz)` | Returns a byte array representation of this union. |
| `String` | `toString()` | Returns a human-readable summary of the sketch, without items. |
| `void` | `update(long n, int k, ArrayList<T> input)` | Present this union with raw elements of a sketch. |
| `void` | `update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe)` | Union the given Memory image of the sketch. |
| `void` | `update(ReservoirItemsSketch<T> sketchIn)` | Union the given sketch. |
| `void` | `update(T datum)` | Present this union with a single item to be added to the union. |
-
Method Details
-
newInstance
public static <T> ReservoirItemsUnion<T> newInstance(int maxK)
Creates an empty Union with a maximum reservoir capacity of maxK.
- Type Parameters:
T - The type of item this sketch contains
- Parameters:
maxK - The maximum allowed reservoir capacity for any sketches in the union
- Returns:
A new ReservoirItemsUnion
-
heapify
public static <T> ReservoirItemsUnion<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
Instantiates a Union from Memory.
- Type Parameters:
T - The type of item this sketch contains
- Parameters:
srcMem - Memory object containing a serialized union
serDe - An instance of ArrayOfItemsSerDe
- Returns:
A ReservoirItemsUnion created from the provided Memory
-
getMaxK
public int getMaxK()
Returns the maximum allowed reservoir capacity in this union. The current reservoir capacity may be lower.
- Returns:
The maximum allowed reservoir capacity in this union.
-
update
public void update(ReservoirItemsSketch<T> sketchIn)
Union the given sketch. This method can be called repeatedly. If the given sketch is null, it is interpreted as an empty sketch.
- Parameters:
sketchIn - The incoming sketch.
-
update
public void update(org.apache.datasketches.memory.Memory mem, ArrayOfItemsSerDe<T> serDe)
Union the given Memory image of the sketch. This method can be called repeatedly. If the given sketch is null, it is interpreted as an empty sketch.
- Parameters:
mem - Memory image of sketch to be merged
serDe - An instance of ArrayOfItemsSerDe
-
update
public void update(T datum)
Present this union with a single item to be added to the union.
- Parameters:
datum - The given datum of type T.
-
update
public void update(long n, int k, ArrayList<T> input)
Present this union with raw elements of a sketch. Useful when operating in a distributed environment like Pig Latin scripts, where an explicit SerDe may be overly complicated but keeping raw values is simple. Values are not copied and the input array may be modified.
- Parameters:
n - Total items seen
k - Reservoir size
input - Reservoir samples
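To make the three raw values concrete, here is a minimal plain-Java sketch of how a caller might track them without any sketch class or SerDe. The class `RawReservoir` and its `offer` method are hypothetical and use textbook reservoir sampling; they are not part of the library.

```java
import java.util.ArrayList;
import java.util.Random;

/** Illustrative only: tracks the raw (n, k, samples) triple of a reservoir. */
final class RawReservoir<T> {
  final int k;                 // reservoir size
  long n = 0;                  // total items seen
  final ArrayList<T> samples;  // current reservoir samples
  private final Random rnd;

  RawReservoir(int k, Random rnd) {
    this.k = k;
    this.samples = new ArrayList<>(k);
    this.rnd = rnd;
  }

  /** Keeps the first k items, then replaces a random slot with probability k/n. */
  void offer(T item) {
    n++;
    if (samples.size() < k) {
      samples.add(item);
    } else {
      long j = (long) (rnd.nextDouble() * n);  // uniform in [0, n)
      if (j < k) {
        samples.set((int) j, item);
      }
    }
  }

  public static void main(String[] args) {
    RawReservoir<String> r = new RawReservoir<>(10, new Random());
    for (int i = 0; i < 1000; i++) {
      r.offer("item-" + i);
    }
    // r.n, r.k and r.samples correspond to the n, k and input parameters above.
    System.out.println("n=" + r.n + " k=" + r.k + " samples=" + r.samples.size());
    // prints "n=1000 k=10 samples=10"
  }
}
```

Because the values are not copied and the input may be modified, a caller that still needs its own samples afterwards would pass a copy, for example `new ArrayList<>(r.samples)`.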
-
getResult
public ReservoirItemsSketch<T> getResult()
Returns a sketch representing the current state of the union.
- Returns:
The result of any unions already processed.
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe)
Returns a byte array representation of this union.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
- Returns:
A byte array representation of this union
-
toString
public String toString()
Returns a human-readable summary of the sketch, without items.
-
toByteArray
public byte[] toByteArray(ArrayOfItemsSerDe<T> serDe, Class<?> clazz)
Returns a byte array representation of this union. This method should be used when the array elements are subclasses of a common base class.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
clazz - A class to which the items are cast before serialization
- Returns:
A byte array representation of this union
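One plausible reason for supplying a common base class, offered here as an assumption and not as documented behaviour, is that items copied into an array typed after a single subclass cannot hold siblings of that subclass. The plain-Java sketch below shows that general Java effect; `TypedArrayDemo` and `toTypedArray` are hypothetical names and do not reflect the library's internals.

```java
import java.lang.reflect.Array;
import java.util.List;

/** Illustrative only: why a common base class matters when items go into a typed array. */
final class TypedArrayDemo {

  /** Copies items into an array whose component type is clazz (hypothetical helper). */
  @SuppressWarnings("unchecked")
  static <T> T[] toTypedArray(List<? extends T> items, Class<?> clazz) {
    T[] out = (T[]) Array.newInstance(clazz, items.size());
    for (int i = 0; i < out.length; i++) {
      // Throws ArrayStoreException if the item is not an instance of clazz.
      out[i] = items.get(i);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Number> mixed = List.of(1, 2.5, 3L);  // Integer, Double, Long

    // The common base class accepts every item.
    Number[] ok = toTypedArray(mixed, Number.class);
    System.out.println("copied " + ok.length + " items");

    // A single subclass rejects the Double at index 1.
    try {
      toTypedArray(mixed, Integer.class);
    } catch (ArrayStoreException e) {
      System.out.println("ArrayStoreException for " + e.getMessage());
    }
  }
}
```

In this sketch, passing `Number.class` plays the role that `clazz` would play for a union holding mixed `Number` subclasses.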
-