Class EbppsItemsSketch<T>
- Type Parameters:
T - the item class type
From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.
This sketch samples items from a stream with probability proportional to each item's weight. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.
The sample may be smaller than k, and its size may include a probabilistic component, so the returned sample size is not always constant.
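The arithmetic behind these two guarantees can be sketched in a few lines. The block below is an illustration only, not the DataSketches implementation: the class and method names are hypothetical, and the expected-size formula C = min(k, W / wMax), where W is the total weight and wMax the largest single weight, is an assumption based on the cited paper's formulation rather than anything stated on this page.

```java
import java.util.Arrays;

/** Illustrative arithmetic only; this is not the DataSketches implementation. */
final class PpsInclusionExample {

  /** Expected sample size, assumed here to be min(k, W / wMax) for total weight W. */
  static double expectedSampleSize(final int k, final double[] weights) {
    if (weights.length == 0) { return 0.0; }
    double total = 0.0;
    double max = 0.0;
    for (final double w : weights) {
      if (!(w > 0.0)) { throw new IllegalArgumentException("weights must be positive"); }
      total += w;
      max = Math.max(max, w);
    }
    return Math.min((double) k, total / max);
  }

  /** Inclusion probability of each item: C * w_i / W, which is proportional to w_i. */
  static double[] inclusionProbabilities(final int k, final double[] weights) {
    final double c = expectedSampleSize(k, weights);
    double total = 0.0;
    for (final double w : weights) { total += w; }
    final double[] p = new double[weights.length];
    for (int i = 0; i < weights.length; i++) {
      p[i] = c * weights[i] / total;
    }
    return p;
  }

  public static void main(final String[] args) {
    // Balanced weights: the expected size reaches k.
    final double[] weights = {1.0, 2.0, 3.0, 4.0};
    System.out.println("C = " + expectedSampleSize(2, weights));
    System.out.println(Arrays.toString(inclusionProbabilities(2, weights)));

    // One heavy item: the expected size falls below k.
    final double[] skewed = {1.0, 1.0, 8.0};
    System.out.println("C = " + expectedSampleSize(2, skewed));
    System.out.println(Arrays.toString(inclusionProbabilities(2, skewed)));
  }
}
```

Under this assumption, a single very heavy item limits the expected sample size, because no inclusion probability may exceed 1 while all probabilities stay proportional to weight. That is one way to see why the sample may be smaller than k.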
- Author:
- Jon Malkin
-
Constructor Summary
Constructors
EbppsItemsSketch(int k) | Constructor
-
Method Summary
Modifier and Type | Method | Description
double | getC() | Returns the expected number of samples returned upon a call to getResult().
double | getCumulativeWeight() | Returns the cumulative weight of items processed by the sketch.
int | getK() | Returns the configured maximum sample size.
long | getN() | Returns the number of items processed by the sketch, regardless of item weight.
ArrayList<T> | getResult() | Returns a copy of the current sample.
int | getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe) | Returns the size of a byte array representation of this sketch.
int | getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz) | Returns the length of a byte array representation of this sketch.
static <T> EbppsItemsSketch<T> | heapify(MemorySegment srcSeg, ArrayOfItemsSerDe<T> serDe) | Returns a sketch instance of this class from the given srcSeg, which must be a MemorySegment representation of this sketch class.
boolean | isEmpty() | Returns true if the sketch is empty.
void | merge(EbppsItemsSketch<T> other) | Merges the provided sketch into the current one.
void | reset() | Resets the sketch to its default, empty state.
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe) | Returns a byte array representation of this sketch.
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz) | Returns a byte array representation of this sketch.
String | toString() | Provides a human-readable summary of the sketch.
void | update(T item) | Updates this sketch with the given data item with weight 1.0.
void | update(T item, double weight) | Updates this sketch with the given data item with the given weight.
-
Constructor Details
-
EbppsItemsSketch
public EbppsItemsSketch(int k)
Constructor
- Parameters:
k - The maximum number of samples to retain
-
-
Method Details
-
heapify
Returns a sketch instance of this class from the given srcSeg, which must be a MemorySegment representation of this sketch class.
- Type Parameters:
T - The type of item this sketch contains
- Parameters:
srcSeg - a MemorySegment representation of a sketch of this class.
serDe - An instance of ArrayOfItemsSerDe
- Returns:
- a sketch instance of this class
-
update
Updates this sketch with the given data item with weight 1.0.
- Parameters:
item - an item from a stream of items
-
update
Updates this sketch with the given data item with the given weight.
- Parameters:
item - an item from a stream of items
weight - the weight of the item
-
merge
Merges the provided sketch into the current one.
- Parameters:
other - the sketch to merge into the current object
-
getResult
Returns a copy of the current sample.
-
toString
Provides a human-readable summary of the sketch.
-
getK
public int getK()
Returns the configured maximum sample size.
- Returns:
- configured maximum sample size
-
getN
public long getN()
Returns the number of items processed by the sketch, regardless of item weight.
- Returns:
- count of items processed by the sketch
-
getCumulativeWeight
public double getCumulativeWeight()
Returns the cumulative weight of items processed by the sketch.
- Returns:
- cumulative weight of items seen
-
getC
public double getC()
Returns the expected number of samples returned upon a call to getResult(). The number is a floating point value, where the fractional portion represents the probability of including a "partial item" from the sample.
The value C should be no larger than the sketch's configured value of k, although limited numerical precision means it may, in certain cases, exceed k by a margin on the order of double-precision floating point error.
- Returns:
- The expected number of samples returned when querying the sketch
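To make the role of the fractional part concrete, the following sketch draws a sample size from a given value of C: the whole part is always returned, and one extra item is included with probability equal to the fractional part. This is a standalone illustration of that reading of the description above, not the sketch's internal code; the class and method names are hypothetical.

```java
import java.util.Random;

/** Illustrative only; shows how a fractional expected size can yield an integer sample size. */
final class FractionalSampleSizeExample {

  /** Returns floor(c), plus 1 with probability equal to the fractional part of c. */
  static int drawSampleSize(final double c, final Random rng) {
    final int whole = (int) Math.floor(c);
    final double fraction = c - whole;
    // nextDouble() is uniform on [0, 1), so this test succeeds with probability 'fraction'.
    return whole + (rng.nextDouble() < fraction ? 1 : 0);
  }

  public static void main(final String[] args) {
    final Random rng = new Random(42L);
    final int trials = 100_000;
    long sum = 0;
    for (int i = 0; i < trials; i++) {
      sum += drawSampleSize(2.25, rng);
    }
    // The average size should be close to the expected value 2.25.
    System.out.println("mean sample size = " + (sum / (double) trials));
  }
}
```

With C = 2.25, each draw returns either 2 or 3 items, and the long-run average approaches 2.25.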
-
isEmpty
public boolean isEmpty()
Returns true if the sketch is empty.
- Returns:
- empty flag
-
reset
public void reset()
Resets the sketch to its default, empty state.
-
getSerializedSizeBytes
Returns the size of a byte array representation of this sketch. May fail for polymorphic item types.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
- Returns:
- the length of a byte array representation of this sketch
-
getSerializedSizeBytes
Returns the length of a byte array representation of this sketch. Copies contents into an array of the specified class for serialization to allow for polymorphic types.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
clazz - The class represented by <T>
- Returns:
- the length of a byte array representation of this sketch
-
toByteArray
Returns a byte array representation of this sketch. May fail for polymorphic item types.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
- Returns:
- a byte array representation of this sketch
-
toByteArray
Returns a byte array representation of this sketch. Copies contents into an array of the specified class for serialization to allow for polymorphic types.
- Parameters:
serDe - An instance of ArrayOfItemsSerDe
clazz - The class represented by <T>
- Returns:
- a byte array representation of this sketch
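The difference between the one-argument and two-argument overloads can be illustrated with plain Java arrays. The sketch below is a standalone example with hypothetical helper names; it does not call the library. It assumes that a failure for polymorphic types arises when the array's component type is inferred from a single item, which is one plausible reading of "May fail for polymorphic item types" and is not confirmed by this page.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

/** Illustrative only; the helper names are hypothetical and not part of the library. */
final class PolymorphicArrayExample {

  /** Copies items into a new array whose runtime component type is clazz. */
  @SuppressWarnings("unchecked")
  static <T> T[] copyToTypedArray(final List<? extends T> items, final Class<?> clazz) {
    final T[] out = (T[]) Array.newInstance(clazz, items.size());
    for (int i = 0; i < items.size(); i++) {
      // Throws ArrayStoreException if an item is not an instance of clazz.
      out[i] = items.get(i);
    }
    return out;
  }

  /** Returns true if copying fails when the array class is taken from the first item. */
  static <T> boolean failsWithFirstItemClass(final List<? extends T> items) {
    try {
      copyToTypedArray(items, items.get(0).getClass());
      return false;
    } catch (final ArrayStoreException e) {
      return true;
    }
  }

  public static void main(final String[] args) {
    // An Integer and a Double stored together as Number: a polymorphic item type.
    final List<Number> mixed = Arrays.asList(1, 2.5);

    // Supplying the common supertype explicitly works for every item.
    final Number[] copy = copyToTypedArray(mixed, Number.class);
    System.out.println(Arrays.toString(copy));

    // Guessing the class from the first item (Integer) cannot hold the Double.
    System.out.println(failsWithFirstItemClass(mixed));
  }
}
```

When the items of a sketch may be instances of different subclasses of T, passing the common class as clazz avoids this kind of mismatch.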
-