Class EbppsItemsSketch<T>

java.lang.Object
org.apache.datasketches.sampling.EbppsItemsSketch<T>
Type Parameters:
T - the item class type

public final class EbppsItemsSketch<T> extends Object
An implementation of an Exact and Bounded Sampling Proportional to Size sketch.

From: "Exact PPS Sampling with Bounded Sample Size", B. Hentschel, P. J. Haas, Y. Tian. Information Processing Letters, 2023.

This sketch samples items from a stream with probability proportional to each item's weight. It guarantees that the probability of an item appearing in the result is proportional to that item's share of the total weight seen by the sketch, and it returns a sample of at most k items.

The sample may contain fewer than k items, and its size can include a probabilistic component, so the sample size is not always constant.
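A small worked example may make the description above more concrete. The code below is not part of the library. It is a sketch under one stated assumption: that each item is included with probability rho * weight, where rho = min(k / totalWeight, 1 / maxWeight), which is how the bounded PPS rule is usually stated. The class and method names (`PpsInclusionDemo`, `inclusionProbabilities`, `expectedSampleSize`) are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: not part of the DataSketches API. */
final class PpsInclusionDemo {

  /**
   * Returns the inclusion probability of each item, ASSUMING the rule
   * rho = min(k / totalWeight, 1 / maxWeight) and p_i = rho * w_i.
   * Weights are assumed to be strictly positive and the array non-empty.
   */
  static double[] inclusionProbabilities(double[] weights, int k) {
    double total = 0.0;
    double max = 0.0;
    for (double w : weights) {
      total += w;
      max = Math.max(max, w);
    }
    // rho is capped so that no single probability exceeds 1
    // and the probabilities sum to at most k.
    double rho = Math.min(k / total, 1.0 / max);
    double[] probs = new double[weights.length];
    for (int i = 0; i < weights.length; i++) {
      probs[i] = rho * weights[i];
    }
    return probs;
  }

  /** Expected sample size: the sum of the inclusion probabilities. */
  static double expectedSampleSize(double[] weights, int k) {
    return Arrays.stream(inclusionProbabilities(weights, k)).sum();
  }

  public static void main(String[] args) {
    double[] weights = {1.0, 2.0, 3.0, 4.0};
    int k = 3;
    System.out.println(Arrays.toString(inclusionProbabilities(weights, k)));
    System.out.println(expectedSampleSize(weights, k));
  }
}
```

With weights 1, 2, 3, 4 and k = 3, the heaviest item caps rho at 0.25, so the expected sample size under this assumption is 2.5 rather than 3. This is one way a sample can end up smaller than k.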

Author:
Jon Malkin
  • Constructor Summary

    Constructors
    Constructor
    Description
    EbppsItemsSketch(int k)
    Constructor
  • Method Summary

    Modifier and Type
    Method
    Description
    double
    getC()
    Returns the expected number of samples returned upon a call to getResult().
    double
    getCumulativeWeight()
    Returns the cumulative weight of items processed by the sketch.
    int
    getK()
    Returns the configured maximum sample size.
    long
    getN()
    Returns the number of items processed by the sketch, regardless of item weight.
    ArrayList<T>
    getResult()
    Returns a copy of the current sample.
    int
    getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe)
    Returns the size of a byte array representation of this sketch.
    int
    getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
    Returns the length of a byte array representation of this sketch.
    static <T> EbppsItemsSketch<T>
    heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
    Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
    boolean
    isEmpty()
    Returns true if the sketch is empty.
    void
    merge(EbppsItemsSketch<T> other)
    Merges the provided sketch into the current one.
    void
    reset()
    Resets the sketch to its default, empty state.
    byte[]
    toByteArray(ArrayOfItemsSerDe<? super T> serDe)
    Returns a byte array representation of this sketch.
    byte[]
    toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
    Returns a byte array representation of this sketch.
    String
    toString()
    Provides a human-readable summary of the sketch.
    void
    update(T item)
    Updates this sketch with the given data item with weight 1.0.
    void
    update(T item, double weight)
    Updates this sketch with the given data item with the given weight.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • EbppsItemsSketch

      public EbppsItemsSketch(int k)
      Constructs an empty sketch that retains at most k samples.
      Parameters:
      k - The maximum number of samples to retain
  • Method Details

    • heapify

      public static <T> EbppsItemsSketch<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
      Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
      Type Parameters:
      T - The type of item this sketch contains
      Parameters:
      srcMem - a Memory representation of a sketch of this class. See Memory
      serDe - An instance of ArrayOfItemsSerDe
      Returns:
      a sketch instance of this class
    • update

      public void update(T item)
      Updates this sketch with the given data item with weight 1.0.
      Parameters:
      item - an item from a stream of items
    • update

      public void update(T item, double weight)
      Updates this sketch with the given data item with the given weight.
      Parameters:
      item - an item from a stream of items
      weight - the weight of the item
    • merge

      public void merge(EbppsItemsSketch<T> other)
      Merges the provided sketch into the current one.
      Parameters:
      other - the sketch to merge into the current object
    • getResult

      public ArrayList<T> getResult()
      Returns a copy of the current sample. The sample size may include a probabilistic component, so the number of items returned can differ by at most 1.
      Returns:
      the current sketch sample
    • toString

      public String toString()
      Provides a human-readable summary of the sketch.
      Overrides:
      toString in class Object
      Returns:
      a summary of information in the sketch
    • getK

      public int getK()
      Returns the configured maximum sample size.
      Returns:
      configured maximum sample size
    • getN

      public long getN()
      Returns the number of items processed by the sketch, regardless of item weight.
      Returns:
      count of items processed by the sketch
    • getCumulativeWeight

      public double getCumulativeWeight()
      Returns the cumulative weight of items processed by the sketch.
      Returns:
      cumulative weight of items seen
    • getC

      public double getC()
      Returns the expected number of samples returned upon a call to getResult(). The value is a floating-point number whose fractional part is the probability of including a "partial item" from the sample.

      The value C should never exceed the sketch's configured k, although floating-point rounding may cause it to exceed k by a margin on the order of double-precision error in certain cases.

      Returns:
      The expected number of samples returned when querying the sketch
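To illustrate how a fractional expected size can turn into an integer sample size, here is a minimal standalone sketch. It does not use or reproduce the library's internals. It assumes only what the description of getC() states: the whole part of C is always returned, and the fractional part is the probability of one extra item. `FractionalSizeDemo` and `drawSampleSize` are hypothetical names.

```java
import java.util.Random;

/** Illustrative only: not part of the DataSketches API. */
final class FractionalSizeDemo {

  /**
   * Draws a sample size equal to floor(c) or floor(c) + 1 so that the
   * expected value is c. The extra item appears with probability equal
   * to the fractional part of c.
   */
  static int drawSampleSize(double c, Random rng) {
    int whole = (int) Math.floor(c);
    double fraction = c - whole;
    // nextDouble() is uniform on [0, 1), so this is true with probability 'fraction'.
    return rng.nextDouble() < fraction ? whole + 1 : whole;
  }

  public static void main(String[] args) {
    Random rng = new Random(42L);
    double c = 2.25;
    int trials = 100_000;
    long sum = 0;
    for (int i = 0; i < trials; i++) {
      sum += drawSampleSize(c, rng);
    }
    // The empirical mean should be close to c.
    System.out.println("mean size = " + (double) sum / trials);
  }
}
```

Under this reading, a sketch reporting C = 2.25 would return 2 items about three quarters of the time and 3 items otherwise, which is consistent with the sample size differing by at most 1 item.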
    • isEmpty

      public boolean isEmpty()
      Returns true if the sketch is empty.
      Returns:
      empty flag
    • reset

      public void reset()
      Resets the sketch to its default, empty state.
    • getSerializedSizeBytes

      public int getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe)
      Returns the size of a byte array representation of this sketch. May fail for polymorphic item types.
      Parameters:
      serDe - An instance of ArrayOfItemsSerDe
      Returns:
      the length of a byte array representation of this sketch
    • getSerializedSizeBytes

      public int getSerializedSizeBytes(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
      Returns the length of a byte array representation of this sketch. Copies contents into an array of the specified class for serialization to allow for polymorphic types.
      Parameters:
      serDe - An instance of ArrayOfItemsSerDe
      clazz - The class represented by <T>
      Returns:
      the length of a byte array representation of this sketch
    • toByteArray

      public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe)
      Returns a byte array representation of this sketch. May fail for polymorphic item types.
      Parameters:
      serDe - An instance of ArrayOfItemsSerDe
      Returns:
      a byte array representation of this sketch
    • toByteArray

      public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
      Returns a byte array representation of this sketch. Copies contents into an array of the specified class for serialization to allow for polymorphic types.
      Parameters:
      serDe - An instance of ArrayOfItemsSerDe
      clazz - The class represented by <T>
      Returns:
      a byte array representation of this sketch
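The serialization overloads that take a clazz argument copy the items into an array of that class. The standalone sketch below shows why a typed array can matter for polymorphic items. It is not the library's code: it assumes, for illustration only, that without clazz the array's component type would be inferred from the first item. `TypedArrayDemo`, `toTypedArray`, and `toArrayInferred` are hypothetical names.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: not part of the DataSketches API. */
final class TypedArrayDemo {

  /** Copies items into a new array whose component type is the given class. */
  @SuppressWarnings("unchecked")
  static <T> T[] toTypedArray(List<T> items, Class<?> clazz) {
    T[] array = (T[]) Array.newInstance(clazz, items.size());
    for (int i = 0; i < items.size(); i++) {
      // Throws ArrayStoreException if the item is not an instance of clazz.
      array[i] = items.get(i);
    }
    return array;
  }

  /**
   * ASSUMED fallback: infers the component type from the first item.
   * This breaks when later items belong to a different subclass.
   */
  static <T> T[] toArrayInferred(List<T> items) {
    return toTypedArray(items, items.get(0).getClass());
  }

  public static void main(String[] args) {
    List<Number> items = Arrays.<Number>asList(1, 2.5, 3L);

    // Supplying the common supertype works for mixed subclasses.
    Number[] typed = toTypedArray(items, Number.class);
    System.out.println(Arrays.toString(typed));

    // Inferring Integer from the first item cannot hold a Double.
    try {
      toArrayInferred(items);
    } catch (ArrayStoreException e) {
      System.out.println("inferred array type rejected: " + e.getMessage());
    }
  }
}
```

In this illustration, passing the common supertype (here Number.class) avoids the failure, which matches the role the clazz parameter is documented to play.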