Class ReservoirLongsSketch


  • public final class ReservoirLongsSketch
    extends Object
    This sketch provides a reservoir sample over an input stream of longs. The sketch contains a uniform random sample of items from the stream.
    Author:
    Jon Malkin, Kevin Lang
    • Method Summary

      Modifier and Type Method Description
      SampleSubsetSummary estimateSubsetSum(Predicate<Long> predicate)
      Computes an estimated subset sum from the entire stream for objects matching a given predicate.
      int getK()
      Returns the sketch's value of k, the maximum number of samples stored in the reservoir.
      long getN()
      Returns the number of items processed from the input stream.
      int getNumSamples()
      Returns the current number of items in the reservoir, which may be smaller than the reservoir capacity.
      long[] getSamples()
      Returns a copy of the items in the reservoir.
      static ReservoirLongsSketch heapify(org.apache.datasketches.memory.Memory srcMem)
      Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
      static ReservoirLongsSketch newInstance(int k)
      Construct a mergeable reservoir sampling sketch with up to k samples using the default resize factor (8).
      static ReservoirLongsSketch newInstance(int k, ResizeFactor rf)
      Construct a mergeable reservoir sampling sketch with up to k samples using the specified resize factor.
      void reset()
      Resets this sketch to the empty state, but retains the original value of k.
      byte[] toByteArray()
      Returns a byte array representation of this sketch.
      String toString()
      Returns a human-readable summary of the sketch, without items.
      static String toString(byte[] byteArr)
      Returns a human-readable string of the preamble of a byte array image of a ReservoirLongsSketch.
      static String toString(org.apache.datasketches.memory.Memory mem)
      Returns a human-readable string of the preamble of a Memory image of a ReservoirLongsSketch.
      void update(long item)
      Randomly decide whether or not to include an item in the sample set.
    • Method Detail

      • newInstance

        public static ReservoirLongsSketch newInstance(int k)
        Construct a mergeable reservoir sampling sketch with up to k samples using the default resize factor (8).
        Parameters:
        k - Maximum size of sampling. Allocated size may be smaller until sampling fills. Unlike many sketches in this package, this value does not need to be a power of 2.
        Returns:
        A ReservoirLongsSketch initialized with maximum size k and the default resize factor.
      • newInstance

        public static ReservoirLongsSketch newInstance(int k,
                                                       ResizeFactor rf)
        Construct a mergeable reservoir sampling sketch with up to k samples using the specified resize factor.
        Parameters:
        k - Maximum size of sampling. Allocated size may be smaller until sampling fills. Unlike many sketches in this package, this value does not need to be a power of 2.
        rf - The factor by which the reservoir's allocated size grows until it reaches k. See ResizeFactor
        Returns:
        A ReservoirLongsSketch initialized with maximum size k and ResizeFactor rf.
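
As a rough illustration of what a resize factor implies, the following sketch grows a buffer geometrically and caps it at k. The class name ResizeGrowthDemo, the starting capacity of 16, the treatment of a factor of 1, and the growth rule itself are assumptions made for this example; they are not taken from the library's source.

```java
// Illustrative only: a hypothetical growth rule, not the library's implementation.
final class ResizeGrowthDemo {

  /**
   * Returns the next allocated capacity, given the current capacity,
   * a multiplicative resize factor, and the maximum reservoir size k.
   */
  static int nextCapacity(int current, int factor, int k) {
    if (current < 1 || factor < 1 || k < 1) {
      throw new IllegalArgumentException("arguments must be positive");
    }
    if (factor == 1) {
      return k; // assumption: a factor of 1 means "allocate everything up front"
    }
    long grown = (long) current * factor; // long arithmetic avoids int overflow
    return (int) Math.min(grown, (long) k);
  }

  public static void main(String[] args) {
    int k = 1000;      // need not be a power of 2
    int capacity = 16; // hypothetical starting allocation
    while (capacity < k) {
      capacity = nextCapacity(capacity, 8, k);
      System.out.println("allocated capacity: " + capacity);
    }
  }
}
```

A larger factor means fewer reallocations while the reservoir fills, at the cost of more unused space in the meantime.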
      • heapify

        public static ReservoirLongsSketch heapify(org.apache.datasketches.memory.Memory srcMem)
        Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class.
        Parameters:
        srcMem - a Memory representation of a sketch of this class. See Memory
        Returns:
        a sketch instance of this class
      • getK

        public int getK()
        Returns the sketch's value of k, the maximum number of samples stored in the reservoir. The current number of items in the sketch may be lower.
        Returns:
        k, the maximum number of samples in the reservoir
      • getN

        public long getN()
        Returns the number of items processed from the input stream.
        Returns:
        n, the number of stream items the sketch has seen
      • getNumSamples

        public int getNumSamples()
        Returns the current number of items in the reservoir, which may be smaller than the reservoir capacity.
        Returns:
        the number of items currently in the reservoir
      • getSamples

        public long[] getSamples()
        Returns a copy of the items in the reservoir. The returned array length may be smaller than the reservoir capacity.
        Returns:
        A copy of the reservoir array
      • update

        public void update(long item)
        Randomly decide whether or not to include an item in the sample set.
        Parameters:
        item - a unit-weight (equivalently, unweighted) item of the set being sampled from
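
The random decision made by update can be pictured with classic reservoir sampling (Algorithm R). The class below is a minimal plain-Java sketch under that assumption; SimpleLongReservoir and its seed parameter are hypothetical names, and the library's actual implementation may differ.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative only: textbook reservoir sampling, not the library's implementation.
final class SimpleLongReservoir {
  private final long[] data; // holds at most k samples
  private final Random rng;
  private long n;            // number of items seen so far

  SimpleLongReservoir(int k, long seed) {
    if (k < 1) {
      throw new IllegalArgumentException("k must be at least 1");
    }
    this.data = new long[k];
    this.rng = new Random(seed);
  }

  /** Offers one unit-weight item to the reservoir. */
  void update(long item) {
    if (n < data.length) {
      // The reservoir is not yet full, so every item is kept.
      data[(int) n] = item;
    } else {
      // Pick a slot uniformly in [0, n]; the item is kept only if the
      // slot falls inside the reservoir, i.e. with probability k / (n + 1).
      long slot = (long) (rng.nextDouble() * (n + 1));
      if (slot < data.length) {
        data[(int) slot] = item;
      }
    }
    n++;
  }

  long getN() {
    return n;
  }

  int getNumSamples() {
    return (int) Math.min(n, data.length);
  }

  /** Returns a copy, so callers cannot modify the reservoir. */
  long[] getSamples() {
    return Arrays.copyOf(data, getNumSamples());
  }

  public static void main(String[] args) {
    SimpleLongReservoir reservoir = new SimpleLongReservoir(5, 42L);
    for (long i = 0; i < 1000; i++) {
      reservoir.update(i);
    }
    System.out.println("n = " + reservoir.getN()
        + ", samples = " + Arrays.toString(reservoir.getSamples()));
  }
}
```

Until k items have arrived, the sample is simply the stream so far, which is why the number of samples can be smaller than the reservoir capacity.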
      • reset

        public void reset()
        Resets this sketch to the empty state, but retains the original value of k.
      • toString

        public String toString()
        Returns a human-readable summary of the sketch, without items.
        Overrides:
        toString in class Object
        Returns:
        A string version of the sketch summary
      • toString

        public static String toString(byte[] byteArr)
        Returns a human-readable string of the preamble of a byte array image of a ReservoirLongsSketch.
        Parameters:
        byteArr - the given byte array
        Returns:
        a human-readable string of the preamble of a byte array image of a ReservoirLongsSketch.
      • toString

        public static String toString(org.apache.datasketches.memory.Memory mem)
        Returns a human-readable string of the preamble of a Memory image of a ReservoirLongsSketch.
        Parameters:
        mem - the given Memory
        Returns:
        a human-readable string of the preamble of a Memory image of a ReservoirLongsSketch.
      • toByteArray

        public byte[] toByteArray()
        Returns a byte array representation of this sketch.
        Returns:
        a byte array representation of this sketch
      • estimateSubsetSum

        public SampleSubsetSummary estimateSubsetSum(Predicate<Long> predicate)
        Computes an estimated subset sum from the entire stream for objects matching a given predicate. Provides a lower bound, estimate, and upper bound using a target of 2 standard deviations.

        This is technically a heuristic method that tries to err on the conservative side.

        Parameters:
        predicate - A predicate to use when identifying items.
        Returns:
        A summary object containing the estimate, upper and lower bounds, and the total sketch weight.
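
To make the idea of a subset-sum estimate concrete, the sketch below scales the fraction of matching samples up to the stream length n and attaches bounds at 2 standard deviations. It assumes unit-weight items and uses a naive normal approximation to a binomial proportion, which is not the library's heuristic. The class name, the use of LongPredicate instead of Predicate<Long>, and the double[] return value in place of SampleSubsetSummary are all choices made for this example.

```java
import java.util.function.LongPredicate;

// Illustrative only: a naive estimator, not the library's bounds computation.
final class SubsetSumSketchDemo {

  /**
   * Estimates how many of the n stream items match the predicate,
   * given a uniform random sample of the stream.
   *
   * @return an array {lowerBound, estimate, upperBound}
   */
  static double[] estimate(long[] samples, long n, LongPredicate predicate) {
    int m = samples.length;
    if (n < m) {
      throw new IllegalArgumentException("n cannot be smaller than the sample size");
    }
    if (m == 0) {
      return new double[] {0.0, 0.0, 0.0};
    }
    int hits = 0;
    for (long s : samples) {
      if (predicate.test(s)) {
        hits++;
      }
    }
    if (n == m) {
      // The sample is the whole stream, so the count is exact.
      return new double[] {hits, hits, hits};
    }
    double p = (double) hits / m;               // sampled matching fraction
    double sd = Math.sqrt(p * (1.0 - p) / m);   // std. dev. of that fraction
    double lower = Math.max(0.0, p - 2.0 * sd) * n;
    double upper = Math.min(1.0, p + 2.0 * sd) * n;
    return new double[] {lower, p * n, upper};
  }

  public static void main(String[] args) {
    long[] samples = {1L, 2L, 3L, 4L};
    double[] result = estimate(samples, 100L, v -> v % 2 == 0);
    System.out.println("lower = " + result[0]
        + ", estimate = " + result[1]
        + ", upper = " + result[2]);
  }
}
```

With only a handful of samples the bounds are very wide, which illustrates why the quality of such estimates depends on k.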