Class Sketch<S extends Summary>

java.lang.Object
org.apache.datasketches.tuple.Sketch<S>
Type Parameters:
S - Type of Summary
Direct Known Subclasses:
CompactSketch, UpdatableSketch

public abstract class Sketch<S extends Summary> extends Object
This is equivalent to org.apache.datasketches.theta.Sketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
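To make the idea concrete, here is a minimal toy model, not the DataSketches implementation. It keeps the k smallest hash values, attaches a per-key summary (here a running sum of doubles) to each retained hash, and tracks theta as a long. The class name ToyTupleSketch, the splitmix64-style hash and the simple "drop the largest hash" retention policy are all assumptions for illustration; the real library uses its own hashing and hash-table machinery.

```java
import java.util.TreeMap;

// Hypothetical toy model of a tuple sketch (KMV-style), for illustration only.
final class ToyTupleSketch {
  private final int k;                                           // max retained entries
  private final TreeMap<Long, Double> entries = new TreeMap<>(); // hash -> summary (running sum)
  private long thetaLong = Long.MAX_VALUE;                       // theta = 1.0 until the sketch fills
  private boolean empty = true;

  ToyTupleSketch(int k) { this.k = k; }

  // Assumed hash: splitmix64 finalizer, shifted into the range [0, Long.MAX_VALUE].
  static long hash(long key) {
    long z = key + 0x9E3779B97F4A7C15L;
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
    return (z ^ (z >>> 31)) >>> 1;
  }

  void update(long key, double value) {
    empty = false;
    long h = hash(key);
    if (h >= thetaLong) return;               // hash falls outside the current sample
    entries.merge(h, value, Double::sum);     // create or update this key's summary
    if (entries.size() > k) {
      thetaLong = entries.lastKey();          // largest retained hash becomes the new theta
      entries.remove(thetaLong);              // keep only hashes strictly below theta
    }
  }

  Double getSummary(long key) { return entries.get(hash(key)); }
  int getRetainedEntries() { return entries.size(); }
  double getTheta() { return thetaLong / (double) Long.MAX_VALUE; }
  boolean isEmpty() { return empty; }
  boolean isEstimationMode() { return thetaLong < Long.MAX_VALUE && !isEmpty(); }

  double getEstimate() {
    // Exact mode: every unique key is retained. Estimation mode: scale up by 1/theta.
    return isEstimationMode() ? getRetainedEntries() / getTheta() : getRetainedEntries();
  }
}
```

Feeding 100,000 distinct keys into a toy sketch with k = 1024 leaves exactly 1024 retained entries and an estimate close to 100,000, while repeated updates of one key accumulate into that key's single summary.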
  • Method Summary

    Modifier and Type | Method | Description
    abstract CompactSketch<S> | compact() | Converts this sketch to a CompactSketch on the Java heap.
    abstract int | getCountLessThanThetaLong(long thetaLong) | Gets the number of hash values less than the given theta expressed as a long.
    double | getEstimate() | Estimates the cardinality of the set (the number of unique values presented to the sketch).
    double | getEstimate(int numSubsetEntries) | Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
    double | getLowerBound(int numStdDev) | Gets the approximate lower error bound given the specified number of Standard Deviations.
    double | getLowerBound(int numStdDev, int numSubsetEntries) | Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    abstract int | getRetainedEntries() | Returns the number of retained entries.
    SummaryFactory<S> | getSummaryFactory() | Gets the SummaryFactory for summaries of type S.
    double | getTheta() | Gets the value of theta as a double between zero and one.
    long | getThetaLong() | Returns theta as a long.
    double | getUpperBound(int numStdDev) | Gets the approximate upper error bound given the specified number of Standard Deviations.
    double | getUpperBound(int numStdDev, int numSubsetEntries) | Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    boolean | isEmpty() | Returns true if this sketch is empty.
    boolean | isEstimationMode() | Returns true if the sketch is in estimation mode (as opposed to exact mode).
    abstract TupleSketchIterator<S> | iterator() | Returns a TupleSketchIterator over the retained entries of this sketch.
    abstract byte[] | toByteArray() | Serializes this sketch instance to a byte array.
    String | toString() | Returns a string representation of this sketch.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Method Details

    • compact

      public abstract CompactSketch<S> compact()
      Converts this sketch to a CompactSketch on the Java heap.

      If this sketch is already in compact form, this operation returns this.

      Returns:
      this sketch as a CompactSketch on the Java heap.
    • getEstimate

      public double getEstimate()
      Estimates the cardinality of the set (the number of unique values presented to the sketch).
      Returns:
      best estimate of the number of unique values
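      In estimation mode the estimate scales the number of retained entries up by 1/theta; in exact mode it is simply the retained count. A minimal sketch of that arithmetic, using a hypothetical helper EstimateMath.estimate that is not part of the library:

```java
// Hypothetical helper illustrating the point estimate; not the library's code.
final class EstimateMath {
  static double estimate(int retainedEntries, double theta, boolean empty) {
    if (empty) return 0.0;
    // Exact mode (theta == 1.0): every unique value seen is retained.
    if (theta >= 1.0) return retainedEntries;
    // Estimation mode: each retained entry stands for about 1/theta unique values.
    return retainedEntries / theta;
  }
}
```

      For example, 4096 retained entries at theta = 0.25 correspond to an estimate of 16384 unique values.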
    • getUpperBound

      public double getUpperBound(int numStdDev)
      Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the upper bound.
    • getLowerBound

      public double getLowerBound(int numStdDev)
      Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the lower bound.
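      The library's actual bound computation is more careful than this; the snippet below substitutes a plain normal approximation, only to show how numStdDev widens the interval around the estimate. It assumes each unique value is retained independently with probability theta, so the estimate n / theta has a standard deviation of about sqrt(n(1 - theta)) / theta. The class name BoundsApprox is hypothetical.

```java
// Hypothetical normal-approximation bounds; the library itself uses a different method.
final class BoundsApprox {
  // Approximate standard deviation of n / theta under independent sampling with probability theta.
  static double stdDev(int n, double theta) {
    return Math.sqrt(n * (1.0 - theta)) / theta;
  }

  static double lowerBound(int n, double theta, int numStdDev) {
    double lb = n / theta - numStdDev * stdDev(n, theta);
    return Math.max(lb, n);   // never fewer unique values than entries actually retained
  }

  static double upperBound(int n, double theta, int numStdDev) {
    return n / theta + numStdDev * stdDev(n, theta);
  }
}
```

      With theta = 1.0 the standard deviation is zero and both bounds collapse to n, matching exact mode. Passing a subset count as n gives the analogous subset bounds under the same approximation.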
    • getEstimate

      public double getEstimate(int numSubsetEntries)
      Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
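      A subset count typically comes from filtering the retained entries by their summaries, after which the count is scaled by 1/theta just like the full estimate. The sketch below uses a plain double[] in place of real Summary objects; the class SubsetEstimate and its methods are hypothetical.

```java
import java.util.function.DoublePredicate;

// Hypothetical illustration: estimate how many unique keys have a summary matching a predicate.
final class SubsetEstimate {
  static int countMatching(double[] summaries, DoublePredicate predicate) {
    int count = 0;
    for (double s : summaries) {
      if (predicate.test(s)) count++;   // this retained entry belongs to the chosen subset
    }
    return count;
  }

  static double estimate(int numSubsetEntries, double theta) {
    // With theta == 1.0 (exact mode) the count is returned unchanged.
    return numSubsetEntries / theta;
  }
}
```

      For example, if 2 of the retained entries have a summary above 3.0 and theta is 0.25, the subset estimate is 8.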
    • getLowerBound

      public double getLowerBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • getUpperBound

      public double getUpperBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • isEmpty

      public boolean isEmpty()
      Returns:
      true if empty.
    • isEstimationMode

      public boolean isEstimationMode()
      Returns true if the sketch is in estimation mode (as opposed to exact mode). This is true if theta < 1.0 AND isEmpty() is false.
      Returns:
      true if the sketch is in estimation mode.
    • getRetainedEntries

      public abstract int getRetainedEntries()
      Returns:
      number of retained entries
    • getCountLessThanThetaLong

      public abstract int getCountLessThanThetaLong(long thetaLong)
      Gets the number of hash values less than the given theta expressed as a long.
      Parameters:
      thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
      Returns:
      the number of hash values less than the given thetaLong.
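      This count is useful when the retained entries must be re-evaluated against a smaller theta, for instance when combining sketches that have different thetas. A minimal sketch over a plain long[] of retained hashes, using a hypothetical helper ThetaCount:

```java
// Hypothetical helper: counts retained hashes strictly below a given thetaLong.
final class ThetaCount {
  static int countLessThanThetaLong(long[] hashes, long thetaLong) {
    int count = 0;
    for (long h : hashes) {
      if (h < thetaLong) count++;   // only hashes below theta are inside the sample
    }
    return count;
  }
}
```

      Dividing that count by the corresponding theta (thetaLong as a fraction of Long.MAX_VALUE) gives an estimate at the reduced theta.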
    • getSummaryFactory

      public SummaryFactory<S> getSummaryFactory()
      Gets the SummaryFactory for summaries of type S.
      Returns:
      the SummaryFactory<S> of this sketch
    • getTheta

      public double getTheta()
      Gets the value of theta as a double between zero and one.
      Returns:
      the value of theta as a double
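      Theta is held as a long between zero and Long.MAX_VALUE and exposed here as a fraction. A sketch of the conversion in both directions, assuming the double form is thetaLong / Long.MAX_VALUE; the ThetaConversion helper is hypothetical.

```java
// Hypothetical helpers: convert between theta as a long and theta as a double in [0, 1].
final class ThetaConversion {
  static double toDouble(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  static long toLong(double theta) {
    // May not round-trip exactly: a double carries only 53 bits of precision.
    return (long) (theta * Long.MAX_VALUE);
  }
}
```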
    • toByteArray

      public abstract byte[] toByteArray()
      Serializes this sketch instance to a byte array.

      As of 3.0.0, serializing an UpdatableSketch is deprecated. This capability will be removed in a future release. Serializing a CompactSketch is not deprecated.

      Returns:
      serialized representation of the sketch
    • iterator

      public abstract TupleSketchIterator<S> iterator()
      Returns a TupleSketchIterator over the retained entries of this sketch.
      Returns:
      a TupleSketchIterator
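      Iteration follows a cursor pattern: advance with next(), then read the current entry's hash and summary, which is how TupleSketchIterator is used through its next(), getHash() and getSummary() methods. So that the snippet runs without the library, it mirrors that pattern with a hypothetical array-backed ToyIterator and a hypothetical IterateDemo that sums double summaries.

```java
// Hypothetical array-backed iterator mirroring the next()/getHash()/getSummary() cursor pattern.
final class ToyIterator<S> {
  private final long[] hashes;
  private final S[] summaries;
  private int i = -1;   // positioned before the first entry

  ToyIterator(long[] hashes, S[] summaries) {
    this.hashes = hashes;
    this.summaries = summaries;
  }

  boolean next() { return ++i < hashes.length; }   // advance; false once exhausted
  long getHash() { return hashes[i]; }
  S getSummary() { return summaries[i]; }
}

final class IterateDemo {
  // Aggregates the summaries of all retained entries.
  static double sumSummaries(ToyIterator<Double> it) {
    double total = 0.0;
    while (it.next()) {
      total += it.getSummary();
    }
    return total;
  }
}
```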
    • getThetaLong

      public long getThetaLong()
      Returns theta as a long.
      Returns:
      theta as a long
    • toString

      public String toString()
      Overrides:
      toString in class Object