Class TupleSketch<S extends Summary>

java.lang.Object
org.apache.datasketches.tuple.TupleSketch<S>
Type Parameters:
S - Type of Summary
Direct Known Subclasses:
CompactTupleSketch, UpdatableTupleSketch

public abstract class TupleSketch<S extends Summary> extends Object
The top-level class for all Tuple sketches. This class is never constructed directly. Use UpdatableTupleSketchBuilder to create UpdatableTupleSketch instances. A Tuple sketch is similar to a ThetaSketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
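As a rough mental model, and not the DataSketches implementation, a Tuple sketch can be pictured as a theta-style (KMV) sample of hashed keys in which every retained hash also carries a summary. The toy class below makes that concrete under several assumptions: the class name ToyTupleSketch is hypothetical, it hashes long keys with the SplitMix64 mixing function instead of the MurmurHash3 hash DataSketches uses, its summary is simply a running sum of doubles, and it lowers theta to the (k+1)-th smallest retained hash whenever it holds more than k entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of a tuple sketch (hypothetical, not the DataSketches code):
 * a KMV-style sample of hashed keys where each retained hash carries a
 * summary, here a running sum of double values.
 */
final class ToyTupleSketch {
    private final int k;                                 // nominal number of entries to keep
    private long thetaLong = Long.MAX_VALUE;             // sampling threshold; theta = thetaLong / Long.MAX_VALUE
    private final Map<Long, Double> entries = new HashMap<>(); // retained hash -> summary

    ToyTupleSketch(int k) { this.k = k; }

    /** SplitMix64 mix of the key, shifted to a non-negative long (stand-in for MurmurHash3). */
    static long hash(long key) {
        long z = (key + 1) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return (z ^ (z >>> 31)) >>> 1;
    }

    /** Adds value to the summary of key, if the key's hash falls below theta. */
    void update(long key, double value) {
        long h = hash(key);
        if (h >= thetaLong) return;                      // outside the current sample
        entries.merge(h, value, Double::sum);            // new entry, or update the existing summary
        if (entries.size() > k) rebuild();
    }

    /** Lowers theta to the (k+1)-th smallest hash and drops entries at or above it. */
    private void rebuild() {
        List<Long> hashes = new ArrayList<>(entries.keySet());
        hashes.sort(null);                               // natural (ascending) order
        thetaLong = hashes.get(k);                       // index k is the (k+1)-th smallest
        entries.keySet().removeIf(h -> h >= thetaLong);  // exactly k entries remain
    }

    int getRetainedEntries() { return entries.size(); }
    double getTheta() { return thetaLong / (double) Long.MAX_VALUE; }
    double getEstimate() { return getRetainedEntries() / getTheta(); }
    boolean isEstimationMode() { return thetaLong < Long.MAX_VALUE && !entries.isEmpty(); }

    /** Summary for key, or null if the key is not retained. */
    Double getSummary(long key) { return entries.get(hash(key)); }
}
```

With at most k distinct keys the toy stays in exact mode (theta = 1.0) and getEstimate() equals the number of retained entries; once more than k distinct keys arrive, theta drops below 1.0 and the estimate becomes the retained count divided by theta.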
  • Method Details

    • compact

      public abstract CompactTupleSketch<S> compact()
      Converts this TupleSketch to a CompactTupleSketch on the Java heap.

      If this sketch is already in compact form, this operation returns this sketch.

      Returns:
      this sketch as a CompactTupleSketch on the Java heap.
    • getEstimate

      public double getEstimate()
      Estimates the cardinality of the set (the number of unique values presented to the sketch).
      Returns:
      best estimate of the number of unique values
    • getUpperBound

      public double getUpperBound(int numStdDev)
      Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the upper bound.
    • getLowerBound

      public double getLowerBound(int numStdDev)
      Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
      Parameters:
      numStdDev - See Number of Standard Deviations
      Returns:
      the lower bound.
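      To illustrate what these bounds mean, the sketch below brackets the estimate n / theta with a simple normal approximation to the binomial: if n of N distinct items survive sampling with probability theta, the estimate n / theta has a standard deviation of roughly sqrt(n * (1 - theta)) / theta. This is a swapped-in technique for illustration, not the method DataSketches uses to compute getUpperBound() and getLowerBound(); the class ApproxBounds and its isEmpty parameter are hypothetical. Clamping the lower bound at n is also a choice made here, on the grounds that at least n distinct items were seen.

```java
/**
 * Illustrative error bounds for a theta-style estimate using a simple
 * normal approximation to the binomial. Not the DataSketches algorithm.
 */
final class ApproxBounds {
    /** Estimate n / theta, where n is the number of retained (or subset) entries. */
    static double estimate(int n, double theta) {
        return n / theta;
    }

    /** Approximate standard deviation of n / theta: sqrt(n * (1 - theta)) / theta. */
    static double stdDev(int n, double theta) {
        return Math.sqrt(n * (1.0 - theta)) / theta;
    }

    static double upperBound(int n, double theta, int numStdDev, boolean isEmpty) {
        if (isEmpty) return estimate(n, theta);          // mirrors "returns getEstimate() if isEmpty()"
        return estimate(n, theta) + numStdDev * stdDev(n, theta);
    }

    static double lowerBound(int n, double theta, int numStdDev, boolean isEmpty) {
        if (isEmpty) return estimate(n, theta);
        // At least n distinct items were observed, so never report fewer than n.
        return Math.max(n, estimate(n, theta) - numStdDev * stdDev(n, theta));
    }
}
```

      In exact mode (theta = 1.0) the standard deviation term is zero, so both bounds collapse to the estimate. For example, with n = 1000 and theta = 0.1 the estimate is 10000 and the 2-standard-deviation bounds are about 9400 and 10600. Passing numSubsetEntries as n gives the analogous approximation for the subset variants of these methods.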
    • getEstimate

      public double getEstimate(int numSubsetEntries)
      Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
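      A typical use is to count the retained entries whose summaries satisfy some condition and scale that count up by theta. The sketch below assumes the subset estimate is numSubsetEntries / theta, the same scaling the full-sketch getEstimate() applies to the retained count. The class SubsetEstimate is hypothetical, and it takes the summaries as a plain double[] rather than walking a real sketch with iterator().

```java
import java.util.function.DoublePredicate;

/** Illustrative subset estimate over double summaries (hypothetical helper, not the library API). */
final class SubsetEstimate {
    /** Counts retained entries whose summary matches keep, then scales the count by 1 / theta. */
    static double estimate(double[] summaries, double theta, DoublePredicate keep) {
        int numSubsetEntries = 0;
        for (double s : summaries) {
            if (keep.test(s)) numSubsetEntries++;
        }
        return numSubsetEntries / theta;
    }
}
```

      For example, with retained summaries {1, 5, 7, 2}, theta = 0.5, and the condition "summary > 3", two entries match, giving an estimate of 4 distinct items in the population whose summaries exceed 3.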
    • getLowerBound

      public double getLowerBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • getUpperBound

      public double getUpperBound(int numStdDev, int numSubsetEntries)
      Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      Parameters:
      numStdDev - See Number of Standard Deviations
      numSubsetEntries - number of entries for a chosen subset of the sketch.
      Returns:
      the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
    • isEmpty

      public boolean isEmpty()
      Returns:
      true if empty.
    • isEstimationMode

      public boolean isEstimationMode()
      Returns true if the sketch is in estimation mode (as opposed to exact mode). This is true if theta < 1.0 AND isEmpty() is false.
      Returns:
      true if the sketch is in estimation mode.
    • getRetainedEntries

      public abstract int getRetainedEntries()
      Returns the number of retained entries.
      Returns:
      number of retained entries
    • getCountLessThanThetaLong

      public abstract int getCountLessThanThetaLong(long thetaLong)
      Gets the number of hash values less than the given theta expressed as a long.
      Parameters:
      thetaLong - the given theta as a long in the range (zero, Long.MAX_VALUE].
      Returns:
      the number of hash values less than the given thetaLong.
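      Conceptually this is a count of the retained hash values that would survive if theta were lowered to the given thetaLong. The sketch below shows that logic as a linear scan over a plain long[] of hash values; the class CountBelowTheta and the array representation are assumptions for illustration, since the real sketch keeps its hashes in its own internal storage.

```java
/** Illustrative count of retained hashes strictly below a threshold (hypothetical helper). */
final class CountBelowTheta {
    static int countLessThanThetaLong(long[] hashes, long thetaLong) {
        int count = 0;
        for (long h : hashes) {
            if (h < thetaLong) count++;   // strictly less than, matching the description above
        }
        return count;
    }
}
```

      For hashes {5, 100, 42, 7} and thetaLong = 42, only 5 and 7 are counted; the hash equal to 42 is excluded.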
    • getSummaryFactory

      public SummaryFactory<S> getSummaryFactory()
      Gets the SummaryFactory for Summary objects of type S.
      Returns:
      the SummaryFactory for type S
    • getTheta

      public double getTheta()
      Gets the value of theta as a double between zero and one.
      Returns:
      the value of theta as a double
    • toByteArray

      public abstract byte[] toByteArray()
      Serialize this sketch to a byte array.

      As of 3.0.0, serializing an UpdatableTupleSketch is deprecated. This capability will be removed in a future release. Serializing a CompactTupleSketch is not deprecated.

      Returns:
      serialized representation of this sketch.
    • iterator

      public abstract TupleSketchIterator<S> iterator()
      Returns a TupleSketchIterator over the retained entries of this sketch.
      Returns:
      a TupleSketchIterator
    • getThetaLong

      public long getThetaLong()
      Returns theta as a long.
      Returns:
      theta as a long
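      The long and double forms of theta are related by scaling against Long.MAX_VALUE, so a thetaLong of Long.MAX_VALUE corresponds to theta = 1.0 (exact mode). The sketch below shows that mapping together with the estimation-mode test described for isEstimationMode(). The class ThetaConversions is hypothetical, and the reverse conversion from a double theta to a long (scale, then truncate) is an assumption for illustration rather than a documented library method.

```java
/** Illustrative theta conversions (hypothetical helper, not the library API). */
final class ThetaConversions {
    /** theta as a fraction of Long.MAX_VALUE, in (0.0, 1.0]. */
    static double thetaFromLong(long thetaLong) {
        return thetaLong / (double) Long.MAX_VALUE;
    }

    /** Assumed reverse mapping: scale and truncate; theta = 1.0 saturates to Long.MAX_VALUE. */
    static long thetaLongFromTheta(double theta) {
        return (long) (theta * Long.MAX_VALUE);
    }

    /** Estimation mode as described for isEstimationMode(): theta < 1.0 and not empty. */
    static boolean isEstimationMode(long thetaLong, boolean isEmpty) {
        return thetaLong < Long.MAX_VALUE && !isEmpty;
    }
}
```

      Working in the long domain avoids floating-point comparisons when deciding whether a hash is retained; the double form is mainly useful for computing estimates such as retained entries divided by theta.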
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • heapifyUpdatableSketch

      public static <U, S extends UpdatableSummary<U>> UpdatableTupleSketch<U,S> heapifyUpdatableSketch(MemorySegment seg, SummaryDeserializer<S> deserializer, SummaryFactory<S> summaryFactory)
      Instantiates an UpdatableTupleSketch on the heap from a given MemorySegment.
      Type Parameters:
      U - Type of update value
      S - Type of Summary
      Parameters:
      seg - MemorySegment object representing an UpdatableTupleSketch
      deserializer - instance of SummaryDeserializer
      summaryFactory - instance of SummaryFactory
      Returns:
      UpdatableTupleSketch created from its MemorySegment representation
    • heapifySketch

      public static <S extends Summary> TupleSketch<S> heapifySketch(MemorySegment seg, SummaryDeserializer<S> deserializer)
      Instantiates a TupleSketch from a given MemorySegment.
      Type Parameters:
      S - Type of Summary
      Parameters:
      seg - MemorySegment object representing a TupleSketch
      deserializer - instance of SummaryDeserializer
      Returns:
      TupleSketch created from its MemorySegment representation
    • createEmptySketch

      public static <S extends Summary> TupleSketch<S> createEmptySketch()
      Creates an empty CompactTupleSketch.
      Type Parameters:
      S - Type of Summary
      Returns:
      an empty instance of a CompactTupleSketch