Class Sketch<S extends Summary>

  • Type Parameters:
    S - Type of Summary
    Direct Known Subclasses:
    CompactSketch, UpdatableSketch

    public abstract class Sketch<S extends Summary>
    extends Object
    This is the equivalent of org.apache.datasketches.theta.Sketch, with the addition of a user-defined Summary object associated with every unique entry in the sketch.
    • Method Detail

      • compact

        public abstract CompactSketch<S> compact()
        Converts this sketch to a CompactSketch on the Java heap.

        If this sketch is already in compact form this operation returns this.

        Returns:
        this sketch as a CompactSketch on the Java heap.
      • getEstimate

        public double getEstimate()
        Estimates the cardinality of the set (the number of unique values presented to the sketch).
        Returns:
        best estimate of the number of unique values
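As a rough illustration of how a theta-style estimate is formed, the sketch below divides the number of retained entries by theta, the sampling fraction. It is a standalone model under stated assumptions, not the library class: the class name ThetaEstimateSketch and its parameters are hypothetical.

```java
// Illustrative model only; ThetaEstimateSketch is a hypothetical name,
// not part of org.apache.datasketches.
final class ThetaEstimateSketch {

  // retainedEntries: number of entries kept by the sketch (assumed input)
  // theta: sampling fraction in (0, 1]; 1.0 means nothing was discarded
  // empty: whether the sketch has seen no input at all
  static double estimate(int retainedEntries, double theta, boolean empty) {
    if (empty) {
      return 0.0; // an empty sketch represents an empty set
    }
    // Each retained entry stands in for 1/theta distinct values.
    return retainedEntries / theta;
  }

  public static void main(String[] args) {
    System.out.println(estimate(1000, 1.0, false));  // exact mode
    System.out.println(estimate(4096, 0.25, false)); // estimation mode
  }
}
```

With theta equal to 1.0 the result is simply the retained count, which matches the intuition that no sampling has taken place yet.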
      • getUpperBound

        public double getUpperBound(int numStdDev)
        Gets the approximate upper error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
        Parameters:
        numStdDev - See Number of Standard Deviations
        Returns:
        the upper bound.
      • getLowerBound

        public double getLowerBound(int numStdDev)
        Gets the approximate lower error bound given the specified number of Standard Deviations. This will return getEstimate() if isEmpty() is true.
        Parameters:
        numStdDev - See Number of Standard Deviations
        Returns:
        the lower bound.
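To give a feel for what numStdDev controls, the sketch below brackets the estimate with a simple normal approximation. This is a swapped-in technique chosen for illustration and is an assumption throughout: it is not claimed to be how the library computes its bounds, and the class name ApproxBoundsSketch and all parameters are hypothetical.

```java
// Illustrative normal approximation only; NOT the library's bound computation.
// ApproxBoundsSketch and its parameters are hypothetical.
final class ApproxBoundsSketch {

  // Approximate standard deviation of the estimate n / theta,
  // treating each distinct value as sampled independently with probability theta.
  static double stdDev(int retained, double theta) {
    return Math.sqrt(retained * (1.0 - theta)) / theta;
  }

  static double upperBound(int numStdDev, int retained, double theta, boolean empty) {
    if (empty) {
      return 0.0; // same value as the estimate of an empty sketch
    }
    return retained / theta + numStdDev * stdDev(retained, theta);
  }

  static double lowerBound(int numStdDev, int retained, double theta, boolean empty) {
    if (empty) {
      return 0.0;
    }
    double lb = retained / theta - numStdDev * stdDev(retained, theta);
    // At least `retained` distinct values were actually observed.
    return Math.max(retained, lb);
  }

  public static void main(String[] args) {
    System.out.println(lowerBound(2, 100, 0.5, false));
    System.out.println(upperBound(2, 100, 0.5, false));
  }
}
```

A larger numStdDev widens the interval around the estimate; when theta is 1.0 the approximate deviation is zero and both bounds collapse to the exact count.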
      • getEstimate

        public double getEstimate(int numSubsetEntries)
        Gets the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
        Parameters:
        numSubsetEntries - number of entries for a chosen subset of the sketch.
        Returns:
        the estimate of the true distinct population of subset tuples represented by the count of entries in a subset of the total retained entries of the sketch.
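The subset estimate can be pictured as the same scaling applied to only part of the retained entries, for example those whose Summary satisfies some predicate. The sketch below is a minimal model under that assumption; SubsetEstimateSketch and its parameters are hypothetical names, not library API.

```java
// Illustrative model only; SubsetEstimateSketch is a hypothetical name.
final class SubsetEstimateSketch {

  // numSubsetEntries: how many retained entries belong to the chosen subset
  // theta: sampling fraction of the whole sketch, in (0, 1]
  static double subsetEstimate(int numSubsetEntries, double theta) {
    // Every retained entry, subset or not, represents 1/theta distinct values.
    return numSubsetEntries / theta;
  }

  public static void main(String[] args) {
    // Suppose 1024 of the retained entries match a predicate and theta is 0.25.
    System.out.println(subsetEstimate(1024, 0.25));
  }
}
```

The caller is responsible for counting the subset, typically by iterating over the retained entries and inspecting each Summary.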
      • getLowerBound

        public double getLowerBound(int numStdDev,
                                    int numSubsetEntries)
        Gets the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
        Parameters:
        numStdDev - See Number of Standard Deviations
        numSubsetEntries - number of entries for a chosen subset of the sketch.
        Returns:
        the estimate of the lower bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      • getUpperBound

        public double getUpperBound(int numStdDev,
                                    int numSubsetEntries)
        Gets the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
        Parameters:
        numStdDev - See Number of Standard Deviations
        numSubsetEntries - number of entries for a chosen subset of the sketch.
        Returns:
        the estimate of the upper bound of the true distinct population represented by the count of entries in a subset of the total retained entries of the sketch.
      • isEmpty

        public boolean isEmpty()
        Returns:
        true if the sketch is empty.
      • isEstimationMode

        public boolean isEstimationMode()
        Returns true if the sketch is in Estimation Mode (as opposed to Exact Mode). This is true if theta < 1.0 AND isEmpty() is false.
        Returns:
        true if the sketch is in estimation mode.
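The condition stated above can be written out directly. The snippet is a standalone restatement of that rule; the class name EstimationModeSketch and its parameters are hypothetical.

```java
// Illustrative restatement of the documented rule; hypothetical class name.
final class EstimationModeSketch {

  // Estimation mode: theta has dropped below 1.0 and the sketch is not empty.
  static boolean isEstimationMode(double theta, boolean empty) {
    return theta < 1.0 && !empty;
  }

  public static void main(String[] args) {
    System.out.println(isEstimationMode(1.0, false)); // exact mode
    System.out.println(isEstimationMode(0.5, false)); // estimation mode
    System.out.println(isEstimationMode(0.5, true));  // empty, so not estimating
  }
}
```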
      • getRetainedEntries

        public abstract int getRetainedEntries()
        Returns:
        number of retained entries
      • getCountLessThanThetaLong

        public abstract int getCountLessThanThetaLong(long thetaLong)
        Gets the number of hash values less than the given theta expressed as a long.
        Parameters:
        thetaLong - the given theta as a long between zero and Long.MAX_VALUE.
        Returns:
        the number of hash values less than the given thetaLong.
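Conceptually, this count is a scan over the retained hash values with a strict less-than comparison. The sketch below models the retained hashes as a plain long array of positive values, which is an assumption about representation made only for illustration; CountBelowThetaSketch is a hypothetical name.

```java
// Illustrative model only; the long[] of retained hashes is an assumed representation.
final class CountBelowThetaSketch {

  // Counts hash values strictly less than thetaLong.
  static int countLessThanThetaLong(long[] hashes, long thetaLong) {
    int count = 0;
    for (long h : hashes) {
      if (h < thetaLong) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long[] hashes = {10L, 20L, 30L, 40L};
    System.out.println(countLessThanThetaLong(hashes, 30L));
    System.out.println(countLessThanThetaLong(hashes, Long.MAX_VALUE));
  }
}
```

Because the comparison is strict, a hash equal to thetaLong is not counted.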
      • getSummaryFactory

        public SummaryFactory<S> getSummaryFactory()
        Gets the Summary Factory class of type S
        Returns:
        the Summary Factory class of type S
      • getTheta

        public double getTheta()
        Gets the value of theta as a double between zero and one.
        Returns:
        the value of theta as a double
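The double form of theta and its long form (see getThetaLong) describe the same threshold on two scales. A minimal sketch of the conversion, assuming the long form spans zero to Long.MAX_VALUE as described for getCountLessThanThetaLong; ThetaConversionSketch is a hypothetical name.

```java
// Illustrative conversion only; hypothetical class name.
final class ThetaConversionSketch {

  // Maps a theta expressed as a long in [0, Long.MAX_VALUE] to a double in [0, 1].
  static double thetaAsDouble(long thetaLong) {
    return thetaLong / (double) Long.MAX_VALUE;
  }

  public static void main(String[] args) {
    System.out.println(thetaAsDouble(Long.MAX_VALUE));     // no sampling yet
    System.out.println(thetaAsDouble(Long.MAX_VALUE / 2)); // about half retained
  }
}
```

The long form allows hash values to be compared against theta with integer arithmetic, while the double form is convenient for scaling counts into estimates.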
      • toByteArray

        public abstract byte[] toByteArray()
        Serializes this sketch instance to a byte array.

        As of 3.0.0, serializing an UpdatableSketch is deprecated. This capability will be removed in a future release. Serializing a CompactSketch is not deprecated.

        Returns:
        serialized representation of the sketch
      • iterator

        public abstract TupleSketchIterator<S> iterator()
        Returns a TupleSketchIterator over the retained entries of this sketch.
        Returns:
        a TupleSketchIterator
      • getThetaLong

        public long getThetaLong()
        Returns Theta as a long
        Returns:
        Theta as a long