Class CompactThetaSketch

java.lang.Object
org.apache.datasketches.theta.ThetaSketch
org.apache.datasketches.theta.CompactThetaSketch
All Implemented Interfaces:
MemorySegmentStatus

public abstract class CompactThetaSketch extends ThetaSketch
The parent class of all the CompactThetaSketches. CompactThetaSketches are never created directly. They are created as a result of the compact() method of an UpdatableThetaSketch, a result of a getResult() of a ThetaSetOperation, or from a heapify method.

A CompactThetaSketch is the simplest form of a ThetaSketch. It consists of a compact list (i.e., no intervening spaces) of hash values, which may be ordered or not, a value for theta and a seed hash. A CompactThetaSketch is immutable (read-only), and the space required when stored is only the space required for the hash values and 8 to 24 bytes of preamble. An empty CompactThetaSketch consumes only 8 bytes.
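
As a rough illustration of the storage claim above, the following sketch computes the stored size of a compact sketch from its state. The class and method names are hypothetical and are not part of the library. The mapping of sketch state to 8, 16, or 24 preamble bytes is an assumption that is merely consistent with the 8-to-24-byte range stated above, and each retained hash value is assumed to occupy 8 bytes (one long).

```java
/** Hypothetical helper for illustration only; not part of the DataSketches API. */
final class CompactSizeSketch {
  private CompactSizeSketch() {}

  /**
   * Assumed preamble size in bytes.
   * Assumption: 24 bytes when theta must be stored (estimation mode),
   * 16 bytes in exact mode with more than one hash value,
   * and 8 bytes for an empty or single-item sketch.
   */
  static int preambleBytes(int retainedEntries, boolean estimationMode) {
    if (estimationMode) { return 24; }
    if (retainedEntries > 1) { return 16; }
    return 8;
  }

  /** Assumed stored size: preamble plus 8 bytes per retained hash value. */
  static int compactBytes(int retainedEntries, boolean estimationMode) {
    return preambleBytes(retainedEntries, estimationMode) + 8 * retainedEntries;
  }
}
```

Under these assumptions an empty sketch takes 8 bytes, matching the statement above, and a sketch in estimation mode with 1000 retained hash values takes 24 + 8000 bytes.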

Author:
Lee Rhodes
  • Constructor Details

    • CompactThetaSketch

      public CompactThetaSketch()
      No argument constructor.
  • Method Details

    • heapify

      public static CompactThetaSketch heapify(MemorySegment srcSeg)
      Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.

      The resulting sketch will not retain any link to the source MemorySegment and all of its data will be copied to the heap CompactThetaSketch.

      The DEFAULT_UPDATE_SEED is assumed.

      Parameters:
      srcSeg - an image of a CompactThetaSketch.
      Returns:
      a CompactThetaSketch on the heap.
    • heapify

      public static CompactThetaSketch heapify(MemorySegment srcSeg, long expectedSeed)
      Heapify takes a CompactThetaSketch image in a MemorySegment and instantiates an on-heap CompactThetaSketch.

      The resulting sketch will not retain any link to the source MemorySegment and all of its data will be copied to the heap CompactThetaSketch.

      This method checks if the given expectedSeed was used to create the source MemorySegment image.

      Parameters:
      srcSeg - an image of a CompactThetaSketch that was created using the given expectedSeed.
      expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed.
      Returns:
      a CompactThetaSketch on the heap.
    • wrap

      public static CompactThetaSketch wrap(MemorySegment srcSeg)
      Wrap takes the CompactThetaSketch image in the given MemorySegment and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

      Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of empty and single item sketch respectively. This is actually faster and consumes less overall space.

      The DEFAULT_UPDATE_SEED is assumed.

      Parameters:
      srcSeg - an image of a CompactThetaSketch.
      Returns:
      a CompactThetaSketch backed by the given MemorySegment.
    • wrap

      public static CompactThetaSketch wrap(MemorySegment srcSeg, long expectedSeed)
      Wrap takes the sketch image in the given MemorySegment and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

      Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of empty and single item sketch respectively. This is actually faster and consumes less overall space.

      This method checks if the given expectedSeed was used to create the source MemorySegment image.

      Parameters:
      srcSeg - an image of a CompactThetaSketch that was created using the given expectedSeed.
      expectedSeed - the seed used to validate the given MemorySegment image. See Update Hash Seed.
      Returns:
      a CompactThetaSketch backed by the given MemorySegment.
    • wrap

      public static CompactThetaSketch wrap(byte[] bytes)
      Wrap takes the sketch image in the given byte array and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

      Only sketches that have been explicitly stored as direct sketches can be wrapped.

      Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of empty and single item sketch respectively. This is actually faster and consumes less overall space.

      This method checks if the DEFAULT_UPDATE_SEED was used to create the source byte array image.

      Parameters:
      bytes - a byte array image of a CompactThetaSketch that was created using the DEFAULT_UPDATE_SEED.
      Returns:
      a CompactThetaSketch backed by the given byte array except as above.
    • wrap

      public static CompactThetaSketch wrap(byte[] bytes, long expectedSeed)
      Wrap takes the sketch image in the given byte array and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

      Only sketches that have been explicitly stored as direct sketches can be wrapped.

      Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of empty and single item sketch respectively. This is actually faster and consumes less overall space.

      This method checks if the given expectedSeed was used to create the source byte array image.

      Parameters:
      bytes - a byte array image of a CompactThetaSketch that was created using the given expectedSeed.
      expectedSeed - the seed used to validate the given byte array image. See Update Hash Seed.
      Returns:
      a CompactThetaSketch backed by the given byte array except as above.
    • compact

      public abstract CompactThetaSketch compact(boolean dstOrdered, MemorySegment dstSeg)
      Description copied from class: ThetaSketch
      Convert this sketch to a CompactThetaSketch.

      If this sketch is a type of UpdatableThetaSketch, the compacting process converts the hash table of the UpdatableThetaSketch to a simple list of the valid hash values. Any hash values that are zero, or greater than or equal to theta, will be discarded. The number of valid values remaining in the CompactThetaSketch depends on a number of factors, but may be larger or smaller than Nominal Entries (or k). It will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdatableThetaSketch prior to calling this method.
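
The filtering rule described above can be sketched in a few lines of plain Java. This is only an illustration and not the library's implementation: the class and method names are hypothetical, hash values are assumed to be non-negative longs, zero is assumed to mark an unused hash table slot, and theta is assumed to be available as a long threshold.

```java
import java.util.Arrays;

/** Hypothetical illustration of the compacting rule; not part of the DataSketches API. */
final class CompactingSketch {
  private CompactingSketch() {}

  /**
   * Returns the valid hash values of a hash table as a compact list.
   * A value is kept only if it is non-zero and strictly less than thetaLong.
   * Assumption: hash values are never negative.
   */
  static long[] validHashes(long[] hashTable, long thetaLong) {
    return Arrays.stream(hashTable)
        .filter(h -> h > 0L && h < thetaLong) // drop zero and anything >= theta
        .toArray();
  }
}
```

For example, with a table of {0, 5, 10, 0, 20, 7} and a theta threshold of 10, only 5 and 7 remain; the relative order of the table is preserved and no sorting is performed.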

      A CompactThetaSketch is always immutable.

      A new CompactThetaSketch object is created:

      • if dstSeg != null
      • if dstSeg == null and this.hasMemorySegment() == true
      • if dstSeg == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.

      Otherwise, this operation returns this.
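
The three conditions listed above can be restated as a single predicate. The following is a hedged sketch of that rule only; the class, method, and parameter names are hypothetical, and the actual library code may evaluate the conditions differently.

```java
/** Hypothetical restatement of the rule above; not part of the DataSketches API. */
final class CompactDecisionSketch {
  private CompactDecisionSketch() {}

  /**
   * Returns true if, per the documented conditions, compact(dstOrdered, dstSeg)
   * would create a new CompactThetaSketch object rather than return this.
   *
   * @param dstSegGiven true if dstSeg != null
   * @param hasMemorySegment the value of this.hasMemorySegment()
   * @param retainedEntries number of hash values held by this sketch
   * @param isOrdered the value of this.isOrdered()
   * @param dstOrdered the requested destination ordering
   */
  static boolean createsNewObject(boolean dstSegGiven, boolean hasMemorySegment,
      int retainedEntries, boolean isOrdered, boolean dstOrdered) {
    if (dstSegGiven) { return true; }       // first condition
    if (hasMemorySegment) { return true; }  // second condition (dstSeg == null here)
    // third condition: an unordered multi-item sketch must be reordered
    return retainedEntries > 1 && !isOrdered && dstOrdered;
  }
}
```

Read this way, an on-heap sketch that is already ordered, or that holds at most one item, is returned as-is when no destination segment is supplied.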

      Specified by:
      compact in class ThetaSketch
      Parameters:
      dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
      dstSeg - See Destination MemorySegment.
      Returns:
      this sketch as a CompactThetaSketch.
    • getCompactBytes

      public int getCompactBytes()
      Description copied from class: ThetaSketch
      Returns the number of storage bytes required for this ThetaSketch if its current state were compacted. If this sketch is already in the compact form this is equivalent to calling ThetaSketch.getCurrentBytes().
      Specified by:
      getCompactBytes in class ThetaSketch
      Returns:
      number of compact bytes
    • getFamily

      public Family getFamily()
      Description copied from class: ThetaSketch
      Returns the Family that this sketch belongs to.
      Specified by:
      getFamily in class ThetaSketch
      Returns:
      the Family that this sketch belongs to
    • hasMemorySegment

      public boolean hasMemorySegment()
      Description copied from interface: MemorySegmentStatus
      Returns true if this object's internal data is backed by a MemorySegment, which may be on-heap or off-heap.
      Returns:
      true if this object's internal data is backed by a MemorySegment.
    • isCompact

      public boolean isCompact()
      Description copied from class: ThetaSketch
      Returns true if this sketch is in compact form.
      Specified by:
      isCompact in class ThetaSketch
      Returns:
      true if this sketch is in compact form.
    • isOffHeap

      public boolean isOffHeap()
      Description copied from interface: MemorySegmentStatus
      Returns true if this object's internal data is backed by an off-heap (direct or native) MemorySegment.
      Returns:
      true if this object's internal data is backed by an off-heap (direct or native) MemorySegment.
    • isSameResource

      public boolean isSameResource(MemorySegment that)
      Description copied from interface: MemorySegmentStatus
      Returns true if an internally referenced MemorySegment has the same backing resource as that, or equivalently, if their two memory regions overlap. This applies to both on-heap and off-heap MemorySegments.

      Note: If both segments are on-heap and not read-only, it can be determined if they were derived from the same backing memory (array). However, this is not always possible off-heap. Because of this asymmetry, this definition of "isSameResource" is confined to the existence of an overlap.

      Parameters:
      that - The given MemorySegment.
      Returns:
      true if an internally referenced MemorySegment has the same backing resource as that.
    • getEstimate

      public double getEstimate()
      Description copied from class: ThetaSketch
      Gets the unique count estimate.
      Specified by:
      getEstimate in class ThetaSketch
      Returns:
      the sketch's best estimate of the cardinality of the input stream.
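
For orientation, the estimate of a theta sketch is conventionally the number of retained hash values divided by theta, where theta is a fraction in (0, 1]. The sketch below illustrates that arithmetic only. It assumes theta is held as a long that represents a fraction of Long.MAX_VALUE; the class and method names are hypothetical, and the library's own computation may handle edge cases differently.

```java
/** Hypothetical illustration of the estimator arithmetic; not part of the DataSketches API. */
final class EstimateSketch {
  private EstimateSketch() {}

  /**
   * Returns retainedEntries / theta, where theta = thetaLong / Long.MAX_VALUE.
   * Assumption: thetaLong is in the range (0, Long.MAX_VALUE].
   */
  static double estimate(int retainedEntries, long thetaLong) {
    if (thetaLong <= 0L) {
      throw new IllegalArgumentException("thetaLong must be positive");
    }
    final double theta = thetaLong / (double) Long.MAX_VALUE; // fraction in (0, 1]
    return retainedEntries / theta;
  }
}
```

When thetaLong equals Long.MAX_VALUE, theta is 1.0 and the estimate is simply the retained count; when theta is one half, each retained hash value stands for roughly two distinct input items.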
    • toByteArrayCompressed

      public byte[] toByteArrayCompressed()
      Gets the sketch as a compressed byte array.
      Returns:
      the sketch as a compressed byte array