Class CompactSketch


  • public abstract class CompactSketch
    extends Sketch
    The parent class of all CompactSketches. A CompactSketch is never created directly. It is produced by the compact() method of an UpdateSketch, by the getResult() method of a SetOperation, or by a heapify method.

    A CompactSketch is the simplest form of a Theta Sketch. It consists of a compact list (i.e., no intervening spaces) of hash values, which may be ordered or not, a value for theta and a seed hash. A CompactSketch is immutable (read-only), and the space required when stored is only the space required for the hash values and 8 to 24 bytes of preamble. An empty CompactSketch consumes only 8 bytes.
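
    The storage claim above can be illustrated with a small arithmetic model. This is only a sketch under stated assumptions, not library code: the class and method names are hypothetical, each retained hash value is assumed to occupy 8 bytes (a 64-bit long), and the caller supplies the preamble size within the documented 8 to 24 byte range.

```java
// Illustrative model only; CompactSizeModel is a hypothetical name and is
// not part of the DataSketches API.
final class CompactSizeModel {
    static final int MIN_PREAMBLE_BYTES = 8;   // documented lower bound
    static final int MAX_PREAMBLE_BYTES = 24;  // documented upper bound
    static final int BYTES_PER_HASH = Long.BYTES; // assumption: 64-bit hash values

    // Stored size = preamble bytes + 8 bytes per retained hash value.
    static long storedBytes(int numHashes, int preambleBytes) {
        if (numHashes < 0) {
            throw new IllegalArgumentException("numHashes must be >= 0");
        }
        if (preambleBytes < MIN_PREAMBLE_BYTES || preambleBytes > MAX_PREAMBLE_BYTES) {
            throw new IllegalArgumentException("preambleBytes must be in [8, 24]");
        }
        return preambleBytes + (long) numHashes * BYTES_PER_HASH;
    }

    public static void main(String[] args) {
        // An empty sketch holds no hash values and only the minimum preamble.
        System.out.println(storedBytes(0, MIN_PREAMBLE_BYTES));
        // 1000 retained hash values with the largest documented preamble.
        System.out.println(storedBytes(1000, MAX_PREAMBLE_BYTES));
    }
}
```

    For an actual sketch, getCompactBytes() reports the real figure; the model only shows why the stored size grows linearly with the number of retained hash values.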

    Author:
    Lee Rhodes
    • Constructor Detail

      • CompactSketch

        public CompactSketch()
    • Method Detail

      • heapify

        public static CompactSketch heapify​(org.apache.datasketches.memory.Memory srcMem)
        Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch.

        The resulting sketch will not retain any link to the source Memory and all of its data will be copied to the heap CompactSketch.

        This method assumes that the sketch image was created with the correct hash seed, so it is not checked. The resulting on-heap CompactSketch will be given the seedHash derived from the given sketch image. However, Serial Version 1 sketch images do not have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the DEFAULT_UPDATE_SEED.

        Parameters:
        srcMem - an image of a CompactSketch. See Memory.
        Returns:
        a CompactSketch on the heap.
      • heapify

        public static CompactSketch heapify​(org.apache.datasketches.memory.Memory srcMem,
                                            long expectedSeed)
        Heapify takes a CompactSketch image in Memory and instantiates an on-heap CompactSketch.

        The resulting sketch will not retain any link to the source Memory and all of its data will be copied to the heap CompactSketch.

        This method checks whether the given expectedSeed was used to create the source Memory image. However, Serial Version 1 sketch images cannot be checked because they do not have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the expectedSeed.

        Parameters:
        srcMem - an image of a CompactSketch that was created using the given expectedSeed. See Memory.
        expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed.
        Returns:
        a CompactSketch on the heap.
      • wrap

        public static CompactSketch wrap​(org.apache.datasketches.memory.Memory srcMem)
        Wrap takes the CompactSketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

        Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in a heapify operation. These early versions were never designed to "wrap".

        Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketches, respectively. This is actually faster and consumes less overall memory.

        This method assumes that the sketch image was created with the correct hash seed, so it is not checked. However, Serial Version 1 sketch images do not have a seedHash field, so the resulting on-heap CompactSketch will be given the hash of the DEFAULT_UPDATE_SEED.

        Parameters:
        srcMem - an image of a Sketch. See Memory.
        Returns:
        a CompactSketch backed by the given Memory except as above.
      • wrap

        public static CompactSketch wrap​(org.apache.datasketches.memory.Memory srcMem,
                                         long expectedSeed)
        Wrap takes the sketch image in the given Memory and refers to it directly. There is no data copying onto the Java heap. The wrap operation enables fast read-only merging and access to all the public read-only API.

        Only "Direct" Serialization Version 3 (i.e., OpenSource) sketches that have been explicitly stored as direct sketches can be wrapped. Wrapping earlier serial version sketches will result in a heapify operation. These early versions were never designed to "wrap".

        Wrapping any subclass of this class that is empty or contains only a single item will result in heapified forms of the empty and single-item sketches, respectively. This is actually faster and consumes less overall memory.

        This method checks whether the given expectedSeed was used to create the source Memory image. However, Serial Version 1 sketches cannot be checked because they do not have a seedHash field, so the resulting heapified CompactSketch will be given the hash of the expectedSeed.

        Parameters:
        srcMem - an image of a Sketch that was created using the given expectedSeed. See Memory.
        expectedSeed - the seed used to validate the given Memory image. See Update Hash Seed.
        Returns:
        a CompactSketch backed by the given Memory except as above.
      • compact

        public abstract CompactSketch compact​(boolean dstOrdered,
                                              org.apache.datasketches.memory.WritableMemory dstMem)
        Description copied from class: Sketch
        Convert this sketch to a CompactSketch.

        If this sketch is a type of UpdateSketch, the compacting process converts the hash table of the UpdateSketch to a simple list of the valid hash values. Any hash value that is zero, or that is greater than or equal to theta, is discarded. The number of valid values remaining in the CompactSketch depends on a number of factors and may be larger or smaller than Nominal Entries (or k), but it will never exceed 2k. If it is critical to always limit the size to no more than k, then rebuild() should be called on the UpdateSketch prior to calling this method.
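
        The discard rule described above can be modeled in a few lines. This is a minimal sketch and not the library's implementation: the class and method names are hypothetical, the hash table is represented as a plain long[] in which zero marks an unused slot, and hash values are assumed to be non-negative.

```java
import java.util.Arrays;

// Illustrative model only; CompactionModel is a hypothetical name and is
// not part of the DataSketches API.
final class CompactionModel {

    // Keeps only the valid hash values: non-zero and strictly less than theta.
    // If ordered is true, the retained values are sorted in ascending order.
    static long[] compact(long[] hashTable, long theta, boolean ordered) {
        long[] retained = Arrays.stream(hashTable)
                .filter(h -> h != 0L && h < theta)
                .toArray();
        if (ordered) {
            Arrays.sort(retained);
        }
        return retained;
    }

    public static void main(String[] args) {
        // Zeros stand for empty slots; 9 and 12 are not below theta = 9.
        long[] table = {0L, 7L, 3L, 0L, 12L, 5L, 9L};
        System.out.println(Arrays.toString(compact(table, 9L, true)));
    }
}
```

        The result has no intervening empty slots, which is what makes the compact form smaller than the hash table it was derived from.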

        A CompactSketch is always immutable.

        A new CompactSketch object is created:

        • if dstMem != null
        • if dstMem == null and this.hasMemory() == true
        • if dstMem == null and this has more than 1 item and this.isOrdered() == false and dstOrdered == true.

        Otherwise, this operation returns this.
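
        The three conditions listed above can be written as a single predicate. The following is a sketch of the documented rule only, with hypothetical names; it does not call the library, and the boolean and integer arguments stand in for dstMem != null, this.hasMemory(), the number of retained items, this.isOrdered(), and dstOrdered.

```java
// Illustrative model only; CompactDecisionModel is a hypothetical name and
// is not part of the DataSketches API.
final class CompactDecisionModel {

    // Returns true when, per the documented rule, compact(dstOrdered, dstMem)
    // creates a new CompactSketch object; false when it returns this.
    static boolean createsNewObject(boolean dstMemProvided, boolean hasMemory,
                                    int numItems, boolean isOrdered, boolean dstOrdered) {
        if (dstMemProvided) {
            return true;  // dstMem != null
        }
        if (hasMemory) {
            return true;  // dstMem == null and this.hasMemory() == true
        }
        // dstMem == null, on-heap source: a new object is needed only when an
        // unordered sketch with more than one item must become ordered.
        return numItems > 1 && !isOrdered && dstOrdered;
    }

    public static void main(String[] args) {
        // On-heap, already ordered, no destination memory.
        System.out.println(createsNewObject(false, false, 5, true, true));
        // On-heap, unordered with several items, ordered result requested.
        System.out.println(createsNewObject(false, false, 5, false, true));
    }
}
```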

        Specified by:
        compact in class Sketch
        Parameters:
        dstOrdered - assumed true if this sketch is empty or has only one value. See Destination Ordered.
        dstMem - See Destination Memory.
        Returns:
        this sketch as a CompactSketch.
      • getCompactBytes

        public int getCompactBytes()
        Description copied from class: Sketch
        Returns the number of storage bytes required for this Sketch if its current state were compacted. If this sketch is already in compact form, this is equivalent to calling Sketch.getCurrentBytes().
        Specified by:
        getCompactBytes in class Sketch
        Returns:
        number of compact bytes
      • getFamily

        public Family getFamily()
        Description copied from class: Sketch
        Returns the Family that this sketch belongs to.
        Specified by:
        getFamily in class Sketch
        Returns:
        the Family that this sketch belongs to
      • isCompact

        public boolean isCompact()
        Description copied from class: Sketch
        Returns true if this sketch is in compact form.
        Specified by:
        isCompact in class Sketch
        Returns:
        true if this sketch is in compact form.
      • toByteArrayCompressed

        public byte[] toByteArrayCompressed()
        Gets the sketch as a compressed byte array.
        Returns:
        the sketch as a compressed byte array