Class ThetaUnion

java.lang.Object
org.apache.datasketches.theta.ThetaSetOperation
org.apache.datasketches.theta.ThetaUnion
All Implemented Interfaces:
MemorySegmentStatus

public abstract class ThetaUnion extends ThetaSetOperation
Compute the union of two or more theta sketches. A new instance represents an empty set.
Author:
Lee Rhodes
  • Constructor Details

    • ThetaUnion

      public ThetaUnion()
      No argument constructor.
  • Method Details

    • fastWrap

      public static ThetaUnion fastWrap(MemorySegment srcSeg)
      Wrap a ThetaUnion object around a ThetaUnion MemorySegment object containing data. This method assumes the Default Update Seed. This does NO validity checking of the given MemorySegment. If the given source MemorySegment is read-only, the returned ThetaUnion object will also be read-only.
      Parameters:
      srcSeg - The source MemorySegment object.
      Returns:
      a ThetaUnion object that wraps the given MemorySegment
    • fastWrap

      public static ThetaUnion fastWrap(MemorySegment srcSeg, long expectedSeed)
      Wrap a ThetaUnion object around a ThetaUnion MemorySegment object containing data. This does NO validity checking of the given MemorySegment. If the given source MemorySegment is read-only, the returned ThetaUnion object will also be read-only.
      Parameters:
      srcSeg - The source MemorySegment object.
      expectedSeed - the seed used to validate the given MemorySegment image. See seed
      Returns:
      a ThetaUnion object that wraps the given MemorySegment
    • wrap

      public static ThetaUnion wrap(MemorySegment srcSeg)
      Wrap a ThetaUnion object around a ThetaUnion MemorySegment object containing data. This method assumes the Default Update Seed. If the given source MemorySegment is read-only, the returned ThetaUnion object will also be read-only.
      Parameters:
      srcSeg - The source MemorySegment object.
      Returns:
      a ThetaUnion object that wraps the given MemorySegment
    • wrap

      public static ThetaUnion wrap(MemorySegment srcSeg, long expectedSeed)
      Wrap a ThetaUnion object around a ThetaUnion MemorySegment object containing data. If the given source MemorySegment is read-only, the returned ThetaUnion object will also be read-only.
      Parameters:
      srcSeg - The source MemorySegment object.
      expectedSeed - the seed used to validate the given MemorySegment image. See seed
      Returns:
      a ThetaUnion object that wraps the given MemorySegment
    • getCurrentBytes

      public abstract int getCurrentBytes()
      Returns the number of storage bytes required for this union in its current state.
      Returns:
      the number of storage bytes required for this union in its current state.
    • getFamily

      public Family getFamily()
      Description copied from class: ThetaSetOperation
      Gets the Family of this ThetaSetOperation
      Specified by:
      getFamily in class ThetaSetOperation
      Returns:
      the Family of this ThetaSetOperation
    • getMaxUnionBytes

      public abstract int getMaxUnionBytes()
      Returns the maximum required storage bytes for this union.
      Returns:
      the maximum required storage bytes for this union.
    • getResult

      public abstract CompactThetaSketch getResult()
      Gets the result of this operation as an ordered CompactThetaSketch on the Java heap. This does not disturb the underlying data structure of the union. Therefore, it is OK to continue updating the union after this operation.
      Returns:
      the result of this operation as an ordered CompactThetaSketch on the Java heap
    • getResult

      public abstract CompactThetaSketch getResult(boolean dstOrdered, MemorySegment dstSeg)
      Gets the result of this operation as a CompactThetaSketch of the chosen form. This does not disturb the underlying data structure of the union. Therefore, it is OK to continue updating the union after this operation.
      Parameters:
      dstOrdered - See Destination Ordered
      dstSeg - destination MemorySegment
      Returns:
      the result of this operation as a CompactThetaSketch of the chosen form
    • reset

      public abstract void reset()
      Resets this ThetaUnion. The seed remains intact; everything else reverts to its initial, empty state.
    • toByteArray

      public abstract byte[] toByteArray()
      Returns a byte array image of this ThetaUnion object.
      Returns:
      a byte array image of this ThetaUnion object
    • union

      public CompactThetaSketch union(ThetaSketch sketchA, ThetaSketch sketchB)
      This implements a stateless, pair-wise union operation. The returned sketch will be cut back to the smaller of the two k values if required.

      Nulls and empty sketches are ignored.

      Parameters:
      sketchA - The first argument
      sketchB - The second argument
      Returns:
      the result ordered CompactThetaSketch on the heap.
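
      The "cut back to the smaller of the two k values" behavior can be illustrated with a toy model. The sketch below is not the library's implementation: the class ToyThetaUnion, its nested Toy type, and the trimming rule (keep the k smallest hashes and lower theta to the first hash dropped) are assumptions made for illustration only. Null and empty inputs are not handled here.

```java
import java.util.List;
import java.util.TreeSet;

public class ToyThetaUnion {
    /** Hypothetical toy sketch: retained hashes are all strictly below theta. */
    static final class Toy {
        final int k;                // nominal number of retained entries
        final long theta;           // exclusive upper bound on retained hashes
        final TreeSet<Long> hashes; // retained hash values in ascending order

        Toy(int k, long theta, TreeSet<Long> hashes) {
            this.k = k;
            this.theta = theta;
            this.hashes = hashes;
        }
    }

    /** Stateless pair-wise union: neither input is modified. */
    static Toy union(Toy a, Toy b) {
        int k = Math.min(a.k, b.k);              // result uses the smaller k
        long theta = Math.min(a.theta, b.theta); // result uses the smaller theta
        TreeSet<Long> merged = new TreeSet<>();
        for (long h : a.hashes) { if (h < theta) { merged.add(h); } }
        for (long h : b.hashes) { if (h < theta) { merged.add(h); } }
        // Cut back: drop the largest hash until only k remain.
        // Each dropped hash becomes the new (lower) theta.
        while (merged.size() > k) {
            theta = merged.pollLast();
        }
        return new Toy(k, theta, merged);
    }

    public static void main(String[] args) {
        Toy a = new Toy(4, Long.MAX_VALUE, new TreeSet<>(List.of(10L, 20L, 30L, 40L)));
        Toy b = new Toy(2, Long.MAX_VALUE, new TreeSet<>(List.of(15L, 20L, 50L)));
        Toy r = union(a, b);
        // prints: k=2 theta=20 hashes=[10, 15]
        System.out.println("k=" + r.k + " theta=" + r.theta + " hashes=" + r.hashes);
    }
}
```

      In this toy run the inputs have k = 4 and k = 2, so the result keeps only the two smallest distinct hashes and both inputs are left unchanged.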
    • union

      public abstract CompactThetaSketch union(ThetaSketch sketchA, ThetaSketch sketchB, boolean dstOrdered, MemorySegment dstSeg)
      This implements a stateless, pair-wise union operation. The returned sketch will be cut back to k if required, similar to the regular ThetaUnion operation.

      Nulls and empty sketches are ignored.

      Parameters:
      sketchA - The first argument
      sketchB - The second argument
      dstOrdered - If true, the returned CompactThetaSketch will be ordered.
      dstSeg - If not null, the returned CompactThetaSketch will be placed in this MemorySegment.
      Returns:
      the result CompactThetaSketch.
    • union

      public abstract void union(ThetaSketch sketchIn)
      Perform a union operation with this ThetaUnion and the given on-heap sketch of the Theta Family.

      This method can be repeatedly called.

      Nulls and empty sketches are ignored.

      Parameters:
      sketchIn - The incoming sketch.
    • union

      public abstract void union(MemorySegment seg)
      Perform a union operation with this ThetaUnion and the given MemorySegment image of any sketch of the Theta Family.

      This method can be repeatedly called.

      Nulls and empty sketches are ignored.

      Parameters:
      seg - MemorySegment image of sketch to be merged
    • update

      public abstract void update(long datum)
      Update this union with the given long data item.
      Parameters:
      datum - The given long datum.
    • update

      public abstract void update(double datum)
      Update this union with the given double (or float) data item. The double is converted to a long using Double.doubleToLongBits(datum), which normalizes all NaN values to a single NaN representation. Plus and minus zero are normalized to plus zero. Each of the special floating-point values NaN, +Infinity, and -Infinity is treated as distinct.
      Parameters:
      datum - The given double datum.
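
      The normalization described above can be reproduced with the standard library alone. In the sketch below, canonicalBits is a hypothetical helper written for illustration, not a method of ThetaUnion; it assumes the zero normalization is applied before the call to Double.doubleToLongBits.

```java
public class DoubleDatumBits {
    /** Hypothetical helper: maps -0.0 to +0.0, then takes the canonical bit pattern. */
    static long canonicalBits(double datum) {
        // -0.0 == 0.0 is true in Java, so both zeros become +0.0 here.
        double d = (datum == 0.0) ? 0.0 : datum;
        // doubleToLongBits collapses every NaN to 0x7ff8000000000000L.
        return Double.doubleToLongBits(d);
    }

    public static void main(String[] args) {
        // A NaN with a non-default payload.
        double otherNaN = Double.longBitsToDouble(0x7ff8000000000abcL);

        // All NaN values share one representation: prints true
        System.out.println(canonicalBits(Double.NaN) == canonicalBits(otherNaN));
        // Both zeros share one representation after normalization: prints true
        System.out.println(canonicalBits(0.0) == canonicalBits(-0.0));
        // Without the zero normalization the raw bits differ: prints false
        System.out.println(Double.doubleToLongBits(0.0) == Double.doubleToLongBits(-0.0));
        // The two infinities stay distinct: prints false
        System.out.println(canonicalBits(Double.POSITIVE_INFINITY)
            == canonicalBits(Double.NEGATIVE_INFINITY));
    }
}
```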
    • update

      public abstract void update(String datum)
      Update this union with the given String data item. The string is converted to a byte array using UTF-8 encoding. If the string is null or empty, no update attempt is made and the method returns.

      Note: this will not produce the same output hash values as the update(char[]) method and will generally be a little slower, depending on the complexity of the UTF-8 encoding.

      Note: this is not a union operation. This treats the given string as a data item.

      Parameters:
      datum - The given String.
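
      One way to see why update(String) and update(char[]) produce different hash values is to compare the bytes each form would feed to a hash function. The sketch below uses only the standard library. The helper names are hypothetical, and the assumption that a char array contributes its raw 16-bit code units (shown here in little-endian order) is made for illustration; the library's actual hashing path is not shown in this page.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringVsCharArrayBytes {
    /** Bytes of a String after UTF-8 encoding, as described for update(String). */
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Assumed representation of a char[]: two raw bytes per 16-bit code unit. */
    static byte[] rawCharBytes(char[] chars) {
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2).order(ByteOrder.LITTLE_ENDIAN);
        for (char c : chars) {
            buf.putChar(c);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        String text = "abc";
        byte[] fromString = utf8Bytes(text);
        byte[] fromChars = rawCharBytes(text.toCharArray());

        System.out.println(fromString.length);                    // prints 3
        System.out.println(fromChars.length);                     // prints 6
        System.out.println(Arrays.equals(fromString, fromChars)); // prints false
    }
}
```

      Because the input bytes differ, the same text presented as a String and as a char[] would count as two different items. Pick one form and use it consistently for a given data stream.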
    • update

      public abstract void update(byte[] data)
      Update this union with the given byte array item. If the byte array is null or empty no update attempt is made and the method returns.

      Note: this is not a union operation. This treats the given byte array as a data item.

      Parameters:
      data - The given byte array.
    • update

      public abstract void update(ByteBuffer data)
      Update this union with the given ByteBuffer item. If the ByteBuffer is null or empty no update attempt is made and the method returns.

      Note: this is not a union operation. This treats the given ByteBuffer as a data item.

      Parameters:
      data - The given ByteBuffer.
    • update

      public abstract void update(int[] data)
      Update this union with the given integer array item. If the integer array is null or empty no update attempt is made and the method returns.

      Note: this is not a union operation. This treats the given integer array as a data item.

      Parameters:
      data - The given int array.
    • update

      public abstract void update(char[] data)
      Update this union with the given char array item. If the char array is null or empty no update attempt is made and the method returns.

      Note: this will not produce the same output hash values as the update(String) method but will be a little faster as it avoids the complexity of the UTF-8 encoding.

      Note: this is not a union operation. This treats the given char array as a data item.

      Parameters:
      data - The given char array.
    • update

      public abstract void update(long[] data)
      Update this union with the given long array item. If the long array is null or empty no update attempt is made and the method returns.

      Note: this is not a union operation. This treats the given long array as a data item.

      Parameters:
      data - The given long array.
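
      The array-taking update methods treat the whole array as one data item rather than updating once per element. The toy counter below illustrates that distinction with the standard library only. ToyCounter is a hypothetical stand-in, not part of the library, and it stores string keys instead of hash values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ArrayAsSingleItem {
    /** Hypothetical stand-in for a union's update methods; counts distinct items exactly. */
    static final class ToyCounter {
        private final Set<String> seen = new HashSet<>();

        /** Records one long value as one item. */
        void update(long datum) {
            seen.add("long:" + datum);
        }

        /** Records the entire array as one item; null or empty arrays are ignored. */
        void update(long[] data) {
            if (data == null || data.length == 0) {
                return;
            }
            seen.add("long[]:" + Arrays.toString(data));
        }

        int distinct() {
            return seen.size();
        }
    }

    public static void main(String[] args) {
        long[] values = {1L, 2L, 3L};

        ToyCounter wholeArray = new ToyCounter();
        wholeArray.update(values);
        System.out.println(wholeArray.distinct()); // prints 1

        ToyCounter perElement = new ToyCounter();
        for (long v : values) {
            perElement.update(v);
        }
        System.out.println(perElement.distinct()); // prints 3
    }
}
```

      To count the elements of an array as separate items, loop over the array and call the single-value update for each element.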