Class HashOperations

java.lang.Object
org.apache.datasketches.thetacommon.HashOperations

public final class HashOperations extends Object
Helper class for the common hash table methods.
Author:
Lee Rhodes, Kevin Lang
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final int
    The stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
     
    static void
    checkThetaCorruption(long thetaLong)
     
    static boolean
    continueCondition(long thetaLong, long hash)
    Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
    static long[]
    convertToHashTable(long[] hashArr, int count, long thetaLong, double rebuildThreshold)
    Converts the given array to a hash table.
    static int
    count(long[] srcArr, long thetaLong)
    Counts the cardinality of the given source array.
    static int
    countPart(long[] srcArr, int lgArrLongs, long thetaLong)
    Counts the cardinality of the first Log2 values of the given source array.
    static int
    hashArrayInsert(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong)
    Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted.
    static int
    hashInsertOnly(long[] hashTable, int lgArrLongs, long hash)
    This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.
    static int
    hashInsertOnlyMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
    This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory.
    static int
    hashSearch(long[] hashTable, int lgArrLongs, long hash)
    This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap.
    static int
    hashSearchMemory(org.apache.datasketches.memory.Memory mem, int lgArrLongs, long hash, int memOffsetBytes)
    This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory.
    static int
    hashSearchOrInsert(long[] hashTable, int lgArrLongs, long hash)
    This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.
    static int
    hashSearchOrInsertMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
    This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory.
    static int
    minLgHashTableSize(int count, double rebuild_threshold)
    Returns the smallest log hash table size given the count of items and the rebuild threshold.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • STRIDE_MASK

      public static final int STRIDE_MASK
      The stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.
      See Also:
  • Method Details

    • hashSearch

      public static int hashSearch(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap. Returns the index if found, -1 if not found.
      Parameters:
      hashTable - The hash table to search. Its size must be a power of 2.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      hash - The hash value to search for. It must not be zero.
      Returns:
      Current probe index if found, -1 if not found.
    • hashInsertOnly

      public static int hashInsertOnly(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if the table has no empty slot.
      Parameters:
      hashTable - the hash table to insert into. Its size must be a power of 2.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      hash - The hash value to be potentially inserted into an empty slot. It must not be zero.
      Returns:
      index of insertion. Always positive or zero.
    • hashSearchOrInsert

      public static int hashSearchOrInsert(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.
      Parameters:
      hashTable - The hash table to insert into. Its size must be a power of 2.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      hash - The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.
      Returns:
      index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
    • hashArrayInsert

      public static int hashArrayInsert(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong)
      Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted. The hash values must not be negative, zero values and values ≥ thetaLong are ignored. The given hash table may have values, but they must have been inserted by this method or one of the other OADH insert methods in this class. This method performs additional checks against potentially invalid hash values or theta values. Returns the count of values actually inserted.
      Parameters:
      srcArr - the source hash array to be potentially inserted
      hashTable - The hash table to insert into. Its size must be a power of 2.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      thetaLong - The theta value that all input hash values are compared against. It must greater than zero. See Theta Long
      Returns:
      the count of values actually inserted
    • hashSearchMemory

      public static int hashSearchMemory(org.apache.datasketches.memory.Memory mem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory. Returns the index if found, -1 if not found.
      Parameters:
      mem - The Memory containing the hash table to search. The hash table portion must be a power of 2 in size.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      hash - The hash value to search for. Must not be zero.
      memOffsetBytes - offset in the memory where the hashTable starts
      Returns:
      Current probe index if found, -1 if not found.
    • hashInsertOnlyMemory

      public static int hashInsertOnlyMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if table has no empty slot.
      Parameters:
      wmem - The WritableMemory that contains the hashTable to insert into. The size of the hashTable portion must be a power of 2.
      lgArrLongs - The log_base2(hashTable.length. See lgArrLongs.
      hash - value that must not be zero and will be inserted into the array into an empty slot.
      memOffsetBytes - offset in the WritableMemory where the hashTable starts
      Returns:
      index of insertion. Always positive or zero.
    • hashSearchOrInsertMemory

      public static int hashSearchOrInsertMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.
      Parameters:
      wmem - The WritableMemory that contains the hashTable to insert into.
      lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
      hash - The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.
      memOffsetBytes - offset in the WritableMemory where the hash array starts
      Returns:
      index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
    • checkThetaCorruption

      public static void checkThetaCorruption(long thetaLong)
      Parameters:
      thetaLong - must be greater than zero otherwise throws an exception. See Theta Long
    • checkHashCorruption

      public static void checkHashCorruption(long hash)
      Parameters:
      hash - must be greater than -1 otherwise throws an exception. Note a hash of zero is normally ignored, but a negative hash is never allowed.
    • continueCondition

      public static boolean continueCondition(long thetaLong, long hash)
      Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
      Parameters:
      thetaLong - must be greater than the hash value See Theta Long
      hash - must be less than thetaLong and not less than or equal to zero.
      Returns:
      true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
    • convertToHashTable

      public static long[] convertToHashTable(long[] hashArr, int count, long thetaLong, double rebuildThreshold)
      Converts the given array to a hash table.
      Parameters:
      hashArr - The given array of hashes. Gaps are OK.
      count - The number of valid hashes in the array
      thetaLong - Any hashes equal to or greater than thetaLong will be ignored
      rebuildThreshold - The fill fraction for the hash table forcing a rebuild or resize.
      Returns:
      a HashTable
    • minLgHashTableSize

      public static int minLgHashTableSize(int count, double rebuild_threshold)
      Returns the smallest log hash table size given the count of items and the rebuild threshold.
      Parameters:
      count - the given count of items
      rebuild_threshold - the rebuild threshold as a fraction between zero and one.
      Returns:
      the smallest log hash table size
    • countPart

      public static int countPart(long[] srcArr, int lgArrLongs, long thetaLong)
      Counts the cardinality of the first Log2 values of the given source array.
      Parameters:
      srcArr - the given source array
      lgArrLongs - See lgArrLongs
      thetaLong - See Theta Long
      Returns:
      the cardinality
    • count

      public static int count(long[] srcArr, long thetaLong)
      Counts the cardinality of the given source array.
      Parameters:
      srcArr - the given source array
      thetaLong - See Theta Long
      Returns:
      the cardinality