Class HashOperations


  • public final class HashOperations
    extends Object
    Helper class for the common hash table methods.
    Author:
    Lee Rhodes, Kevin Lang
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int STRIDE_MASK
      The stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static void checkHashCorruption​(long hash)  
      static void checkThetaCorruption​(long thetaLong)  
      static boolean continueCondition​(long thetaLong, long hash)
      Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
      static long[] convertToHashTable​(long[] hashArr, int count, long thetaLong, double rebuildThreshold)
      Converts the given array to a hash table.
      static int count​(long[] srcArr, long thetaLong)
      Counts the cardinality of the given source array.
      static int countPart​(long[] srcArr, int lgArrLongs, long thetaLong)
      Counts the cardinality of the first Log2 values of the given source array.
      static int hashArrayInsert​(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong)
      Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted.
      static int hashInsertOnly​(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.
      static int hashInsertOnlyMemory​(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory.
      static int hashSearch​(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap.
      static int hashSearchMemory​(org.apache.datasketches.memory.Memory mem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory.
      static int hashSearchOrInsert​(long[] hashTable, int lgArrLongs, long hash)
      This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.
      static int hashSearchOrInsertMemory​(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes)
      This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory.
      static int minLgHashTableSize​(int count, double rebuild_threshold)
      Returns the smallest log hash table size given the count of items and the rebuild threshold.
    • Field Detail

      • STRIDE_MASK

        public static final int STRIDE_MASK
        The stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.
        See Also:
        Constant Field Values
    • Method Detail

      • hashSearch

        public static int hashSearch​(long[] hashTable,
                                     int lgArrLongs,
                                     long hash)
        This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap. Returns the index if found, -1 if not found.
        Parameters:
        hashTable - The hash table to search. Its size must be a power of 2.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        hash - The hash value to search for. It must not be zero.
        Returns:
        Current probe index if found, -1 if not found.
      • hashInsertOnly

        public static int hashInsertOnly​(long[] hashTable,
                                         int lgArrLongs,
                                         long hash)
        This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if the table has no empty slot.
        Parameters:
        hashTable - the hash table to insert into. Its size must be a power of 2.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        hash - The hash value to be potentially inserted into an empty slot. It must not be zero.
        Returns:
        index of insertion. Always positive or zero.
      • hashSearchOrInsert

        public static int hashSearchOrInsert​(long[] hashTable,
                                             int lgArrLongs,
                                             long hash)
        This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.
        Parameters:
        hashTable - The hash table to insert into. Its size must be a power of 2.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        hash - The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.
        Returns:
        index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
      • hashArrayInsert

        public static int hashArrayInsert​(long[] srcArr,
                                          long[] hashTable,
                                          int lgArrLongs,
                                          long thetaLong)
        Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted. The hash values must not be negative, zero values and values ≥ thetaLong are ignored. The given hash table may have values, but they must have been inserted by this method or one of the other OADH insert methods in this class. This method performs additional checks against potentially invalid hash values or theta values. Returns the count of values actually inserted.
        Parameters:
        srcArr - the source hash array to be potentially inserted
        hashTable - The hash table to insert into. Its size must be a power of 2.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        thetaLong - The theta value that all input hash values are compared against. It must greater than zero. See Theta Long
        Returns:
        the count of values actually inserted
      • hashSearchMemory

        public static int hashSearchMemory​(org.apache.datasketches.memory.Memory mem,
                                           int lgArrLongs,
                                           long hash,
                                           int memOffsetBytes)
        This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory. Returns the index if found, -1 if not found.
        Parameters:
        mem - The Memory containing the hash table to search. The hash table portion must be a power of 2 in size.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        hash - The hash value to search for. Must not be zero.
        memOffsetBytes - offset in the memory where the hashTable starts
        Returns:
        Current probe index if found, -1 if not found.
      • hashInsertOnlyMemory

        public static int hashInsertOnlyMemory​(org.apache.datasketches.memory.WritableMemory wmem,
                                               int lgArrLongs,
                                               long hash,
                                               int memOffsetBytes)
        This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if table has no empty slot.
        Parameters:
        wmem - The WritableMemory that contains the hashTable to insert into. The size of the hashTable portion must be a power of 2.
        lgArrLongs - The log_base2(hashTable.length. See lgArrLongs.
        hash - value that must not be zero and will be inserted into the array into an empty slot.
        memOffsetBytes - offset in the WritableMemory where the hashTable starts
        Returns:
        index of insertion. Always positive or zero.
      • hashSearchOrInsertMemory

        public static int hashSearchOrInsertMemory​(org.apache.datasketches.memory.WritableMemory wmem,
                                                   int lgArrLongs,
                                                   long hash,
                                                   int memOffsetBytes)
        This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.
        Parameters:
        wmem - The WritableMemory that contains the hashTable to insert into.
        lgArrLongs - The log_base2(hashTable.length). See lgArrLongs.
        hash - The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.
        memOffsetBytes - offset in the WritableMemory where the hash array starts
        Returns:
        index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
      • checkThetaCorruption

        public static void checkThetaCorruption​(long thetaLong)
        Parameters:
        thetaLong - must be greater than zero otherwise throws an exception. See Theta Long
      • checkHashCorruption

        public static void checkHashCorruption​(long hash)
        Parameters:
        hash - must be greater than -1 otherwise throws an exception. Note a hash of zero is normally ignored, but a negative hash is never allowed.
      • continueCondition

        public static boolean continueCondition​(long thetaLong,
                                                long hash)
        Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
        Parameters:
        thetaLong - must be greater than the hash value See Theta Long
        hash - must be less than thetaLong and not less than or equal to zero.
        Returns:
        true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
      • convertToHashTable

        public static long[] convertToHashTable​(long[] hashArr,
                                                int count,
                                                long thetaLong,
                                                double rebuildThreshold)
        Converts the given array to a hash table.
        Parameters:
        hashArr - The given array of hashes. Gaps are OK.
        count - The number of valid hashes in the array
        thetaLong - Any hashes equal to or greater than thetaLong will be ignored
        rebuildThreshold - The fill fraction for the hash table forcing a rebuild or resize.
        Returns:
        a HashTable
      • minLgHashTableSize

        public static int minLgHashTableSize​(int count,
                                             double rebuild_threshold)
        Returns the smallest log hash table size given the count of items and the rebuild threshold.
        Parameters:
        count - the given count of items
        rebuild_threshold - the rebuild threshold as a fraction between zero and one.
        Returns:
        the smallest log hash table size
      • countPart

        public static int countPart​(long[] srcArr,
                                    int lgArrLongs,
                                    long thetaLong)
        Counts the cardinality of the first Log2 values of the given source array.
        Parameters:
        srcArr - the given source array
        lgArrLongs - See lgArrLongs
        thetaLong - See Theta Long
        Returns:
        the cardinality
      • count

        public static int count​(long[] srcArr,
                                long thetaLong)
        Counts the cardinality of the given source array.
        Parameters:
        srcArr - the given source array
        thetaLong - See Theta Long
        Returns:
        the cardinality