Class HashOperations
java.lang.Object
org.apache.datasketches.thetacommon.HashOperations
Helper class for the common hash table methods.
- Author:
- Lee Rhodes, Kevin Lang
-
Field Summary
Modifier and TypeFieldDescriptionstatic final int
The stride mask for the Open Address, Double Hashing (OADH) hash table algorithm. -
Method Summary
Modifier and TypeMethodDescriptionstatic void
checkHashCorruption
(long hash) static void
checkThetaCorruption
(long thetaLong) static boolean
continueCondition
(long thetaLong, long hash) Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.static long[]
convertToHashTable
(long[] hashArr, int count, long thetaLong, double rebuildThreshold) Converts the given array to a hash table.static int
count
(long[] srcArr, long thetaLong) Counts the cardinality of the given source array.static int
countPart
(long[] srcArr, int lgArrLongs, long thetaLong) Counts the cardinality of the first Log2 values of the given source array.static int
hashArrayInsert
(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong) Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted.static int
hashInsertOnly
(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.static int
hashInsertOnlyMemory
(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory.static int
hashSearch
(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap.static int
hashSearchMemory
(org.apache.datasketches.memory.Memory mem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory.static int
hashSearchOrInsert
(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.static int
hashSearchOrInsertMemory
(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory.static int
minLgHashTableSize
(int count, double rebuild_threshold) Returns the smallest log hash table size given the count of items and the rebuild threshold.
-
Field Details
-
STRIDE_MASK
public static final int STRIDE_MASKThe stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.- See Also:
-
-
Method Details
-
hashSearch
public static int hashSearch(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap. Returns the index if found, -1 if not found.- Parameters:
hashTable
- The hash table to search. Its size must be a power of 2.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.hash
- The hash value to search for. It must not be zero.- Returns:
- Current probe index if found, -1 if not found.
-
hashInsertOnly
public static int hashInsertOnly(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if the table has no empty slot.- Parameters:
hashTable
- the hash table to insert into. Its size must be a power of 2.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.hash
- The hash value to be potentially inserted into an empty slot. It must not be zero.- Returns:
- index of insertion. Always positive or zero.
-
hashSearchOrInsert
public static int hashSearchOrInsert(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.- Parameters:
hashTable
- The hash table to insert into. Its size must be a power of 2.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.hash
- The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.- Returns:
- index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
-
hashArrayInsert
public static int hashArrayInsert(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong) Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted. The hash values must not be negative, zero values and values ≥ thetaLong are ignored. The given hash table may have values, but they must have been inserted by this method or one of the other OADH insert methods in this class. This method performs additional checks against potentially invalid hash values or theta values. Returns the count of values actually inserted.- Parameters:
srcArr
- the source hash array to be potentially insertedhashTable
- The hash table to insert into. Its size must be a power of 2.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.thetaLong
- The theta value that all input hash values are compared against. It must greater than zero. See Theta Long- Returns:
- the count of values actually inserted
-
hashSearchMemory
public static int hashSearchMemory(org.apache.datasketches.memory.Memory mem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for Memory. Returns the index if found, -1 if not found.- Parameters:
mem
- The Memory containing the hash table to search. The hash table portion must be a power of 2 in size.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.hash
- The hash value to search for. Must not be zero.memOffsetBytes
- offset in the memory where the hashTable starts- Returns:
- Current probe index if found, -1 if not found.
-
hashInsertOnlyMemory
public static int hashInsertOnlyMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for Memory. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if table has no empty slot.- Parameters:
wmem
- The WritableMemory that contains the hashTable to insert into. The size of the hashTable portion must be a power of 2.lgArrLongs
- The log_base2(hashTable.length. See lgArrLongs.hash
- value that must not be zero and will be inserted into the array into an empty slot.memOffsetBytes
- offset in the WritableMemory where the hashTable starts- Returns:
- index of insertion. Always positive or zero.
-
hashSearchOrInsertMemory
public static int hashSearchOrInsertMemory(org.apache.datasketches.memory.WritableMemory wmem, int lgArrLongs, long hash, int memOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a Memory. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.- Parameters:
wmem
- The WritableMemory that contains the hashTable to insert into.lgArrLongs
- The log_base2(hashTable.length). See lgArrLongs.hash
- The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.memOffsetBytes
- offset in the WritableMemory where the hash array starts- Returns:
- index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
-
checkThetaCorruption
public static void checkThetaCorruption(long thetaLong) - Parameters:
thetaLong
- must be greater than zero otherwise throws an exception. See Theta Long
-
checkHashCorruption
public static void checkHashCorruption(long hash) - Parameters:
hash
- must be greater than -1 otherwise throws an exception. Note a hash of zero is normally ignored, but a negative hash is never allowed.
-
continueCondition
public static boolean continueCondition(long thetaLong, long hash) Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.- Parameters:
thetaLong
- must be greater than the hash value See Theta Longhash
- must be less than thetaLong and not less than or equal to zero.- Returns:
- true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
-
convertToHashTable
public static long[] convertToHashTable(long[] hashArr, int count, long thetaLong, double rebuildThreshold) Converts the given array to a hash table.- Parameters:
hashArr
- The given array of hashes. Gaps are OK.count
- The number of valid hashes in the arraythetaLong
- Any hashes equal to or greater than thetaLong will be ignoredrebuildThreshold
- The fill fraction for the hash table forcing a rebuild or resize.- Returns:
- a HashTable
-
minLgHashTableSize
public static int minLgHashTableSize(int count, double rebuild_threshold) Returns the smallest log hash table size given the count of items and the rebuild threshold.- Parameters:
count
- the given count of itemsrebuild_threshold
- the rebuild threshold as a fraction between zero and one.- Returns:
- the smallest log hash table size
-
countPart
public static int countPart(long[] srcArr, int lgArrLongs, long thetaLong) Counts the cardinality of the first Log2 values of the given source array.- Parameters:
srcArr
- the given source arraylgArrLongs
- See lgArrLongsthetaLong
- See Theta Long- Returns:
- the cardinality
-
count
public static int count(long[] srcArr, long thetaLong) Counts the cardinality of the given source array.- Parameters:
srcArr
- the given source arraythetaLong
- See Theta Long- Returns:
- the cardinality
-