Class HashOperations
java.lang.Object
org.apache.datasketches.thetacommon.HashOperations
Helper class for the common hash table methods.
- Author:
- Lee Rhodes, Kevin Lang
-
Field Summary
FieldsModifier and TypeFieldDescriptionstatic final intThe stride mask for the Open Address, Double Hashing (OADH) hash table algorithm. -
Method Summary
Modifier and TypeMethodDescriptionstatic voidcheckHashCorruption(long hash) Checks that the given hash value is not negative.static voidcheckThetaCorruption(long thetaLong) Checks that the given theta is not negative nor zero.static booleancontinueCondition(long thetaLong, long hash) Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.static long[]convertToHashTable(long[] hashArr, int count, long thetaLong, double rebuildThreshold) Converts the given array to a hash table.static intcount(long[] srcArr, long thetaLong) Counts the cardinality of the given source array.static intcountPart(long[] srcArr, int lgArrLongs, long thetaLong) Counts the cardinality of the first Log2 values of the given source array.static inthashArrayInsert(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong) Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted.static inthashInsertOnly(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.static inthashInsertOnlyMemorySegment(MemorySegment wseg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for MemorySegment.static inthashSearch(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap.static inthashSearchMemorySegment(MemorySegment seg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for MemorySegment.static inthashSearchOrInsert(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap.static inthashSearchOrInsertMemorySegment(MemorySegment wseg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a writable MemorySegment.static intminLgHashTableSize(int count, double rebuild_threshold) Returns the smallest log hash table size given the count of items and the rebuild threshold.
-
Field Details
-
STRIDE_MASK
public static final int STRIDE_MASKThe stride mask for the Open Address, Double Hashing (OADH) hash table algorithm.- See Also:
-
-
Method Details
-
hashSearch
public static int hashSearch(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for on-heap. Returns the index if found, -1 if not found.- Parameters:
hashTable- The hash table to search. Its size must be a power of 2.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.hash- The hash value to search for. It must not be zero.- Returns:
- Current probe index if found, -1 if not found.
-
hashInsertOnly
public static int hashInsertOnly(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if the table has no empty slot.- Parameters:
hashTable- the hash table to insert into. Its size must be a power of 2.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.hash- The hash value to be potentially inserted into an empty slot. It must not be zero.- Returns:
- index of insertion. Always positive or zero.
-
hashSearchOrInsert
public static int hashSearchOrInsert(long[] hashTable, int lgArrLongs, long hash) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for on-heap. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.- Parameters:
hashTable- The hash table to insert into. Its size must be a power of 2.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.hash- The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.- Returns:
- index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
-
hashArrayInsert
public static int hashArrayInsert(long[] srcArr, long[] hashTable, int lgArrLongs, long thetaLong) Inserts the given long array into the given OADH hashTable of the target size, ignores duplicates and counts the values inserted. The hash values must not be negative, zero values and values ≥ thetaLong are ignored. The given hash table may have values, but they must have been inserted by this method or one of the other OADH insert methods in this class. This method performs additional checks against potentially invalid hash values or theta values. Returns the count of values actually inserted.- Parameters:
srcArr- the source hash array to be potentially insertedhashTable- The hash table to insert into. Its size must be a power of 2.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.thetaLong- The theta value that all input hash values are compared against. It must greater than zero. See Theta Long- Returns:
- the count of values actually inserted
-
hashSearchMemorySegment
public static int hashSearchMemorySegment(MemorySegment seg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) search scheme for MemorySegment. Returns the index if found, -1 if not found. The input MemorySegment may be read only.- Parameters:
seg- The MemorySegment containing the hash table to search. The hash table portion must be a power of 2 in size.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.hash- The hash value to search for. Must not be zero.segOffsetBytes- offset in the MemorySegment where the hashTable starts- Returns:
- Current probe index if found, -1 if not found.
-
hashInsertOnlyMemorySegment
public static int hashInsertOnlyMemorySegment(MemorySegment wseg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash (OADH) insert scheme for MemorySegment. This method assumes that the input hash is not a duplicate. Useful for rebuilding tables to avoid unnecessary comparisons. Returns the index of insertion, which is always positive or zero. Throws an exception if table has no empty slot.- Parameters:
wseg- The writable MemorySegment that contains the hashTable to insert into. The size of the hashTable portion must be a power of 2.lgArrLongs- The log_base2(hashTable.length. See lgArrLongs.hash- value that must not be zero and will be inserted into the array into an empty slot.segOffsetBytes- offset in the writable MemorySegment where the hashTable starts- Returns:
- index of insertion. Always positive or zero.
-
hashSearchOrInsertMemorySegment
public static int hashSearchOrInsertMemorySegment(MemorySegment wseg, int lgArrLongs, long hash, int segOffsetBytes) This is a classical Knuth-style Open Addressing, Double Hash insert scheme, but inserts values directly into a writable MemorySegment. Returns index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1). Throws an exception if the value is not found and table has no empty slot.- Parameters:
wseg- The writable MemorySegment that contains the hashTable to insert into.lgArrLongs- The log_base2(hashTable.length). See lgArrLongs.hash- The hash value to be potentially inserted into an empty slot only if it is not a duplicate of any other hash value in the table. It must not be zero.segOffsetBytes- offset in the writable MemorySegment where the hash array starts- Returns:
- index ≥ 0 if found (duplicate); < 0 if inserted, inserted at -(index + 1).
-
checkThetaCorruption
public static void checkThetaCorruption(long thetaLong) Checks that the given theta is not negative nor zero.- Parameters:
thetaLong- must be greater than zero otherwise throws an exception. See Theta Long
-
checkHashCorruption
public static void checkHashCorruption(long hash) Checks that the given hash value is not negative.- Parameters:
hash- must be greater than -1 otherwise throws an exception. Note a hash of zero is normally ignored, but a negative hash is never allowed.
-
continueCondition
public static boolean continueCondition(long thetaLong, long hash) Return true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.- Parameters:
thetaLong- must be greater than the hash value See Theta Longhash- must be less than thetaLong and not less than or equal to zero.- Returns:
- true (continue) if hash is greater than or equal to thetaLong, or if hash == 0, or if hash == Long.MAX_VALUE.
-
convertToHashTable
public static long[] convertToHashTable(long[] hashArr, int count, long thetaLong, double rebuildThreshold) Converts the given array to a hash table.- Parameters:
hashArr- The given array of hashes. Gaps are OK.count- The number of valid hashes in the arraythetaLong- Any hashes equal to or greater than thetaLong will be ignoredrebuildThreshold- The fill fraction for the hash table forcing a rebuild or resize.- Returns:
- a HashTable
-
minLgHashTableSize
public static int minLgHashTableSize(int count, double rebuild_threshold) Returns the smallest log hash table size given the count of items and the rebuild threshold.- Parameters:
count- the given count of itemsrebuild_threshold- the rebuild threshold as a fraction between zero and one.- Returns:
- the smallest log hash table size
-
countPart
public static int countPart(long[] srcArr, int lgArrLongs, long thetaLong) Counts the cardinality of the first Log2 values of the given source array.- Parameters:
srcArr- the given source arraylgArrLongs- See lgArrLongsthetaLong- See Theta Long- Returns:
- the cardinality
-
count
public static int count(long[] srcArr, long thetaLong) Counts the cardinality of the given source array.- Parameters:
srcArr- the given source arraythetaLong- See Theta Long- Returns:
- the cardinality
-