Class ThetaUtil
- java.lang.Object
-
- org.apache.datasketches.thetacommon.ThetaUtil
-
public final class ThetaUtil extends Object
Utility methods for the Theta Family of sketches- Author:
- Lee Rhodes
-
-
Field Summary
Fields Modifier and Type Field Description static int
DEFAULT_NOMINAL_ENTRIES
The default nominal entries is provided as a convenience for those cases where the nominal sketch size in number of entries is not provided.static long
DEFAULT_UPDATE_SEED
The seed 9001 used in the sketch update methods is a prime number that was chosen very early on in experimental testing.static int
MAX_LG_NOM_LONGS
The largest Log2 nom entries allowed: 26.static int
MIN_LG_ARR_LONGS
The smallest Log2 cache size allowed: 5.static int
MIN_LG_NOM_LONGS
The smallest Log2 nom entries allowed: 4.static double
REBUILD_THRESHOLD
The hash table rebuild threshold = 15.0/16.0.static double
RESIZE_THRESHOLD
The resize threshold = 0.5; tuned for speed.
-
Method Summary
All Methods Static Methods Concrete Methods Modifier and Type Method Description static int
checkNomLongs(int nomLongs)
Checks that the given nomLongs is within bounds and returns the Log2 of the ceiling power of 2 of the given nomLongs.static short
checkSeedHashes(short seedHashA, short seedHashB)
Check if the two seed hashes are equal.static short
computeSeedHash(long seed)
Computes and checks the 16-bit seed hash from the given long seed.static int
startingSubMultiple(int lgTarget, int lgRF, int lgMin)
Gets the smallest allowed exponent of 2 that it is a sub-multiple of the target by zero, one or more resize factors.
-
-
-
Field Detail
-
MIN_LG_NOM_LONGS
public static final int MIN_LG_NOM_LONGS
The smallest Log2 nom entries allowed: 4.- See Also:
- Constant Field Values
-
MAX_LG_NOM_LONGS
public static final int MAX_LG_NOM_LONGS
The largest Log2 nom entries allowed: 26.- See Also:
- Constant Field Values
-
REBUILD_THRESHOLD
public static final double REBUILD_THRESHOLD
The hash table rebuild threshold = 15.0/16.0.- See Also:
- Constant Field Values
-
RESIZE_THRESHOLD
public static final double RESIZE_THRESHOLD
The resize threshold = 0.5; tuned for speed.- See Also:
- Constant Field Values
-
DEFAULT_NOMINAL_ENTRIES
public static final int DEFAULT_NOMINAL_ENTRIES
The default nominal entries is provided as a convenience for those cases where the nominal sketch size in number of entries is not provided. A sketch of 4096 entries has a Relative Standard Error (RSE) of +/- 1.56% at a confidence of 68%; or equivalently, a Relative Error of +/- 3.1% at a confidence of 95.4%. See Default Nominal Entries- See Also:
- Constant Field Values
-
DEFAULT_UPDATE_SEED
public static final long DEFAULT_UPDATE_SEED
The seed 9001 used in the sketch update methods is a prime number that was chosen very early on in experimental testing. Choosing a seed is somewhat arbitrary, and the author cannot prove that this particular seed is somehow superior to other seeds. There was some early Internet discussion that a seed of 0 did not produce as clean avalanche diagrams as non-zero seeds, but this may have been more related to the MurmurHash2 release, which did have some issues. As far as the author can determine, MurmurHash3 does not have these problems.In order to perform set operations on two sketches it is critical that the same hash function and seed are identical for both sketches, otherwise the assumed 1:1 relationship between the original source key value and the hashed bit string would be violated. Once you have developed a history of stored sketches you are stuck with it.
WARNING: This seed is used internally by library sketches in different packages and thus must be declared public. However, this seed value must not be used by library users with the MurmurHash3 function. It should be viewed as existing for exclusive, private use by the library.
- See Also:
- Constant Field Values
-
MIN_LG_ARR_LONGS
public static final int MIN_LG_ARR_LONGS
The smallest Log2 cache size allowed: 5.- See Also:
- Constant Field Values
-
-
Method Detail
-
checkSeedHashes
public static short checkSeedHashes(short seedHashA, short seedHashB)
Check if the two seed hashes are equal. If not, throw an SketchesArgumentException.- Parameters:
seedHashA
- the seedHash AseedHashB
- the seedHash B- Returns:
- seedHashA if they are equal
-
computeSeedHash
public static short computeSeedHash(long seed)
Computes and checks the 16-bit seed hash from the given long seed. The seed hash may not be zero in order to maintain compatibility with older serialized versions that did not have this concept.- Parameters:
seed
- See Update Hash Seed- Returns:
- the seed hash.
-
startingSubMultiple
public static int startingSubMultiple(int lgTarget, int lgRF, int lgMin)
Gets the smallest allowed exponent of 2 that it is a sub-multiple of the target by zero, one or more resize factors.- Parameters:
lgTarget
- Log2 of the target sizelgRF
- Log_base2 of Resize Factor. See Resize FactorlgMin
- Log2 of the minimum allowed starting size- Returns:
- The Log2 of the starting size
-
checkNomLongs
public static int checkNomLongs(int nomLongs)
Checks that the given nomLongs is within bounds and returns the Log2 of the ceiling power of 2 of the given nomLongs.- Parameters:
nomLongs
- the given number of nominal longs. This can be any value from 16 to 67108864, inclusive.- Returns:
- The Log2 of the ceiling power of 2 of the given nomLongs.
-
-