Class ThetaUtil
java.lang.Object
org.apache.datasketches.thetacommon.ThetaUtil
Utility methods for the Theta Family of sketches
- Author:
- Lee Rhodes
-
Field Summary
Modifier and TypeFieldDescriptionstatic final int
The default nominal entries is provided as a convenience for those cases where the nominal sketch size in number of entries is not provided.static final long
The seed 9001 used in the sketch update methods is a prime number that was chosen very early on in experimental testing.static final int
The largest Log2 nom entries allowed: 26.static final int
The smallest Log2 cache size allowed: 5.static final int
The smallest Log2 nom entries allowed: 4.static final double
The hash table rebuild threshold = 15.0/16.0.static final double
The resize threshold = 0.5; tuned for speed. -
Method Summary
Modifier and TypeMethodDescriptionstatic int
checkNomLongs
(int nomLongs) Checks that the given nomLongs is within bounds and returns the Log2 of the ceiling power of 2 of the given nomLongs.static short
checkSeedHashes
(short seedHashA, short seedHashB) Check if the two seed hashes are equal.static short
computeSeedHash
(long seed) Computes and checks the 16-bit seed hash from the given long seed.static int
startingSubMultiple
(int lgTarget, int lgRF, int lgMin) Gets the smallest allowed exponent of 2 that it is a sub-multiple of the target by zero, one or more resize factors.
-
Field Details
-
MIN_LG_NOM_LONGS
public static final int MIN_LG_NOM_LONGSThe smallest Log2 nom entries allowed: 4.- See Also:
-
MAX_LG_NOM_LONGS
public static final int MAX_LG_NOM_LONGSThe largest Log2 nom entries allowed: 26.- See Also:
-
REBUILD_THRESHOLD
public static final double REBUILD_THRESHOLDThe hash table rebuild threshold = 15.0/16.0.- See Also:
-
RESIZE_THRESHOLD
public static final double RESIZE_THRESHOLDThe resize threshold = 0.5; tuned for speed.- See Also:
-
DEFAULT_NOMINAL_ENTRIES
public static final int DEFAULT_NOMINAL_ENTRIESThe default nominal entries is provided as a convenience for those cases where the nominal sketch size in number of entries is not provided. A sketch of 4096 entries has a Relative Standard Error (RSE) of +/- 1.56% at a confidence of 68%; or equivalently, a Relative Error of +/- 3.1% at a confidence of 95.4%. See Default Nominal Entries- See Also:
-
DEFAULT_UPDATE_SEED
public static final long DEFAULT_UPDATE_SEEDThe seed 9001 used in the sketch update methods is a prime number that was chosen very early on in experimental testing. Choosing a seed is somewhat arbitrary, and the author cannot prove that this particular seed is somehow superior to other seeds. There was some early Internet discussion that a seed of 0 did not produce as clean avalanche diagrams as non-zero seeds, but this may have been more related to the MurmurHash2 release, which did have some issues. As far as the author can determine, MurmurHash3 does not have these problems.In order to perform set operations on two sketches it is critical that the same hash function and seed are identical for both sketches, otherwise the assumed 1:1 relationship between the original source key value and the hashed bit string would be violated. Once you have developed a history of stored sketches you are stuck with it.
WARNING: This seed is used internally by library sketches in different packages and thus must be declared public. However, this seed value must not be used by library users with the MurmurHash3 function. It should be viewed as existing for exclusive, private use by the library.
- See Also:
-
MIN_LG_ARR_LONGS
public static final int MIN_LG_ARR_LONGSThe smallest Log2 cache size allowed: 5.- See Also:
-
-
Method Details
-
checkSeedHashes
public static short checkSeedHashes(short seedHashA, short seedHashB) Check if the two seed hashes are equal. If not, throw an SketchesArgumentException.- Parameters:
seedHashA
- the seedHash AseedHashB
- the seedHash B- Returns:
- seedHashA if they are equal
-
computeSeedHash
public static short computeSeedHash(long seed) Computes and checks the 16-bit seed hash from the given long seed. The seed hash may not be zero in order to maintain compatibility with older serialized versions that did not have this concept.- Parameters:
seed
- See Update Hash Seed- Returns:
- the seed hash.
-
startingSubMultiple
public static int startingSubMultiple(int lgTarget, int lgRF, int lgMin) Gets the smallest allowed exponent of 2 that it is a sub-multiple of the target by zero, one or more resize factors.- Parameters:
lgTarget
- Log2 of the target sizelgRF
- Log_base2 of Resize Factor. See Resize FactorlgMin
- Log2 of the minimum allowed starting size- Returns:
- The Log2 of the starting size
-
checkNomLongs
public static int checkNomLongs(int nomLongs) Checks that the given nomLongs is within bounds and returns the Log2 of the ceiling power of 2 of the given nomLongs.- Parameters:
nomLongs
- the given number of nominal longs. This can be any value from 16 to 67108864, inclusive.- Returns:
- The Log2 of the ceiling power of 2 of the given nomLongs.
-