Class BloomFilterBuilder
java.lang.Object
org.apache.datasketches.filters.bloomfilter.BloomFilterBuilder
This class provides methods to help estimate the correct parameters when
creating a Bloom filter, and methods to create the filter using those values.
The underlying math is described in the Wikipedia article on Bloom filters.
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionstatic BloomFilter
createByAccuracy
(long maxDistinctItems, double targetFalsePositiveProb) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function.static BloomFilter
createByAccuracy
(long maxDistinctItems, double targetFalsePositiveProb, long seed) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function.static BloomFilter
createBySize
(long numBits, int numHashes) Creates a BloomFilter with given number of bits and number of hash functions, using a rnadom base seed for the hash function.static BloomFilter
createBySize
(long numBits, int numHashes, long seed) Creates a BloomFilter with given number of bits and number of hash functions, using the provided base seed for the hash function.static long
getSerializedFilterSize
(long numBits) Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with the given number of bits.static long
getSerializedFilterSizeByAccuracy
(long maxDistinctItems, double targetFalsePositiveProb) Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with an optimal number of bits and hash functions for the given inputs.static BloomFilter
initializeByAccuracy
(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function and writing into the provided WritableMemory.static BloomFilter
initializeByAccuracy
(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function and writing into the provided WritableMemory.static BloomFilter
initializeBySize
(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem) Initializes a BloomFilter with given number of bits and number of hash functions, using the provided base seed for the hash function and writing into the provided WritableMemory.static BloomFilter
initializeBySize
(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem) Initializes a BloomFilter with given number of bits and number of hash functions, using a random base seed for the hash function and writing into the provided WritableMemory.static long
suggestNumFilterBits
(long maxDistinctItems, double targetFalsePositiveProb) Returns the optimal number of bits to use in a Bloom Filter given a target number of distinct items and a target false positive probability.static short
suggestNumHashes
(double targetFalsePositiveProb) Returns the optimal number of hash functions to achieve a target false positive probability.static short
suggestNumHashes
(long maxDistinctItems, long numFilterBits) Returns the optimal number of hash functions to given target numbers of distinct items and the BloomFilter size in bits.
-
Constructor Details
-
BloomFilterBuilder
public BloomFilterBuilder()
-
-
Method Details
-
suggestNumHashes
public static short suggestNumHashes(long maxDistinctItems, long numFilterBits) Returns the optimal number of hash functions to given target numbers of distinct items and the BloomFilter size in bits. This function will provide a result even if the input values exceed the capacity of a single BloomFilter.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filternumFilterBits
- The intended size of the Bloom Filter in bits- Returns:
- The suggested number of hash functions to use with the filter
-
suggestNumHashes
public static short suggestNumHashes(double targetFalsePositiveProb) Returns the optimal number of hash functions to achieve a target false positive probability.- Parameters:
targetFalsePositiveProb
- A desired false positive probability per item- Returns:
- The suggested number of hash functions to use with the filter.
-
suggestNumFilterBits
public static long suggestNumFilterBits(long maxDistinctItems, double targetFalsePositiveProb) Returns the optimal number of bits to use in a Bloom Filter given a target number of distinct items and a target false positive probability.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per item- Returns:
- The suggested number of bits to use with the filter
-
getSerializedFilterSizeByAccuracy
public static long getSerializedFilterSizeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb) Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with an optimal number of bits and hash functions for the given inputs. This is also the minimum size of a WritableMemory for in-place filter initialization.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per item- Returns:
- The size, in bytes, required to hold the specified BloomFilter when serialized
-
getSerializedFilterSize
public static long getSerializedFilterSize(long numBits) Returns the minimum memory size, in bytes, needed for a serialized BloomFilter with the given number of bits. This is also the minimum size of a WritableMemory for in-place filter initialization.- Parameters:
numBits
- The number of bits in the target BloomFilter's bit array.- Returns:
- The size, in bytes, required to hold the specified BloomFilter when serialized
-
createByAccuracy
Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per item- Returns:
- A new BloomFilter configured for the given input parameters
-
createByAccuracy
public static BloomFilter createByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per itemseed
- A base hash seed- Returns:
- A new BloomFilter configured for the given input parameters
-
createBySize
Creates a BloomFilter with given number of bits and number of hash functions, using a rnadom base seed for the hash function.- Parameters:
numBits
- The size of the BloomFilter, in bitsnumHashes
- The number of hash functions to apply to items- Returns:
- A new BloomFilter configured for the given input parameters
-
createBySize
Creates a BloomFilter with given number of bits and number of hash functions, using the provided base seed for the hash function.- Parameters:
numBits
- The size of the BloomFilter, in bitsnumHashes
- The number of hash functions to apply to itemsseed
- A base hash seed- Returns:
- A new BloomFilter configured for the given input parameters
-
initializeByAccuracy
public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, org.apache.datasketches.memory.WritableMemory dstMem) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using a random base seed for the hash function and writing into the provided WritableMemory.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per itemdstMem
- A WritableMemory to hold the initialized filter- Returns:
- A new BloomFilter configured for the given input parameters
-
initializeByAccuracy
public static BloomFilter initializeByAccuracy(long maxDistinctItems, double targetFalsePositiveProb, long seed, org.apache.datasketches.memory.WritableMemory dstMem) Creates a new BloomFilter with an optimal number of bits and hash functions for the given inputs, using the provided base seed for the hash function and writing into the provided WritableMemory.- Parameters:
maxDistinctItems
- The maximum expected number of distinct items to add to the filtertargetFalsePositiveProb
- A desired false positive probability per itemseed
- A base hash seeddstMem
- A WritableMemory to hold the initialized filter- Returns:
- A new BloomFilter configured for the given input parameters
-
initializeBySize
public static BloomFilter initializeBySize(long numBits, int numHashes, org.apache.datasketches.memory.WritableMemory dstMem) Initializes a BloomFilter with given number of bits and number of hash functions, using a random base seed for the hash function and writing into the provided WritableMemory.- Parameters:
numBits
- The size of the BloomFilter, in bitsnumHashes
- The number of hash functions to apply to itemsdstMem
- A WritableMemory to hold the initialized filter- Returns:
- A new BloomFilter configured for the given input parameters
-
initializeBySize
public static BloomFilter initializeBySize(long numBits, int numHashes, long seed, org.apache.datasketches.memory.WritableMemory dstMem) Initializes a BloomFilter with given number of bits and number of hash functions, using the provided base seed for the hash function and writing into the provided WritableMemory.- Parameters:
numBits
- The size of the BloomFilter, in bitsnumHashes
- The number of hash functions to apply to itemsseed
- A base hash seeddstMem
- A WritableMemory to hold the initialized filter- Returns:
- A new BloomFilter configured for the given input parameters
-