Base class for Jaccard similarity.
#include <theta_jaccard_similarity_base.hpp>
template<typename SketchA, typename SketchB>
static std::array<double, 3> jaccard(const SketchA &sketch_a, const SketchB &sketch_b, uint64_t seed = DEFAULT_SEED)
  Computes the Jaccard similarity index with upper and lower bounds.

template<typename SketchA, typename SketchB>
static bool exactly_equal(const SketchA &sketch_a, const SketchB &sketch_b, uint64_t seed = DEFAULT_SEED)
  Returns true if the two given sketches are equivalent.

template<typename SketchA, typename SketchB>
static bool similarity_test(const SketchA &actual, const SketchB &expected, double threshold, uint64_t seed = DEFAULT_SEED)
  Tests similarity of an actual Sketch against an expected Sketch.

template<typename SketchA, typename SketchB>
static bool dissimilarity_test(const SketchA &actual, const SketchB &expected, double threshold, uint64_t seed = DEFAULT_SEED)
  Tests dissimilarity of an actual Sketch against an expected Sketch.

template<typename Union, typename Intersection, typename ExtractKey>
class datasketches::jaccard_similarity_base< Union, Intersection, ExtractKey >
◆ jaccard()
template<typename SketchA, typename SketchB>
static std::array<double, 3> jaccard(const SketchA &sketch_a, const SketchB &sketch_b, uint64_t seed = DEFAULT_SEED)
inline static
Computes the Jaccard similarity index with upper and lower bounds.
The Jaccard similarity index J(A,B) = |A ∩ B| / |A ∪ B| measures how similar the two sketches are to each other. If J = 1.0, the sketches are considered equal. If J = 0, the two sketches are disjoint. A Jaccard index of 0.95 means that the intersection of the two sets is 95% of the size of their union.
Note: For very large pairs of sketches, where the configured nominal entries of the sketches are 2^25 or 2^26, this method may produce unpredictable results.
- Parameters
  sketch_a | given sketch A
  sketch_b | given sketch B
  seed | seed of the hash function that was used to create the sketches
- Returns
  an array of three doubles {LowerBound, Estimate, UpperBound} for the Jaccard index. The lower and upper bounds delimit a 95.4% confidence interval (+/- 2 standard deviations).
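To make the definition J(A,B) = |A ∩ B| / |A ∪ B| concrete, the following minimal sketch computes the exact index on plain `std::set` containers. The helper `exact_jaccard` is hypothetical and is not part of the datasketches API; the library method works on sketches and returns an estimate with bounds rather than an exact value. Treating two empty sets as identical is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical helper (not a datasketches function):
// exact Jaccard index |A ∩ B| / |A ∪ B| of two sets of 64-bit items.
inline double exact_jaccard(const std::set<std::uint64_t>& a,
                            const std::set<std::uint64_t>& b) {
  // Assumption of this sketch: two empty sets count as identical.
  if (a.empty() && b.empty()) return 1.0;

  // Count the items of A that also occur in B.
  std::size_t intersection_size = 0;
  for (const std::uint64_t item : a) {
    if (b.count(item) > 0) ++intersection_size;
  }

  // Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|.
  const std::size_t union_size = a.size() + b.size() - intersection_size;
  assert(intersection_size <= union_size);

  return static_cast<double>(intersection_size) /
         static_cast<double>(union_size);
}
```

For A = {1, 2, 3, 4} and B = {3, 4, 5, 6}, the intersection has 2 items and the union has 6, so the helper returns 2/6, roughly 0.333.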
◆ exactly_equal()
template<typename SketchA, typename SketchB>
static bool exactly_equal(const SketchA &sketch_a, const SketchB &sketch_b, uint64_t seed = DEFAULT_SEED)
inline static
Returns true if the two given sketches are equivalent.
- Parameters
  sketch_a | the given sketch A
  sketch_b | the given sketch B
  seed | seed of the hash function that was used to create the sketches
- Returns
  true if the two given sketches are exactly equal
◆ similarity_test()
template<typename SketchA, typename SketchB>
static bool similarity_test(const SketchA &actual, const SketchB &expected, double threshold, uint64_t seed = DEFAULT_SEED)
inline static
Tests similarity of an actual Sketch against an expected Sketch.
Computes the lower bound of the Jaccard index, JLB, of the actual and expected sketches. If JLB ≥ threshold, the sketches are considered to be similar with a confidence of 97.7%.
- Parameters
  actual | the sketch to be tested
  expected | the reference sketch that is considered to be correct
  threshold | a real value between zero and one
  seed | seed of the hash function that was used to create the sketches
- Returns
  true if the similarity of the two sketches is at least the given threshold with at least 97.7% confidence
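The decision rule described above can be sketched on an already computed bounds array. This is an illustration only: the helper name `is_similar` is hypothetical, and the {LowerBound, Estimate, UpperBound} layout is taken from the documented return value of `jaccard()`. It is not the library's implementation.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not a datasketches function): applies the rule
// "similar if the lower bound JLB >= threshold" to precomputed bounds.
// Assumed layout: bounds = {LowerBound, Estimate, UpperBound}.
inline bool is_similar(const std::array<double, 3>& bounds, double threshold) {
  // The threshold is documented as a real value between zero and one.
  assert(threshold >= 0.0 && threshold <= 1.0);
  return bounds[0] >= threshold;
}
```

With bounds {0.90, 0.93, 0.96}, a threshold of 0.85 or 0.90 passes, while 0.95 fails even though the estimate and upper bound are higher, because only the lower bound is compared. Assuming a normal approximation, the 97.7% figure is the one-sided counterpart of the two-sided 95.4% interval: about (1 - 0.954) / 2 = 2.3% of the probability lies below the lower bound.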
◆ dissimilarity_test()
template<typename SketchA, typename SketchB>
static bool dissimilarity_test(const SketchA &actual, const SketchB &expected, double threshold, uint64_t seed = DEFAULT_SEED)
inline static
Tests dissimilarity of an actual Sketch against an expected Sketch.
Computes the upper bound of the Jaccard index, JUB, of the actual and expected sketches. If JUB ≤ threshold, the sketches are considered to be dissimilar with a confidence of 97.7%.
- Parameters
  actual | the sketch to be tested
  expected | the reference sketch that is considered to be correct
  threshold | a real value between zero and one
  seed | seed of the hash function that was used to create the sketches
- Returns
  true if the Jaccard index of the two sketches is at most the given threshold with at least 97.7% confidence
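As a rough illustration of the rule above, the check can be written against a precomputed bounds array. The helper `is_dissimilar` is hypothetical, and the {LowerBound, Estimate, UpperBound} layout is assumed from the documented return value of `jaccard()`. The library's own implementation may differ.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper (not a datasketches function): applies the rule
// "dissimilar if the upper bound JUB <= threshold" to precomputed bounds.
// Assumed layout: bounds = {LowerBound, Estimate, UpperBound}.
inline bool is_dissimilar(const std::array<double, 3>& bounds, double threshold) {
  // The threshold is documented as a real value between zero and one.
  assert(threshold >= 0.0 && threshold <= 1.0);
  return bounds[2] <= threshold;
}
```

With bounds {0.10, 0.15, 0.20}, a threshold of 0.25 or 0.20 passes and 0.15 fails. Because the similarity rule compares the lower bound and this rule compares the upper bound, both rules reject a threshold that falls strictly between the two bounds, so such a pair is classified as neither similar nor dissimilar.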
The documentation for this class was generated from the following file: