Jaccard Similarity
These objects provide measures related to the Jaccard similarity of theta_sketch and tuple_sketch objects.
Note that there are separate classes to be used for theta and tuple sketches.
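As a point of reference, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The short sketch below computes it exactly on small Python sets. `exact_jaccard` is a hypothetical helper written for illustration and is not part of the datasketches API; the classes documented here estimate the same quantity from sketches rather than from the full sets. Treating two empty sets as identical is an assumption of this example.

```python
def exact_jaccard(a: set, b: set) -> float:
    """Exact Jaccard index |A intersect B| / |A union B| of two finite sets."""
    union = a | b
    if not union:
        # Assumption for this example: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


a = set(range(0, 100))    # values 0..99
b = set(range(75, 175))   # values 75..174, sharing 75..99 with a

# 25 shared values out of 175 distinct values overall
print(exact_jaccard(a, b))
```

A sketch-based estimate for the same two streams would be expected to bracket this exact value between its lower and upper bounds.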
- class theta_jaccard_similarity
An object to help compute Jaccard similarity between theta sketches.
- jaccard(sketch_a: _datasketches.theta_sketch, sketch_b: _datasketches.theta_sketch, seed: int = 9001) list[float]
Returns a list containing the lower bound, estimate, and upper bound, in that order, of the Jaccard similarity between the two sketches.
- exactly_equal(sketch_a: _datasketches.theta_sketch, sketch_b: _datasketches.theta_sketch, seed: int = 9001) bool
Returns True if sketch_a and sketch_b are equivalent, otherwise False
- similarity_test(actual: _datasketches.theta_sketch, expected: _datasketches.theta_sketch, threshold: float, seed: int = 9001) bool
Tests similarity of an actual sketch against an expected sketch. Computes the lower bound of the Jaccard index J_{LB} of the actual and expected sketches. If J_{LB} >= threshold, the sketches are considered similar with a confidence of 97.7% and the method returns True; otherwise it returns False.
- dissimilarity_test(actual: _datasketches.theta_sketch, expected: _datasketches.theta_sketch, threshold: float, seed: int = 9001) bool
Tests dissimilarity of an actual sketch against an expected sketch. Computes the upper bound of the Jaccard index J_{UB} of the actual and expected sketches. If J_{UB} <= threshold, the sketches are considered dissimilar with a confidence of 97.7% and the method returns True; otherwise it returns False.
- class tuple_jaccard_similarity
An object to help compute Jaccard similarity between tuple sketches.
- jaccard(sketch_a: _datasketches.tuple_sketch, sketch_b: _datasketches.tuple_sketch, seed: int = 9001) list[float]
Returns a list containing the lower bound, estimate, and upper bound, in that order, of the Jaccard similarity between the two sketches.
- exactly_equal(sketch_a: _datasketches.tuple_sketch, sketch_b: _datasketches.tuple_sketch, seed: int = 9001) bool
Returns True if sketch_a and sketch_b are equivalent, otherwise False
- similarity_test(actual: _datasketches.tuple_sketch, expected: _datasketches.tuple_sketch, threshold: float, seed: int = 9001) bool
Tests similarity of an actual sketch against an expected sketch. Computes the lower bound of the Jaccard index J_{LB} of the actual and expected sketches. If J_{LB} >= threshold, the sketches are considered similar with a confidence of 97.7% and the method returns True; otherwise it returns False.
- dissimilarity_test(actual: _datasketches.tuple_sketch, expected: _datasketches.tuple_sketch, threshold: float, seed: int = 9001) bool
Tests dissimilarity of an actual sketch against an expected sketch. Computes the upper bound of the Jaccard index J_{UB} of the actual and expected sketches. If J_{UB} <= threshold, the sketches are considered dissimilar with a confidence of 97.7% and the method returns True; otherwise it returns False.
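The threshold tests described for both classes reduce to comparing one end of the Jaccard interval against the threshold. The following is a minimal sketch of that decision rule in plain Python, assuming the bounds arrive in the lower bound, estimate, upper bound order that jaccard returns. `is_similar` and `is_dissimilar` are hypothetical names, not library functions, and the bounds used are made-up numbers rather than output from a real sketch.

```python
from typing import Sequence


def is_similar(bounds: Sequence[float], threshold: float) -> bool:
    """Rule described for similarity_test: lower bound reaches the threshold."""
    lower_bound, _estimate, _upper_bound = bounds
    return lower_bound >= threshold


def is_dissimilar(bounds: Sequence[float], threshold: float) -> bool:
    """Rule described for dissimilarity_test: upper bound stays at or below the threshold."""
    _lower_bound, _estimate, upper_bound = bounds
    return upper_bound <= threshold


# Illustrative bounds only (hypothetical values).
bounds = [0.12, 0.14, 0.16]

print(is_similar(bounds, 0.10))     # True:  0.12 >= 0.10
print(is_similar(bounds, 0.15))     # False: 0.12 <  0.15
print(is_dissimilar(bounds, 0.20))  # True:  0.16 <= 0.20
print(is_dissimilar(bounds, 0.15))  # False: 0.16 >  0.15
```

A threshold that falls strictly between the lower and upper bound makes both checks return False: the interval is too wide to support either conclusion at that threshold.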