Jaccard Similarity

These objects provide measures related to the Jaccard similarity of theta_sketch and tuple_sketch objects.

Note that there are separate classes to be used for theta and tuple sketches.

class theta_jaccard_similarity

An object to help compute Jaccard similarity between theta sketches.

jaccard(sketch_a: _datasketches.theta_sketch, sketch_b: _datasketches.theta_sketch, seed: int = 9001) -> list[float]

Returns a list with {lower_bound, estimate, upper_bound} of the Jaccard similarity between sketches
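For intuition, the quantity being estimated is the Jaccard index |A ∩ B| / |A ∪ B|. The following is a minimal sketch that computes it exactly on plain Python sets. It does not call the datasketches library, and exact_jaccard is a hypothetical helper name used only for illustration. The value returned for two empty sets is an assumed convention here. The sketch-based jaccard method returns an estimate of this value together with lower and upper bounds, not the exact figure.

```python
def exact_jaccard(a: set, b: set) -> float:
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two Python sets."""
    union = a | b
    if not union:
        # Assumed convention for this illustration: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Two sets of 1000 items each that share 500 items (1500 distinct items overall).
a = set(range(0, 1000))
b = set(range(500, 1500))
j = exact_jaccard(a, b)
```

With these inputs the exact index is 500 / 1500, so an estimate from sketches built over the same data would be expected to fall near one third, with the lower and upper bounds bracketing it.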

exactly_equal(sketch_a: _datasketches.theta_sketch, sketch_b: _datasketches.theta_sketch, seed: int = 9001) -> bool

Returns True if sketch_a and sketch_b are equivalent, otherwise False

similarity_test(actual: _datasketches.theta_sketch, expected: _datasketches.theta_sketch, threshold: float, seed: int = 9001) -> bool

Tests similarity of an actual sketch against an expected sketch. Computes the lower bound of the Jaccard index J_{LB} of the actual and expected sketches. If J_{LB} >= threshold, then the sketches are considered to be similar with a confidence of 97.7% and returns True, otherwise False.

dissimilarity_test(actual: _datasketches.theta_sketch, expected: _datasketches.theta_sketch, threshold: float, seed: int = 9001) -> bool

Tests dissimilarity of an actual sketch against an expected sketch. Computes the upper bound of the Jaccard index J_{UB} of the actual and expected sketches. If J_{UB} <= threshold, then the sketches are considered to be dissimilar with a confidence of 97.7% and returns True, otherwise False.
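The decision rules described for similarity_test and dissimilarity_test can be sketched as plain comparisons against a [lower_bound, estimate, upper_bound] list of the kind jaccard returns. This is only an illustration of the rule as stated in the text, not the library's implementation. The helper names is_similar and is_dissimilar and the numeric bounds are hypothetical.

```python
def is_similar(bounds: list[float], threshold: float) -> bool:
    """Similar when the lower bound J_LB is at or above the threshold."""
    lower, _estimate, _upper = bounds
    return lower >= threshold


def is_dissimilar(bounds: list[float], threshold: float) -> bool:
    """Dissimilar when the upper bound J_UB is at or below the threshold."""
    _lower, _estimate, upper = bounds
    return upper <= threshold


# Made-up bounds for illustration: [lower_bound, estimate, upper_bound].
bounds = [0.30, 0.33, 0.36]

similar_at_025 = is_similar(bounds, 0.25)        # lower bound 0.30 >= 0.25
dissimilar_at_040 = is_dissimilar(bounds, 0.40)  # upper bound 0.36 <= 0.40
# A threshold inside the interval satisfies neither rule.
undecided_at_033 = not is_similar(bounds, 0.33) and not is_dissimilar(bounds, 0.33)
```

Because the two tests use opposite ends of the interval, a threshold that falls strictly between the lower and upper bounds makes both return False. The tests are therefore not complements of each other.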

class tuple_jaccard_similarity

An object to help compute Jaccard similarity between tuple sketches.

jaccard(sketch_a: _datasketches.tuple_sketch, sketch_b: _datasketches.tuple_sketch, seed: int = 9001) -> list[float]

Returns a list with {lower_bound, estimate, upper_bound} of the Jaccard similarity between sketches

exactly_equal(sketch_a: _datasketches.tuple_sketch, sketch_b: _datasketches.tuple_sketch, seed: int = 9001) -> bool

Returns True if sketch_a and sketch_b are equivalent, otherwise False

similarity_test(actual: _datasketches.tuple_sketch, expected: _datasketches.tuple_sketch, threshold: float, seed: int = 9001) -> bool

Tests similarity of an actual sketch against an expected sketch. Computes the lower bound of the Jaccard index J_{LB} of the actual and expected sketches. If J_{LB} >= threshold, then the sketches are considered to be similar with a confidence of 97.7% and returns True, otherwise False.

dissimilarity_test(actual: _datasketches.tuple_sketch, expected: _datasketches.tuple_sketch, threshold: float, seed: int = 9001) -> bool

Tests dissimilarity of an actual sketch against an expected sketch. Computes the upper bound of the Jaccard index J_{UB} of the actual and expected sketches. If J_{UB} <= threshold, then the sketches are considered to be dissimilar with a confidence of 97.7% and returns True, otherwise False.