Confidence intervals for binomial proportions.

#include <bounds_binomial_proportions.hpp>

Static Public Member Functions
static double	approximate_lower_bound_on_p (uint64_t n, uint64_t k, double num_std_devs)
	Computes lower bound of approximate Clopper-Pearson confidence interval for a binomial proportion.

static double	approximate_upper_bound_on_p (uint64_t n, uint64_t k, double num_std_devs)
	Computes upper bound of approximate Clopper-Pearson confidence interval for a binomial proportion.

static double	estimate_unknown_p (uint64_t n, uint64_t k)
	Computes an estimate of an unknown binomial proportion.

static double	erf (double x)
	Computes an approximation to the erf() function.

static double	normal_cdf (double x)
	Computes an approximation to normal_cdf(x).

Detailed Description

Confidence intervals for binomial proportions.

This class computes an approximation to the Clopper-Pearson confidence interval for a binomial proportion. Exact Clopper-Pearson intervals are strictly conservative, but these approximations are not.

The main inputs are the numbers n and k, which are not the same as the other quantities called n and k elsewhere in our sketching library. A third parameter, numStdDev (num_std_devs in the function signatures), specifies the desired confidence level.

n is the number of independent randomized trials. It is given and therefore known.
p is the probability of a trial being a success. It is unknown.
k is the number of trials (out of n) that turn out to be successes. It is a random variable governed by a binomial distribution. After any given batch of n independent trials, the random variable k has a specific value which is observed and is therefore known.
pHat = k / n is an unbiased estimate of the unknown success probability p.
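The definitions above can be illustrated with a small simulation. This is a sketch only: p_hat and mean_p_hat are hypothetical helper names that are not part of the class, and the simulation simply flips a biased coin n times per batch. Averaged over many batches, pHat should land close to the true p, which is what "unbiased" means here.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper (not a library function): pHat = k / n for n > 0.
double p_hat(uint64_t n, uint64_t k) {
  assert(n > 0 && k <= n);
  return static_cast<double>(k) / static_cast<double>(n);
}

// Simulates num_batches batches of n independent trials, each succeeding
// with probability p, and returns the average of pHat over all batches.
double mean_p_hat(uint64_t n, double p, int num_batches, unsigned seed) {
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution trial(p);
  double sum = 0.0;
  for (int b = 0; b < num_batches; ++b) {
    uint64_t k = 0;  // number of successes observed in this batch
    for (uint64_t i = 0; i < n; ++i) {
      if (trial(rng)) ++k;
    }
    sum += p_hat(n, k);
  }
  return sum / num_batches;
}
```

Any single batch can give a pHat well away from p; only the average over batches settles near p, which is why an interval around pHat is needed.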

Alternatively, consider a coin with unknown heads probability p, where n is the number of independent flips of that coin and k is the number of times the coin comes up heads during a given batch of n flips. This class computes a frequentist confidence interval [lowerBoundOnP, upperBoundOnP] for the unknown p.

Conceptually, the desired confidence level is specified by a tail probability delta.

Ideally, over a large ensemble of independent batches of trials, the fraction of batches in which the true p lies below lowerBoundOnP would be at most delta, and the fraction of batches in which the true p lies above upperBoundOnP would also be at most delta.

Setting aside the philosophical difficulties attaching to that statement, it isn't quite true because we are approximating the Clopper-Pearson interval.

Finally, we point out that in this class's interface, the confidence parameter delta is not specified directly, but rather through a "number of standard deviations" numStdDev. The library effectively converts that to a delta via delta = normalCDF (-1.0 * numStdDev).
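As a rough illustration of that conversion, the sketch below maps a number of standard deviations to the tail probability delta. It uses std::erfc from the C++ standard library as a stand-in for the class's own normal_cdf approximation, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Standard normal CDF computed with the standard library's complementary
// error function. This stands in for the class's own approximation.
double std_normal_cdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Tail probability delta implied by a number of standard deviations:
// delta = normalCDF(-1.0 * numStdDev).
double delta_from_num_std_devs(double num_std_devs) {
  assert(!std::isnan(num_std_devs));
  return std_normal_cdf(-1.0 * num_std_devs);
}
```

Under this mapping, 1, 2 and 3 standard deviations correspond to delta of roughly 0.159, 0.0228 and 0.00135. Because delta applies to each tail separately, the nominal two-sided confidence level is 1 - 2 * delta, or about 95.4% for 2 standard deviations.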

It is perhaps worth emphasizing that the library is NOT merely adding and subtracting numStdDev standard deviations to the estimate. It is doing something better, which to some extent accounts for the fact that the binomial distribution has a non-Gaussian shape.
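For contrast, here is a sketch of the naive approach that the class avoids: the estimate plus or minus numStdDev standard errors (often called the Wald interval). The names interval and naive_wald_interval are hypothetical and are not part of the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct interval {
  double lower;
  double upper;
};

// Naive interval: pHat +/- num_std_devs * sqrt(pHat * (1 - pHat) / n).
// Shown only to illustrate what the class does NOT do.
interval naive_wald_interval(uint64_t n, uint64_t k, double num_std_devs) {
  assert(n > 0 && k <= n);
  const double p_hat = static_cast<double>(k) / static_cast<double>(n);
  const double std_err = std::sqrt(p_hat * (1.0 - p_hat) / static_cast<double>(n));
  return {p_hat - num_std_devs * std_err, p_hat + num_std_devs * std_err};
}
```

With n = 20, k = 1 and 2 standard deviations, the naive lower bound is negative, which is not a valid probability. With k = 0 the interval collapses to the single point 0, even though a small positive p is entirely consistent with seeing no successes. Both failures come from treating a skewed binomial distribution as if it were symmetric.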

In particular, it is using an approximation to the inverse of the incomplete beta function that appears as formula 26.5.22 on page 945 of the "Handbook of Mathematical Functions" by Abramowitz and Stegun.
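The sketch below shows how such an approximation could be wired up. The formula body is a transcription of 26.5.22 and should be checked against the handbook before being relied on. The function names are hypothetical, and the real implementation may handle edge cases (such as k near 0 or near n) differently. The parameter choices follow the implementation notes for the two bound functions: I_x(n-k+1, k) with a negated numStdDev for the lower bound, and I_x(n-k, k+1) for the upper bound.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Approximates x_p such that I_{x_p}(a, b) = Q(yp), where Q is the right
// tail of the standard normal distribution (transcription of A&S 26.5.22).
double inverse_incomplete_beta_sketch(double a, double b, double yp) {
  assert(a > 0.5 && b > 0.5);
  const double a2m1 = 2.0 * a - 1.0;
  const double b2m1 = 2.0 * b - 1.0;
  const double lambda = (yp * yp - 3.0) / 6.0;
  const double h = 2.0 / (1.0 / a2m1 + 1.0 / b2m1);
  const double w = yp * std::sqrt(h + lambda) / h
      - (1.0 / b2m1 - 1.0 / a2m1) * (lambda + 5.0 / 6.0 - 2.0 / (3.0 * h));
  return a / (a + b * std::exp(2.0 * w));
}

// Lower bound sketch: solve I_x(n-k+1, k) = 1 - delta, then return 1 - x.
double lower_bound_sketch(uint64_t n, uint64_t k, double num_std_devs) {
  assert(k > 0 && k < n);  // edge cases are deliberately left out
  const double x = inverse_incomplete_beta_sketch(
      static_cast<double>(n - k) + 1.0, static_cast<double>(k), -1.0 * num_std_devs);
  return 1.0 - x;
}

// Upper bound sketch: solve I_x(n-k, k+1) = delta, then return 1 - x.
double upper_bound_sketch(uint64_t n, uint64_t k, double num_std_devs) {
  assert(k > 0 && k < n);  // edge cases are deliberately left out
  const double x = inverse_incomplete_beta_sketch(
      static_cast<double>(n - k), static_cast<double>(k) + 1.0, num_std_devs);
  return 1.0 - x;
}
```

For n = 1000, k = 300 and 2 standard deviations, this sketch gives an interval of roughly [0.271, 0.330] around pHat = 0.3, slightly wider and shifted relative to the naive plus-or-minus interval.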

Author: Kevin Lang; Jon Malkin

Member Function Documentation

◆ approximate_lower_bound_on_p()

static double approximate_lower_bound_on_p	(	uint64_t	n,
		uint64_t	k,
		double	num_std_devs
	)

inline static

Computes lower bound of approximate Clopper-Pearson confidence interval for a binomial proportion.

Implementation Notes:
The approximateLowerBoundOnP is defined with respect to the right tail of the binomial distribution.

We want to solve for the p for which sum_{j=k}^{n} bino(j; n, p) = delta.
We now restate that in terms of the left tail.
We want to solve for the p for which sum_{j=0}^{k-1} bino(j; n, p) = 1 - delta.
Define x = 1-p.
We want to solve for the x for which I_x(n-k+1,k) = 1 - delta.
We specify 1-delta via numStdDevs through the right tail of the standard normal distribution.
Smaller values of numStdDevs correspond to bigger values of 1-delta and hence to smaller values of delta. In fact, usefully small values of delta correspond to negative values of numStdDevs.
return p = 1-x.
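To make the first equation in these notes concrete, the sketch below solves it directly by bisection on the right tail of the binomial distribution. This yields the exact Clopper-Pearson lower bound for a given delta. It is not the library's method, which uses a closed-form approximation instead, and all function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Binomial probability bino(j; n, p) for 0 < p < 1, computed in log space.
double bino(uint64_t j, uint64_t n, double p) {
  const double log_choose =
      std::lgamma(n + 1.0) - std::lgamma(j + 1.0) - std::lgamma((n - j) + 1.0);
  return std::exp(log_choose + j * std::log(p) + (n - j) * std::log1p(-p));
}

// Right tail: probability of observing k or more successes out of n.
double right_tail(uint64_t n, uint64_t k, double p) {
  double sum = 0.0;
  for (uint64_t j = k; j <= n; ++j) sum += bino(j, n, p);
  return sum;
}

// Finds the p for which the right tail equals delta. The right tail grows
// with p, so bisection on [0, 1] converges to the unique solution.
double exact_lower_bound(uint64_t n, uint64_t k, double delta) {
  assert(k <= n && delta > 0.0 && delta < 1.0);
  if (k == 0) return 0.0;  // the right tail is 1 for every p, so the bound is 0
  double lo = 0.0;
  double hi = 1.0;
  for (int i = 0; i < 60; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (right_tail(n, k, mid) < delta) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

A convenient sanity check is k = n, where the right tail reduces to p^n and the solution is delta^(1/n). The cost of this approach is a full tail sum per bisection step, which is one reason a closed-form approximation is attractive.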

Parameters

n	is the number of trials. Must be non-negative.
k	is the number of successes. Must be non-negative, and cannot exceed n.
num_std_devs	the number of standard deviations defining the confidence interval

Returns: the lower bound of the approximate Clopper-Pearson confidence interval for the unknown success probability.

◆ approximate_upper_bound_on_p()

static double approximate_upper_bound_on_p	(	uint64_t	n,
		uint64_t	k,
		double	num_std_devs
	)

inline static

Computes upper bound of approximate Clopper-Pearson confidence interval for a binomial proportion.

Implementation Notes:
The approximateUpperBoundOnP is defined with respect to the left tail of the binomial distribution.

We want to solve for the p for which sum_{j=0}^{k} bino(j; n, p) = delta.
Define x = 1-p.
We want to solve for the x for which I_x(n-k,k+1) = delta.
We specify delta via numStdDevs through the right tail of the standard normal distribution.
Bigger values of numStdDevs correspond to smaller values of delta.
return p = 1-x.
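The left-tail equation in these notes can likewise be solved directly. The sketch below does so by bisection, giving the exact Clopper-Pearson upper bound for a given delta rather than the library's approximation. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Binomial probability bino(j; n, p) for 0 < p < 1, computed in log space.
double bino(uint64_t j, uint64_t n, double p) {
  const double log_choose =
      std::lgamma(n + 1.0) - std::lgamma(j + 1.0) - std::lgamma((n - j) + 1.0);
  return std::exp(log_choose + j * std::log(p) + (n - j) * std::log1p(-p));
}

// Left tail: probability of observing k or fewer successes out of n.
double left_tail(uint64_t n, uint64_t k, double p) {
  double sum = 0.0;
  for (uint64_t j = 0; j <= k; ++j) sum += bino(j, n, p);
  return sum;
}

// Finds the p for which the left tail equals delta. The left tail shrinks
// as p grows, so bisection moves right while the tail is still too large.
double exact_upper_bound(uint64_t n, uint64_t k, double delta) {
  assert(k <= n && delta > 0.0 && delta < 1.0);
  if (k == n) return 1.0;  // the left tail is 1 for every p, so the bound is 1
  double lo = 0.0;
  double hi = 1.0;
  for (int i = 0; i < 60; ++i) {
    const double mid = 0.5 * (lo + hi);
    if (left_tail(n, k, mid) > delta) lo = mid; else hi = mid;
  }
  return 0.5 * (lo + hi);
}
```

When k = 0 the left tail is just (1-p)^n, so the bound has the closed form 1 - delta^(1/n). For n = 10 and delta = 0.05 that is about 0.259: even after ten failures in a row, a success probability as high as roughly 26% cannot be ruled out at that level.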

Parameters

n	is the number of trials. Must be non-negative.
k	is the number of successes. Must be non-negative, and cannot exceed n.
num_std_devs	the number of standard deviations defining the confidence interval

Returns: the upper bound of the approximate Clopper-Pearson confidence interval for the unknown success probability.

◆ estimate_unknown_p()

static double estimate_unknown_p	(	uint64_t	n,
		uint64_t	k
	)

inline static

Computes an estimate of an unknown binomial proportion.

Parameters

n	is the number of trials. Must be non-negative.
k	is the number of successes. Must be non-negative, and cannot exceed n.

Returns: the estimate of the unknown binomial proportion.

◆ erf()

static double erf ( double x )

inline static

Computes an approximation to the erf() function.

Parameters

x	is the input to the erf function

Returns: erf(x), accurate to roughly 7 decimal digits.
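This page does not say which approximation the function uses. One classical candidate with comparable accuracy is formula 7.1.28 from Abramowitz and Stegun, whose stated error is at most about 3e-7 for non-negative x. The sketch below implements that formula under the assumption that it is representative. The name erf_sketch is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Abramowitz and Stegun 7.1.28 for x >= 0:
//   erf(x) ~= 1 - 1 / (1 + a1*x + a2*x^2 + ... + a6*x^6)^16
// extended to negative x through the identity erf(-x) = -erf(x).
double erf_sketch(double x) {
  assert(!std::isnan(x));
  if (x < 0.0) return -erf_sketch(-x);
  const double a1 = 0.0705230784;
  const double a2 = 0.0422820123;
  const double a3 = 0.0092705272;
  const double a4 = 0.0001520143;
  const double a5 = 0.0002765672;
  const double a6 = 0.0000430638;
  // Evaluate the degree-6 polynomial with Horner's rule.
  const double poly =
      1.0 + x * (a1 + x * (a2 + x * (a3 + x * (a4 + x * (a5 + x * a6)))));
  // Raise to the 16th power by squaring four times.
  const double p2 = poly * poly;
  const double p4 = p2 * p2;
  const double p8 = p4 * p4;
  const double p16 = p8 * p8;
  return 1.0 - 1.0 / p16;
}
```

Comparing erf_sketch against std::erf from the standard library at a few points is an easy way to confirm the error stays below one part in a million.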

◆ normal_cdf()

static double normal_cdf ( double x )

inline static

Computes an approximation to normal_cdf(x).

Parameters

x	is the input to the normal_cdf function

Returns: the approximation to normalCDF(x).

The documentation for this class was generated from the following file:

common/include/bounds_binomial_proportions.hpp
