Class BinomialBoundsN


  • public final class BinomialBoundsN
    extends Object
    This class estimates error bounds given the sample set size, the sampling probability theta, the number of standard deviations, and a noDataSeen flag. It can be used to compute error bounds for fixed-threshold sampling as well as for sketches.
    Author:
    Kevin Lang
    • Method Detail

      • getLowerBound

        public static double getLowerBound(long numSamples,
                                           double theta,
                                           int numSDev,
                                           boolean noDataSeen)
        Returns the approximate lower bound value.
        Parameters:
        numSamples - the number of samples in the sample set
        theta - the sampling probability
        numSDev - the number of "standard deviations" from the mean for the tail bounds. This must be 1, 2, or 3.
        noDataSeen - normally false. However, when there are zero samples and theta < 1.0, this flag distinguishes the initial case, in which no data has been seen at all, from the case in which the estimate is zero but an upper error bound may still exist.
        Returns:
        the approximate lower bound value
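
To give a feel for what a lower bound of this kind represents, here is a minimal sketch based on a plain normal approximation to the binomial distribution. It is not the algorithm used by BinomialBoundsN, whose internals are not described on this page. The class name LowerBoundSketch, the method approxLowerBound, and the assumed valid range of theta (greater than 0 and at most 1) are hypothetical and chosen only for illustration.

```java
// Hypothetical illustration only; NOT the BinomialBoundsN implementation.
final class LowerBoundSketch {

  // Normal-approximation lower bound on the size N of the underlying set,
  // given that numSamples items were retained with probability theta.
  static double approxLowerBound(long numSamples, double theta, int numSDev) {
    if (numSamples < 0) {
      throw new IllegalArgumentException("numSamples must be >= 0");
    }
    // Assumption: theta is a probability in (0, 1].
    if (theta <= 0.0 || theta > 1.0) {
      throw new IllegalArgumentException("theta must be in (0, 1]");
    }
    // The documented contract allows only 1, 2 or 3 standard deviations.
    if (numSDev < 1 || numSDev > 3) {
      throw new IllegalArgumentException("numSDev must be 1, 2 or 3");
    }
    // Point estimate of N: each retained sample stands for 1/theta items.
    double estimate = numSamples / theta;
    // Var(numSamples) = N * theta * (1 - theta); substituting the estimate
    // for N and dividing by theta gives the standard deviation of the estimate.
    double stdDev = Math.sqrt(numSamples * (1.0 - theta)) / theta;
    // The underlying set cannot be smaller than the samples actually seen.
    return Math.max((double) numSamples, estimate - numSDev * stdDev);
  }
}
```

With theta equal to 1.0 the standard deviation term vanishes and the sketch returns numSamples unchanged, which matches the intuition that nothing was discarded by sampling.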
      • getUpperBound

        public static double getUpperBound(long numSamples,
                                           double theta,
                                           int numSDev,
                                           boolean noDataSeen)
        Returns the approximate upper bound value.
        Parameters:
        numSamples - the number of samples in the sample set
        theta - the sampling probability
        numSDev - the number of "standard deviations" from the mean for the tail bounds. This must be 1, 2, or 3.
        noDataSeen - normally false. However, when there are zero samples and theta < 1.0, this flag distinguishes the initial case, in which no data has been seen at all, from the case in which the estimate is zero but an upper error bound may still exist.
        Returns:
        the approximate upper bound value
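
The role of noDataSeen is easiest to see with zero samples: if data was sampled with theta < 1.0 and nothing was retained, the underlying set may still be non-empty, so an upper bound above zero is meaningful. The sketch below illustrates that distinction under stated assumptions. It is not the BinomialBoundsN implementation. The names UpperBoundSketch and approxUpperBound, the choice to return 0.0 when noDataSeen is true, the use of one-sided normal tail probabilities, and the assumed range of theta are all hypothetical.

```java
// Hypothetical illustration only; NOT the BinomialBoundsN implementation.
final class UpperBoundSketch {

  // One-sided standard normal tail probabilities for 1, 2 and 3
  // standard deviations (assumed mapping from numSDev to a tail mass).
  private static final double[] TAIL = {0.158655, 0.022750, 0.001350};

  static double approxUpperBound(long numSamples, double theta,
                                 int numSDev, boolean noDataSeen) {
    if (numSamples < 0) {
      throw new IllegalArgumentException("numSamples must be >= 0");
    }
    // Assumption: theta is a probability in (0, 1].
    if (theta <= 0.0 || theta > 1.0) {
      throw new IllegalArgumentException("theta must be in (0, 1]");
    }
    if (numSDev < 1 || numSDev > 3) {
      throw new IllegalArgumentException("numSDev must be 1, 2 or 3");
    }
    // Assumption: with no data seen at all there is nothing to bound.
    if (noDataSeen) {
      return 0.0;
    }
    // No sampling took place, so the sample count is exact.
    if (theta == 1.0) {
      return numSamples;
    }
    if (numSamples == 0) {
      // A set of size N yields zero samples with probability (1 - theta)^N.
      // Solving (1 - theta)^N = tail for N gives the largest size that is
      // still plausible at the requested confidence.
      return Math.log(TAIL[numSDev - 1]) / Math.log(1.0 - theta);
    }
    // Otherwise use the normal approximation around the estimate.
    double estimate = numSamples / theta;
    double stdDev = Math.sqrt(numSamples * (1.0 - theta)) / theta;
    return estimate + numSDev * stdDev;
  }
}
```

Calling approxUpperBound(0, 0.5, 2, true) returns 0.0, while the same call with noDataSeen set to false returns a positive bound, which is the distinction the flag is documented to enable.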