Interface SampleEstimatorFactory


  • public interface SampleEstimatorFactory
    • Field Detail

      • LOG

        static final org.apache.commons.logging.Log LOG
    • Method Detail

      • distinctCount

        static int distinctCount(int[] frequencies,
                                 int nRows,
                                 int sampleSize)
        Estimate the number of distinct values based on the value frequencies observed in a sample.
        Parameters:
        frequencies - An array with the frequency of each unique value. Note: all values contained should be larger than zero.
        nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
        sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies), and should always be lower than or equal to nRows.
        Returns:
        An estimated number of unique values
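To make the parameter contract concrete, here is a minimal sketch of a three-argument overload. It is not the library's implementation: the class name `GeeDistinctEstimator` and the helper `validate` are hypothetical, and the estimator shown is the Guaranteed-Error Estimator (GEE) of Charikar et al., which scales the number of values seen exactly once by sqrt(nRows / sampleSize) and counts every value seen more than once as-is.

```java
/** Hypothetical sketch of a sample-based distinct-count estimator (GEE); not the library's code. */
final class GeeDistinctEstimator {
    private GeeDistinctEstimator() {}

    /** Checks the documented preconditions, throwing IllegalArgumentException on a violation. */
    static void validate(int[] frequencies, int nRows, int sampleSize) {
        long sum = 0;
        for (int f : frequencies) {
            if (f <= 0)
                throw new IllegalArgumentException("frequencies must be > 0, got " + f);
            sum += f;
        }
        if (nRows < sum)
            throw new IllegalArgumentException("nRows must be >= sum(frequencies)");
        if (sampleSize <= 0 || sampleSize > nRows)
            throw new IllegalArgumentException("sampleSize must be in (0, nRows]");
    }

    /** GEE estimate: sqrt(nRows / sampleSize) * (#values seen once) + (#values seen more than once). */
    static int distinctCount(int[] frequencies, int nRows, int sampleSize) {
        validate(frequencies, nRows, sampleSize);
        int singletons = 0;
        int repeated = 0;
        for (int f : frequencies) {
            if (f == 1)
                singletons++;
            else
                repeated++;
        }
        double scale = Math.sqrt((double) nRows / sampleSize);
        double estimate = scale * singletons + repeated;
        // Never report fewer distinct values than were observed, nor more than nRows.
        return (int) Math.min(nRows, Math.max(frequencies.length, Math.round(estimate)));
    }
}
```

For example, a 7-row sample from 100 rows with frequencies {1, 1, 2, 3} has two singletons and two repeated values, so the estimate is round(sqrt(100/7) * 2 + 2) = 10.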
      • distinctCount

        static int distinctCount(int[] frequencies,
                                 int nRows,
                                 int sampleSize,
                                 SampleEstimatorFactory.EstimationType type)
        Estimate the number of distinct values based on the value frequencies observed in a sample.
        Parameters:
        frequencies - An array with the frequency of each unique value. Note: all values contained should be larger than zero.
        nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
        sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies), and should always be lower than or equal to nRows.
        type - The type of estimator to use
        Returns:
        An estimated number of unique values
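The `type` parameter selects which estimator is applied. A minimal sketch of such a dispatch, assuming a hypothetical enum `Kind` whose constants are illustrative only and do not correspond to the actual `SampleEstimatorFactory.EstimationType` values; the class name `TypedDistinctEstimator` is likewise hypothetical.

```java
/** Hypothetical sketch of dispatching on an estimator type; names are illustrative only. */
final class TypedDistinctEstimator {
    /** Hypothetical estimator kinds, not the constants of SampleEstimatorFactory.EstimationType. */
    enum Kind { OBSERVED, SCALE_UP, GEE }

    private TypedDistinctEstimator() {}

    static int distinctCount(int[] frequencies, int nRows, int sampleSize, Kind type) {
        int observed = frequencies.length;
        double estimate;
        switch (type) {
            case OBSERVED:
                // Lower bound: only the values actually seen in the sample.
                estimate = observed;
                break;
            case SCALE_UP:
                // Naive linear scaling of the observed count to the full row count.
                estimate = (double) observed * nRows / sampleSize;
                break;
            case GEE: {
                // Guaranteed-Error Estimator: scale only the values seen exactly once.
                int singletons = 0;
                for (int f : frequencies)
                    if (f == 1)
                        singletons++;
                estimate = Math.sqrt((double) nRows / sampleSize) * singletons + (observed - singletons);
                break;
            }
            default:
                throw new IllegalArgumentException("Unsupported estimator: " + type);
        }
        // Clamp to the feasible range [observed, nRows].
        return (int) Math.min(nRows, Math.max(observed, Math.round(estimate)));
    }
}
```

With frequencies {1, 1, 2, 3}, nRows = 100 and sampleSize = 7, OBSERVED yields 4, SCALE_UP yields 57 and GEE yields 10, which shows how strongly the choice of estimator can affect the result.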
      • distinctCount

        static int distinctCount(int[] frequencies,
                                 int nRows,
                                 int sampleSize,
                                 SampleEstimatorFactory.EstimationType type,
                                 HashMap<Integer,Double> solveCache)
        Estimate the number of distinct values based on the value frequencies observed in a sample.
        Parameters:
        frequencies - An array with the frequency of each unique value. Note: all values contained should be larger than zero.
        nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
        sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies), and should always be lower than or equal to nRows.
        type - The type of estimator to use
        solveCache - A solve cache used to avoid repeated calculations
        Returns:
        An estimated number of unique values
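The `solveCache` parameter implies memoization keyed by an integer. The sketch below assumes the cache maps an integer input to the result of an expensive numeric solve; the stand-in computation (a harmonic number) and the names `SolveCacheSketch`, `expensiveSolve` and `solve` are hypothetical and not taken from the library.

```java
import java.util.HashMap;

/** Hypothetical sketch of a caller-supplied solve cache; not the library's code. */
final class SolveCacheSketch {
    /** Counts real computations so the effect of the cache is observable. */
    static int computations = 0;

    private SolveCacheSketch() {}

    /** Hypothetical stand-in for an expensive per-key solve: H(k) = 1 + 1/2 + ... + 1/k. */
    static double expensiveSolve(int k) {
        computations++;
        double sum = 0;
        for (int i = 1; i <= k; i++)
            sum += 1.0 / i;
        return sum;
    }

    /** Returns the cached result for k, computing and storing it only on a cache miss. */
    static double solve(int k, HashMap<Integer, Double> solveCache) {
        if (solveCache == null)
            return expensiveSolve(k); // no cache supplied: always recompute
        return solveCache.computeIfAbsent(k, key -> expensiveSolve(key));
    }
}
```

Passing the cache in from the caller lets one `HashMap` be reused across many estimation calls. Because `HashMap` is not thread-safe, a cache shared between threads would need external synchronization.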