Interface SampleEstimatorFactory
-
public interface SampleEstimatorFactory
-
Nested Class Summary
Nested Classes
Modifier and Type, Interface, Description:
static class SampleEstimatorFactory.EstimationType
-
Field Summary
Fields
Modifier and Type, Field, Description:
static org.apache.commons.logging.Log LOG
-
Method Summary
Static Methods
Modifier and Type, Method, Description:
static int distinctCount(int[] frequencies, int nRows, int sampleSize)
Estimate the number of distinct values based on frequencies.
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type)
Estimate the number of distinct values based on frequencies.
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type, HashMap<Integer,Double> solveCache)
Estimate the number of distinct values based on frequencies.
-
Method Detail
-
distinctCount
static int distinctCount(int[] frequencies, int nRows, int sampleSize)
Estimate the number of distinct values based on frequencies.
Parameters:
frequencies - A list of frequencies of unique values. Note: all contained values should be larger than zero.
nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies) and should always be lower than or equal to nRows.
Returns:
An estimated number of unique values
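The parameter notes above can be turned into a small pre-flight check. The following is only a sketch: the class DistinctCountInputs and its methods frequenciesOf and checkInputs are hypothetical helpers and not part of SampleEstimatorFactory, and the sample values and nRows are made up for illustration. It builds a frequencies array from a sample and verifies the documented constraints before the estimator would be called.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the documented API.
public class DistinctCountInputs {

    /** Count how often each distinct value occurs in the sample. */
    public static int[] frequenciesOf(int[] sample) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : sample)
            counts.merge(v, 1, Integer::sum);
        int[] frequencies = new int[counts.size()];
        int i = 0;
        for (int c : counts.values())
            frequencies[i++] = c;
        return frequencies;
    }

    /** Check the constraints stated in the parameter notes. */
    public static void checkInputs(int[] frequencies, int nRows, int sampleSize) {
        long sum = 0;
        for (int f : frequencies) {
            if (f <= 0)
                throw new IllegalArgumentException("all frequencies must be larger than zero");
            sum += f;
        }
        if (nRows < sum)
            throw new IllegalArgumentException("nRows must be >= sum(frequencies)");
        if (sampleSize > nRows)
            throw new IllegalArgumentException("sampleSize must be <= nRows");
    }

    public static void main(String[] args) {
        int[] sample = {7, 7, 3, 9, 3, 7};   // made-up sample values
        int[] frequencies = frequenciesOf(sample);
        int nRows = 1000;                    // assumed size of the full column
        int sampleSize = sample.length;      // here equal to sum(frequencies)
        checkInputs(frequencies, nRows, sampleSize);
        // With the library on the classpath, the documented call would follow:
        // int estimate = SampleEstimatorFactory.distinctCount(frequencies, nRows, sampleSize);
        System.out.println(Arrays.toString(frequencies));
    }
}
```

The order of the entries in the frequencies array depends on HashMap iteration order; nothing in this page suggests the order matters, but that is an assumption.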
-
distinctCount
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type)
Estimate the number of distinct values based on frequencies.
Parameters:
frequencies - A list of frequencies of unique values. Note: all contained values should be larger than zero.
nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies) and should always be lower than or equal to nRows.
type - The type of estimator to use.
Returns:
An estimated number of unique values
-
distinctCount
static int distinctCount(int[] frequencies, int nRows, int sampleSize, SampleEstimatorFactory.EstimationType type, HashMap<Integer,Double> solveCache)
Estimate the number of distinct values based on frequencies.
Parameters:
frequencies - A list of frequencies of unique values. Note: all contained values should be larger than zero.
nRows - The total number of rows to consider. Note: this should always be larger than or equal to sum(frequencies).
sampleSize - The size of the sample. Note: this should ideally be scaled to match sum(frequencies) and should always be lower than or equal to nRows.
type - The type of estimator to use.
solveCache - A solve cache to avoid repeated calculations.
Returns:
An estimated number of unique values
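This page does not say what the keys and values of solveCache represent, so the following is only a generic illustration of how a HashMap<Integer,Double> can avoid repeated calculations. The class SolveCacheSketch, the method expensiveSolve (a stand-in that just takes a square root), and the call counter are all hypothetical and do not reflect the estimator's internals.

```java
import java.util.HashMap;

// Hypothetical illustration of a solve cache, not the library's implementation.
public class SolveCacheSketch {

    // Counts how often the stand-in solver actually runs.
    public static int solverCalls = 0;

    /** Stand-in for an expensive numeric solve keyed by an integer. */
    public static double expensiveSolve(int key) {
        solverCalls++;
        return Math.sqrt(key);
    }

    /** Return the cached result for key, solving only on a cache miss. */
    public static double cachedSolve(int key, HashMap<Integer, Double> cache) {
        return cache.computeIfAbsent(key, k -> expensiveSolve(k));
    }

    public static void main(String[] args) {
        HashMap<Integer, Double> solveCache = new HashMap<>();
        double first = cachedSolve(16, solveCache);   // computed
        double second = cachedSolve(16, solveCache);  // served from the cache
        System.out.println(first + " " + second + " solver calls: " + solverCalls);
    }
}
```

Under the assumption that the same cache instance may be passed to several distinctCount calls, a caller would create one HashMap and reuse it, for example when estimating many columns with the same nRows and sampleSize.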