Interface ComEstFactory


  • public interface ComEstFactory
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static org.apache.commons.logging.Log LOG  
    • Method Summary

      Static Methods 
      Modifier and Type Method Description
      static AComEst createEstimator​(MatrixBlock data, CompressionSettings cs, int k)
      Create an estimator for the input data with the given settings and parallelization degree.
      static AComEst createEstimator​(MatrixBlock data, CompressionSettings cs, int sampleSize, int k)
      Create an estimator for the input data with the given settings, sample size, and parallelization degree.
      static int getSampleSize​(double samplePower, int nRows, int nCols, double sparsity, int minSampleSize, int maxSampleSize)
      This function returns the sample size to use.
    • Field Detail

      • LOG

        static final org.apache.commons.logging.Log LOG
    • Method Detail

      • createEstimator

        static AComEst createEstimator​(MatrixBlock data,
                                       CompressionSettings cs,
                                       int k)
        Create an estimator for the input data with the given settings and parallelization degree.
        Parameters:
        data - The matrix to extract compression information from.
        cs - The settings for the compression
        k - The parallelization degree
        Returns:
        A new compression size estimator (AComEst) used to extract information about column groups.
      • createEstimator

        static AComEst createEstimator​(MatrixBlock data,
                                       CompressionSettings cs,
                                       int sampleSize,
                                       int k)
        Create an estimator for the input data with the given settings, sample size, and parallelization degree.
        Parameters:
        data - The matrix to extract compression information from.
        cs - The settings for the compression
        sampleSize - The number of rows to sample from the input data when extracting information.
        k - The parallelization degree
        Returns:
        A new compression size estimator (AComEst) used to extract information about column groups.
      • getSampleSize

        static int getSampleSize​(double samplePower,
                                 int nRows,
                                 int nCols,
                                 double sparsity,
                                 int minSampleSize,
                                 int maxSampleSize)
        This function returns the sample size to use. The sample size is bounded by the minimum and maximum sample sizes, and is calculated from a power of the number of rows and a sampling fraction.
        Parameters:
        samplePower - The sample power
        nRows - The number of rows
        nCols - The number of columns
        sparsity - The sparsity of the input
        minSampleSize - The minimum sample size
        maxSampleSize - The maximum sample size
        Returns:
        The sample size to use.
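
        The bounding behavior described for getSampleSize can be illustrated with a minimal sketch. This is not the SystemDS implementation: the class SampleSizeSketch and its sampleSize method are hypothetical, the nCols and sparsity inputs are left out, and the cap at nRows is an added assumption. It only shows how a power of the row count might be clamped between a minimum and a maximum sample size.

```java
public class SampleSizeSketch {

    /**
     * Hypothetical stand-in, NOT the SystemDS getSampleSize: raises the row
     * count to samplePower, clamps the result into
     * [minSampleSize, maxSampleSize], and never returns more than nRows.
     */
    public static int sampleSize(double samplePower, int nRows, int minSampleSize, int maxSampleSize) {
        // Candidate size grows sub-linearly with the row count when samplePower < 1.
        int candidate = (int) Math.ceil(Math.pow(nRows, samplePower));
        // Enforce the lower bound first, then the upper bound.
        int bounded = Math.min(Math.max(candidate, minSampleSize), maxSampleSize);
        // Assumption: a sample cannot contain more rows than the input has.
        return Math.min(bounded, nRows);
    }

    public static void main(String[] args) {
        // Large input, power 0.5: the candidate falls between the two bounds.
        System.out.println(sampleSize(0.5, 1_000_000, 100, 5000));
        // Small input: the minimum sample size takes over.
        System.out.println(sampleSize(0.5, 100, 50, 5000));
        // Power 1.0: the maximum sample size caps the result.
        System.out.println(sampleSize(1.0, 1_000_000, 100, 5000));
    }
}
```

        In this sketch a samplePower below 1 makes the sample grow more slowly than the input, while the two bounds keep very small and very large inputs within a predictable range.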