Class AComEst
java.lang.Object
  org.apache.sysds.runtime.compress.estim.AComEst
Direct Known Subclasses:
ComEstCompressed, ComEstExact, ComEstSample
public abstract class AComEst extends Object
Main abstract class for estimating the compressed size of columns.
-
Method Summary
Modifier and Type  Method
    Description
void  clearNNZ()
    Clear the pointer to the materialized list of nnz in columns.
CompressedSizeInfoColGroup  combine(IColIndex combinedColumns, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
    Combine two analyzed column groups together.
CompressedSizeInfoColGroup  combine(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
    Combine two analyzed column groups together.
CompressedSizeInfo  computeCompressedSizeInfos(int k)
    Multi-threaded version of extracting compression size info.
CompressedSizeInfoColGroup  getColGroupInfo(IColIndex colIndexes)
    Method for extracting the compressed size info of the specified columns, grouped together in a single ColGroup.
abstract CompressedSizeInfoColGroup  getColGroupInfo(IColIndex colIndexes, int estimate, int nrUniqueUpperBound)
    A method to extract the compressed size info for a given list of columns. This method further limits the estimated number of unique values, since in some cases the number of uniques estimated for the group is higher than the number estimated from the sub groups of the given colIndexes.
CompressedSizeInfoColGroup  getDeltaColGroupInfo(IColIndex colIndexes)
    Method for extracting info of the specified columns as delta encodings (deltas from the previous row's values).
abstract CompressedSizeInfoColGroup  getDeltaColGroupInfo(IColIndex colIndexes, int estimate, int nrUniqueUpperBound)
    A method to extract the compressed size info for a given list of columns. This method further limits the estimated number of unique values, since in some cases the number of uniques estimated for the group is higher than the number estimated from the sub groups of the given colIndexes.
int  getNumColumns()
    Get the number of columns in the overall compressing block.
int  getNumRows()
    Get the number of rows in the overall compressing block.
String  toString()
-
Method Detail
-
getNumRows
public int getNumRows()
Get the number of rows in the overall compressing block.
Returns:
- The number of rows
-
getNumColumns
public int getNumColumns()
Get the number of columns in the overall compressing block.
Returns:
- The number of columns
-
computeCompressedSizeInfos
public final CompressedSizeInfo computeCompressedSizeInfos(int k)
Multi-threaded version of extracting compression size info.
Parameters:
- k: The concurrency degree.
Returns:
- The compression size info of each column, compressed in isolation.
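To illustrate what a concurrency degree of k means for per-column analysis, here is a minimal sketch. It is not the SystemDS implementation: the class ParallelColumnScan, its methods, and the per-column statistic (the number of distinct values) are hypothetical stand-ins for the real estimator and its CompressedSizeInfo result. Each column is analysed in isolation on a fixed pool of k threads.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration, not part of SystemDS.
final class ParallelColumnScan {

	// The per-column "analysis": count the distinct values of one column.
	static int distinctInColumn(double[][] data, int col) {
		Set<Double> seen = new HashSet<>();
		for(double[] row : data)
			seen.add(row[col]);
		return seen.size();
	}

	// Analyse every column in isolation using at most k worker threads.
	static int[] distinctPerColumn(double[][] data, int k) {
		final int nCol = data.length == 0 ? 0 : data[0].length;
		ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, k));
		try {
			List<Future<Integer>> tasks = new ArrayList<>();
			for(int c = 0; c < nCol; c++) {
				final int col = c;
				Callable<Integer> task = () -> distinctInColumn(data, col);
				tasks.add(pool.submit(task));
			}
			int[] result = new int[nCol];
			for(int c = 0; c < nCol; c++)
				result[c] = tasks.get(c).get(); // collected in column order
			return result;
		}
		catch(InterruptedException e) {
			Thread.currentThread().interrupt();
			throw new IllegalStateException(e);
		}
		catch(ExecutionException e) {
			throw new IllegalStateException(e.getCause());
		}
		finally {
			pool.shutdown();
		}
	}
}
```

Because the futures are read back in submission order, the result does not depend on which thread finishes first.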
-
getColGroupInfo
public final CompressedSizeInfoColGroup getColGroupInfo(IColIndex colIndexes)
Method for extracting the compressed size info of the specified columns, grouped together in a single ColGroup.
Parameters:
- colIndexes: The columns to group together inside a ColGroup.
Returns:
- The compressed size information associated with the selected ColGroup.
-
getColGroupInfo
public abstract CompressedSizeInfoColGroup getColGroupInfo(IColIndex colIndexes, int estimate, int nrUniqueUpperBound)
A method to extract the compressed size info for a given list of columns. This method further limits the estimated number of unique values, since in some cases the number of uniques estimated for the group is higher than the number estimated from the sub groups of the given colIndexes.
Parameters:
- colIndexes: The columns to extract compression information from.
- estimate: An estimate of the number of unique elements in these columns.
- nrUniqueUpperBound: The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated in the sub columns. The bound is flexible in the sense that, if the sample is small, it can be manually adjusted, as in CoCodeCostMatrixMult.
Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
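The upper bound described for nrUniqueUpperBound can be sketched as follows. This is an illustration under stated assumptions rather than code from SystemDS: UniqueBound, upperBound and clampEstimate are hypothetical names, and capping the product at the number of rows is an added assumption (a group of columns cannot contain more distinct tuples than it has rows).

```java
// Hypothetical illustration, not part of SystemDS.
final class UniqueBound {

	// Product of the unique counts estimated for the sub columns,
	// capped at numRows (assumption) so the result always fits in an int.
	static int upperBound(int[] subGroupUniques, int numRows) {
		long bound = 1;
		for(int u : subGroupUniques) {
			bound *= Math.max(1, u);
			if(bound >= numRows)
				return numRows; // stop early, which also avoids overflow
		}
		return (int) bound;
	}

	// Limit an estimate of the number of unique values by the upper bound.
	static int clampEstimate(int estimate, int nrUniqueUpperBound) {
		return Math.min(estimate, nrUniqueUpperBound);
	}
}
```

For example, two sub columns estimated at 3 and 4 unique values can jointly produce at most 12 distinct tuples, so a sampled estimate of 20 for the combined group would be reduced to 12.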
-
getDeltaColGroupInfo
public final CompressedSizeInfoColGroup getDeltaColGroupInfo(IColIndex colIndexes)
Method for extracting info of the specified columns as delta encodings (deltas from the previous row's values).
Parameters:
- colIndexes: The columns to group together inside a ColGroup.
Returns:
- The compressed size information assuming delta encoding of the columns.
-
getDeltaColGroupInfo
public abstract CompressedSizeInfoColGroup getDeltaColGroupInfo(IColIndex colIndexes, int estimate, int nrUniqueUpperBound)
A method to extract the compressed size info for a given list of columns. This method further limits the estimated number of unique values, since in some cases the number of uniques estimated for the group is higher than the number estimated from the sub groups of the given colIndexes. The difference for this method is that it extracts the values as delta values from the matrix block input.
Parameters:
- colIndexes: The columns to extract compression information from.
- estimate: An estimate of the number of unique delta elements in these columns.
- nrUniqueUpperBound: The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated in the sub columns. The bound is flexible in the sense that, if the sample is small, it can be manually adjusted, as in CoCodeCostMatrixMult.
Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
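Why the number of unique delta elements can differ from the number of unique raw elements is easiest to see on a single column. The sketch below is a hypothetical illustration, not SystemDS code: DeltaDistinct and its methods are invented names, and treating the first row as a delta from zero is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not part of SystemDS.
final class DeltaDistinct {

	// Number of distinct raw values in the column.
	static int distinctRaw(double[] column) {
		Set<Double> seen = new HashSet<>();
		for(double v : column)
			seen.add(v);
		return seen.size();
	}

	// Number of distinct deltas, where each delta is the current value
	// minus the previous row's value.
	static int distinctDelta(double[] column) {
		Set<Double> seen = new HashSet<>();
		double prev = 0; // assumption: the first row is a delta from zero
		for(double v : column) {
			seen.add(v - prev);
			prev = v;
		}
		return seen.size();
	}
}
```

A steadily increasing column such as 10, 20, 30, 40 has four distinct raw values but only one distinct delta, which is the kind of case where a delta-based estimate can come out much smaller.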
-
combine
public final CompressedSizeInfoColGroup combine(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
Combine two analyzed column groups without materializing the dictionaries of either side. If the number of distinct elements on both sides multiplied together is too large to fit in an Integer, return null. If either side was constructed without analysis, fall back to the default materialization of double arrays.
Parameters:
- g1: First group
- g2: Second group
Returns:
- A combined compressed size estimation for the group.
-
combine
public final CompressedSizeInfoColGroup combine(IColIndex combinedColumns, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
Combine two analyzed column groups without materializing the dictionaries of either side. If the number of distinct elements on both sides multiplied together is too large to fit in an Integer, return null. If either side was constructed without analysis, fall back to the default materialization of double arrays.
Parameters:
- combinedColumns: The combined column indexes.
- g1: First group
- g2: Second group
Returns:
- A combined compressed size estimation for the columns specified, using the combining algorithm.
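The null return for overly large products applies to both combine overloads and can be sketched as below. This is a hedged illustration only: CombineBound and combinedDistinctBound are hypothetical names, and reading "larger than Integer" as "larger than Integer.MAX_VALUE" is an assumption about the documented behaviour, not something confirmed by the SystemDS source.

```java
// Hypothetical illustration, not part of SystemDS.
final class CombineBound {

	// Worst-case number of distinct tuples after combining two groups,
	// or null if that product does not fit in an int (assumed threshold).
	static Integer combinedDistinctBound(int distinct1, int distinct2) {
		// Multiply as long so the product itself cannot overflow.
		long product = (long) distinct1 * (long) distinct2;
		if(product > Integer.MAX_VALUE)
			return null;
		return (int) product;
	}
}
```

Callers of such a method have to handle the null case explicitly, for example by skipping that candidate pair of groups.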
-
clearNNZ
public void clearNNZ()
Clear the pointer to the materialized list of the number of non-zeros (nnz) in the columns.
-
-