Class LibMatrixReorg


  • public class LibMatrixReorg
    extends Object
    MB: Library for selected matrix reorg operations including special cases and all combinations of dense and sparse representations. Current list of supported operations:
    - reshape
    - r' (transpose)
    - rdiag (diagV2M/diagM2V)
    - rsort (sorting data/indexes)
    - rmempty (remove empty)
    - rexpand (outer/table-seq expansion)
    • Field Detail

      • PAR_NUMCELL_THRESHOLD

        public static long PAR_NUMCELL_THRESHOLD
      • PAR_NUMCELL_THRESHOLD_SORT

        public static final int PAR_NUMCELL_THRESHOLD_SORT
        See Also:
        Constant Field Values
      • SPARSE_OUTPUTS_IN_CSR

        public static final boolean SPARSE_OUTPUTS_IN_CSR
        See Also:
        Constant Field Values
    • Method Detail

      • isSupportedReorgOperator

        public static boolean isSupportedReorgOperator(ReorgOperator op)
      • sort

        public static MatrixBlock sort(MatrixBlock in,
                                       MatrixBlock out,
                                       int[] by,
                                       boolean desc,
                                       boolean ixret,
                                       int k)
        Parameters:
        in - Input matrix to sort
        out - Output matrix where the sorted input is inserted to
        by - The 1-based index (or indexes) of the column(s) to order by
        desc - If true, sort in descending order; otherwise in ascending order
        ixret - If true, return the indexes of the sorted rows instead of the sorted data
        k - Number of parallel threads
        Returns:
        The output matrix holding the sorted data (or the sort indexes, if ixret is true)
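        The following is a minimal dense sketch of the sort semantics, not the SystemDS implementation. It assumes a single 1-based by column (as in the DML order() builtin), a stable sort, and 1-based row indexes for ixret. The class SortSketch and its double[][] representation are hypothetical, and parallelism (k) is omitted.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: orders the rows of a dense double[][] by one column.
// Assumptions: 'by' is 1-based, ixret returns 1-based row indexes.
class SortSketch {
    static double[][] sort(double[][] in, int by, boolean desc, boolean ixret) {
        int n = in.length;
        Integer[] ix = new Integer[n];
        for (int i = 0; i < n; i++)
            ix[i] = i;
        Comparator<Integer> cmp = Comparator.comparingDouble(i -> in[i][by - 1]);
        if (desc)
            cmp = cmp.reversed();
        // Arrays.sort on objects is stable, so ties keep their input order
        Arrays.sort(ix, cmp);
        double[][] out = new double[n][];
        for (int i = 0; i < n; i++)
            out[i] = ixret ? new double[] {ix[i] + 1} : in[ix[i]].clone();
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{3, 30}, {1, 10}, {2, 20}};
        System.out.println(Arrays.deepToString(sort(m, 1, false, false)));
        System.out.println(Arrays.deepToString(sort(m, 1, true, true)));
    }
}
```

        Sorting an index array instead of the rows themselves makes the ixret variant come for free: the same permutation either gathers rows or is returned directly.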
      • reshape

        public static MatrixBlock reshape(MatrixBlock in,
                                          MatrixBlock out,
                                          int rows,
                                          int cols,
                                          boolean rowwise)
        CP reshape operation (single input, single output matrix). NOTE: In contrast to R, the rowwise parameter specifies both the read and the write order, with row-wise as the default. R always reads column-wise, uses its row-wise flag to specify only the write order, and defaults to column-wise.
        Parameters:
        in - input matrix
        out - output matrix
        rows - number of rows
        cols - number of columns
        rowwise - if true, read and write cells in row-major order; otherwise in column-major order
        Returns:
        output matrix
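        A minimal dense sketch of the read/write-order rule from the NOTE above, using a hypothetical ReshapeSketch class on double[][] rather than MatrixBlock. The cell at linear position p in the chosen read order is written to linear position p in the same order of the output.

```java
import java.util.Arrays;

// Hypothetical sketch of CP reshape semantics on dense arrays (not MatrixBlock).
class ReshapeSketch {
    // rowwise=true: read and write in row-major order;
    // rowwise=false: read and write in column-major order.
    static double[][] reshape(double[][] in, int rows, int cols, boolean rowwise) {
        int inRows = in.length, inCols = in[0].length;
        if ((long) inRows * inCols != (long) rows * cols)
            throw new IllegalArgumentException("cell count mismatch");
        double[][] out = new double[rows][cols];
        for (int i = 0; i < inRows; i++) {
            for (int j = 0; j < inCols; j++) {
                // linear position of (i, j) in the chosen read order
                long p = rowwise ? (long) i * inCols + j : (long) j * inRows + i;
                // same linear position, same order, in the output shape
                int oi = (int) (rowwise ? p / cols : p % rows);
                int oj = (int) (rowwise ? p % cols : p / rows);
                out[oi][oj] = in[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(reshape(m, 3, 2, true)));
        System.out.println(Arrays.deepToString(reshape(m, 3, 2, false)));
    }
}
```

        For the 2x3 input [[1,2,3],[4,5,6]] reshaped to 3x2, rowwise=true yields [[1,2],[3,4],[5,6]] and rowwise=false yields [[1,5],[4,3],[2,6]].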
      • reshape

        public static List<IndexedMatrixValue> reshape(IndexedMatrixValue in,
                                                       DataCharacteristics mcIn,
                                                       DataCharacteristics mcOut,
                                                       boolean rowwise,
                                                       boolean outputEmptyBlocks)
        MR/Spark reshape interface. Reshape cannot process blocks independently, because the target block of a cell depends on its global position; hence, there are separate CP and MR interfaces.
        Parameters:
        in - indexed matrix value
        mcIn - input matrix characteristics
        mcOut - output matrix characteristics
        rowwise - if true, reshape by row
        outputEmptyBlocks - if true, also output blocks with nnz=0
        Returns:
        list of indexed matrix values
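        To illustrate why blocks cannot be reshaped independently, this hypothetical sketch computes which output blocks the cells of a single input block land in under a row-wise reshape. It assumes square blocks of size blen and 1-based block indexes. ReshapeBlockSketch is not part of SystemDS.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: target output blocks of one input block under a
// row-wise reshape. Assumes square blen x blen blocks, 1-based block indexes.
class ReshapeBlockSketch {
    static Set<String> targetBlocks(long inRows, long inCols, long outCols,
                                    long blockRow, long blockCol, int blen) {
        // cell range covered by this (possibly partial) input block
        long rl = (blockRow - 1) * blen, cl = (blockCol - 1) * blen;
        long ru = Math.min(rl + blen, inRows), cu = Math.min(cl + blen, inCols);
        Set<String> targets = new TreeSet<>();
        for (long gi = rl; gi < ru; gi++) {
            for (long gj = cl; gj < cu; gj++) {
                long p = gi * inCols + gj;               // row-major linear position
                long oi = p / outCols, oj = p % outCols; // position in the output
                targets.add("(" + (oi / blen + 1) + "," + (oj / blen + 1) + ")");
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        // 4x4 input reshaped to 2x8 with blen=2: input block (1,1)
        System.out.println(targetBlocks(4, 4, 8, 1, 1, 2));
    }
}
```

        Here input block (1,1) contributes to output blocks (1,1) and (1,3), so a single input block produces several, non-adjacent output blocks.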
      • rmempty

        public static MatrixBlock rmempty(MatrixBlock in,
                                          MatrixBlock ret,
                                          boolean rows,
                                          boolean emptyReturn,
                                          MatrixBlock select)
        CP rmempty operation (single input, single output matrix)
        Parameters:
        in - input matrix
        ret - output matrix
        rows - if true, remove empty rows; otherwise remove empty columns
        emptyReturn - return row/column of zeros for empty input
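        select - optional selection vector (may be null); if given, its nonzero entries mark the rows/columns to keep instead of their emptiness
        Returns:
        output matrix block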
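        A minimal dense sketch of rmempty along rows, under stated assumptions: a row is kept if it has a nonzero cell or, when select is non-null, if select[i] != 0; emptyReturn yields a single all-zero row when nothing is kept. RmEmptySketch is hypothetical; the column case is analogous.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of rmempty(rows=true) on a non-empty dense double[][].
class RmEmptySketch {
    static double[][] removeEmptyRows(double[][] in, double[] select, boolean emptyReturn) {
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < in.length; i++) {
            // Assumption: select, if present, overrides the emptiness check
            boolean keep = select != null ? select[i] != 0
                : Arrays.stream(in[i]).anyMatch(v -> v != 0);
            if (keep)
                kept.add(in[i].clone());
        }
        // Assumption: emptyReturn produces one row of zeros for an all-empty input
        if (kept.isEmpty() && emptyReturn)
            kept.add(new double[in[0].length]);
        return kept.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] m = {{0, 0}, {1, 0}, {0, 0}, {0, 2}};
        System.out.println(Arrays.deepToString(removeEmptyRows(m, null, true)));
        System.out.println(Arrays.deepToString(removeEmptyRows(m, new double[] {1, 0, 0, 1}, true)));
    }
}
```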
      • rmempty

        public static void rmempty(IndexedMatrixValue data,
                                   IndexedMatrixValue offset,
                                   boolean rmRows,
                                   long len,
                                   long blen,
                                   ArrayList<IndexedMatrixValue> outList)
        MR rmempty interface. As with reshape, rmempty cannot process blocks independently; hence, there are separate CP and MR interfaces.
        Parameters:
        data - input indexed matrix block
        offset - indexed block of precomputed output offsets for the rows (or columns) of data
        rmRows - if true, remove empty rows; otherwise remove empty columns
        len - number of rows (or columns) of the output matrix
        blen - block length
        outList - list of indexed matrix values
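        A hypothetical sketch of the offset idea behind a blockwise rmempty, assuming offset holds, for every row, its 1-based target row in the output (0 for an empty row that is dropped). With such global offsets, each input block can place its rows into the correct output block without seeing the other blocks. RmEmptyOffsetSketch and its methods are illustrative only.

```java
// Hypothetical sketch: global 1-based target offsets for rmempty(rows)
// and the output block row a target row falls into.
class RmEmptyOffsetSketch {
    // off[i] = running count of non-empty rows if row i is non-empty, else 0
    static long[] offsets(boolean[] nonEmpty) {
        long[] off = new long[nonEmpty.length];
        long count = 0;
        for (int i = 0; i < nonEmpty.length; i++) {
            if (nonEmpty[i])
                off[i] = ++count;
        }
        return off;
    }

    // 1-based output block row index for a 1-based target row
    static long outputBlockRow(long targetRow, long blen) {
        return (targetRow - 1) / blen + 1;
    }

    public static void main(String[] args) {
        boolean[] nonEmpty = {true, false, true, true, false};
        long[] off = offsets(nonEmpty);
        for (int i = 0; i < off.length; i++) {
            if (off[i] > 0)
                System.out.println("row " + i + " -> output row " + off[i]
                    + " in block " + outputBlockRow(off[i], 2));
        }
    }
}
```

        In this example the output has three rows (the role len plays, under the same assumption), split into two output blocks for blen=2.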
      • rexpand

        public static MatrixBlock rexpand(MatrixBlock in,
                                          MatrixBlock ret,
                                          double max,
                                          boolean rows,
                                          boolean cast,
                                          boolean ignore,
                                          int k)
        CP rexpand operation (single input, single output). The classic example of this operation is one-hot encoding a column into multiple columns.
        Parameters:
        in - Input matrix
        ret - Output matrix
        max - Number of rows/cols of the output
        rows - If true, expand in row direction (the output has max rows); otherwise in column direction (the output has max columns)
        cast - If true, round the input values to integers before the expansion
        ignore - If true, silently skip input values <= 0, which are technically invalid input; otherwise they are rejected
        k - Degree of parallelism
        Returns:
        The rexpanded output matrix
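        A minimal single-threaded sketch of rexpand as one-hot encoding of a column vector, not the SystemDS implementation. Assumptions: values are 1-based categories, cast rounds with Math.round, non-integer or out-of-range values produce no entry, and values <= 0 are rejected unless ignore is true. RexpandSketch is hypothetical, and k is omitted.

```java
import java.util.Arrays;

// Hypothetical sketch of CP rexpand on a dense column vector.
class RexpandSketch {
    static double[][] rexpand(double[] in, int max, boolean rows, boolean cast, boolean ignore) {
        int n = in.length;
        // rows=true: max x n, value v in input row i sets cell (v-1, i)
        // rows=false: n x max, value v in input row i sets cell (i, v-1)
        double[][] out = rows ? new double[max][n] : new double[n][max];
        for (int i = 0; i < n; i++) {
            double v = cast ? Math.round(in[i]) : in[i]; // assumed rounding mode
            if (v <= 0) {
                if (!ignore)
                    throw new IllegalArgumentException("invalid value <= 0: " + in[i]);
                continue;
            }
            if (v != Math.floor(v) || v > max)
                continue; // assumption: no entry for such values
            int c = (int) v - 1;
            if (rows)
                out[c][i] = 1;
            else
                out[i][c] = 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // one-hot encode the categories 2, 1 into a 2x3 matrix
        System.out.println(Arrays.deepToString(rexpand(new double[] {2, 1}, 3, false, false, false)));
    }
}
```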
      • rexpand

        public static MatrixBlock rexpand(MatrixBlock in,
                                          MatrixBlock ret,
                                          int max,
                                          boolean rows,
                                          boolean cast,
                                          boolean ignore,
                                          int k)
        CP rexpand operation (single input, single output). The classic example of this operation is one-hot encoding a column into multiple columns.
        Parameters:
        in - Input matrix
        ret - Output matrix
        max - Number of rows/cols of the output
        rows - If true, expand in row direction (the output has max rows); otherwise in column direction (the output has max columns)
        cast - If true, round the input values to integers before the expansion
        ignore - If true, silently skip input values <= 0, which are technically invalid input; otherwise they are rejected
        k - Degree of parallelism
        Returns:
        The rexpanded output matrix
      • checkRexpand

        public static void checkRexpand(MatrixBlock in,
                                        boolean ignore)
        Quick validity check of the input for rexpand. It catches some invalid inputs early but does not guarantee that an input which passes is valid.
        Parameters:
        in - Input matrix block
        ignore - If true, zero-valued cells are ignored (and therefore allowed)
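        One way such a cheap check can work, sketched under an assumption (this is not confirmed as the SystemDS logic): for a column vector, fewer nonzeros than rows means at least one zero cell, which is invalid unless zeros are ignored. The check cannot detect negative, fractional, or too-large values, which is why passing it does not prove the input valid. CheckRexpandSketch is hypothetical.

```java
// Hypothetical nnz-based pre-check for a column-vector rexpand input.
class CheckRexpandSketch {
    static void checkRexpand(long numRows, long nnz, boolean ignore) {
        // Assumption: a zero cell is invalid unless ignore is true
        if (!ignore && nnz < numRows)
            throw new IllegalArgumentException("input contains zeros but ignore=false (rows="
                + numRows + ", nnz=" + nnz + ")");
    }

    public static void main(String[] args) {
        checkRexpand(4, 4, false); // passes: no zeros
        checkRexpand(4, 3, true);  // passes: zeros are ignored
        try {
            checkRexpand(4, 3, false);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

        The appeal of this design is cost: the nonzero count is usually already maintained as metadata, so the check runs in constant time instead of scanning the data.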
      • rexpand

        public static void rexpand(IndexedMatrixValue data,
                                   double max,
                                   boolean rows,
                                   boolean cast,
                                   boolean ignore,
                                   long blen,
                                   ArrayList<IndexedMatrixValue> outList)
        MR/Spark rexpand operation (single input, multiple outputs incl. empty blocks)
        Parameters:
        data - Input indexed matrix block
        max - Total number of rows/columns of the output
        rows - If true, expand in row direction; otherwise in column direction
        cast - If true, round the input values to integers before the expansion
        ignore - If true, silently skip input values <= 0, which are technically invalid input; otherwise they are rejected
        blen - The block size used to slice the output into blocks
        outList - The list of output IndexedMatrixValues; all output blocks are added to (or updated in) this list
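        A hypothetical sketch of how a column expansion (rows=false) of one input block can be sliced into output blocks of size blen. Each valid value v lands in output column block (v-1)/blen+1 of the same block row, and all ceil(max/blen) column blocks are emitted, even empty ones, mirroring "incl. empty blocks". RexpandBlocksSketch only counts the ones per block to keep the example short.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: slicing a column expansion into blen-wide output blocks.
class RexpandBlocksSketch {
    // returns: 1-based output column block index -> number of ones it receives
    static Map<Long, Integer> sliceColumnExpansion(double[] block, long max, int blen) {
        Map<Long, Integer> counts = new TreeMap<>();
        long numColBlocks = (max + blen - 1) / blen; // ceil(max / blen)
        for (long cb = 1; cb <= numColBlocks; cb++)
            counts.put(cb, 0); // empty output blocks are emitted too
        for (double v : block) {
            if (v < 1 || v > max || v != Math.floor(v))
                continue; // assumption: invalid values produce no entry
            long cb = ((long) v - 1) / blen + 1;
            counts.merge(cb, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(sliceColumnExpansion(new double[] {1, 5, 2}, 5, 2));
    }
}
```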
      • countNnzPerColumn

        public static int[] countNnzPerColumn(MatrixBlock in)
      • countNnzPerColumn

        public static int[] countNnzPerColumn(MatrixBlock in,
                                              int rl,
                                              int ru)
      • mergeNnzCounts

        public static int[] mergeNnzCounts(int[] cnt,
                                           int[] cnt2)
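        These helpers are undocumented, so the following dense sketch only illustrates semantics assumed from their names and signatures: counting nonzeros per column over the half-open row range [rl, ru), and merging two partial count vectors element-wise, as one would after counting disjoint row ranges in parallel. NnzCountSketch and its null handling are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of per-column nnz counting and merging on double[][].
class NnzCountSketch {
    // Assumption: ru is exclusive
    static int[] countNnzPerColumn(double[][] in, int rl, int ru) {
        int[] cnt = new int[in[0].length];
        for (int i = rl; i < ru; i++)
            for (int j = 0; j < cnt.length; j++)
                if (in[i][j] != 0)
                    cnt[j]++;
        return cnt;
    }

    static int[] countNnzPerColumn(double[][] in) {
        return countNnzPerColumn(in, 0, in.length);
    }

    // Adds cnt2 into cnt in place and returns cnt (null handling is an assumption)
    static int[] mergeNnzCounts(int[] cnt, int[] cnt2) {
        if (cnt == null)
            return cnt2;
        if (cnt2 == null)
            return cnt;
        for (int j = 0; j < cnt.length; j++)
            cnt[j] += cnt2[j];
        return cnt;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 0, 2}, {0, 0, 3}, {4, 0, 0}, {5, 6, 0}};
        // two disjoint row ranges, counted separately and then merged
        int[] merged = mergeNnzCounts(countNnzPerColumn(m, 0, 2), countNnzPerColumn(m, 2, 4));
        System.out.println(Arrays.toString(merged));
        System.out.println(Arrays.equals(merged, countNnzPerColumn(m)));
    }
}
```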