Class DataConverter


  • public class DataConverter
    extends Object
    This class provides methods to read and write matrix blocks from and to HDFS using different data formats. This functionality is used especially for CP read/write and for exporting in-memory matrices to HDFS (before executing MR jobs).
    • Constructor Detail

      • DataConverter

        public DataConverter()
    • Method Detail

      • readMatrixFromHDFS

        public static MatrixBlock readMatrixFromHDFS​(ReadProperties prop)
                                              throws IOException
        Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory. For expected dense matrices we directly copy value- or block-at-a-time into the target matrix. In contrast, for sparse matrices, we append (column-value)-pairs and do a final sort if required in order to prevent large reorg overheads and increased memory consumption in case of unordered inputs.
        DENSE MxN input:
          * best/average/worst: O(M*N)
        SPARSE MxN input:
          * best (ordered, or binary block w/ clen<=blen): O(M*N)
          * average (unordered): O(M*N*log(N))
          * worst (descending order per row): O(M * N^2)
        NOTE: providing an exact estimate of 'expected sparsity' can prevent a full copy of the result matrix block (required for changing sparse->dense, or vice versa).
        Parameters:
        prop - read properties
        Returns:
        matrix block
        Throws:
        IOException - if IOException occurs
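        The append-then-sort strategy for sparse inputs described above can be illustrated with a minimal, self-contained sketch. SparseRowSketch and its methods are hypothetical names used for illustration only; they are not part of SystemDS, and the actual reader works on MatrixBlock internals. The sketch appends (column, value) pairs in arrival order and sorts the row once at the end, and only if an out-of-order column was seen.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one sparse row; NOT a SystemDS class.
final class SparseRowSketch {
    // Each cell is stored as {column, value}.
    private final List<double[]> cells = new ArrayList<>();
    private boolean sorted = true;
    private int lastCol = -1;

    // Append without searching for the insert position.
    void append(int col, double value) {
        if (col < lastCol)
            sorted = false; // remember that a final sort is needed
        lastCol = col;
        cells.add(new double[] {col, value});
    }

    // Sort once after all cells have arrived, and only if required.
    void sortIfRequired() {
        if (!sorted) {
            cells.sort(Comparator.comparingDouble(c -> c[0]));
            sorted = true;
            lastCol = (int) cells.get(cells.size() - 1)[0];
        }
    }

    int[] columns() {
        return cells.stream().mapToInt(c -> (int) c[0]).toArray();
    }

    double[] values() {
        return cells.stream().mapToDouble(c -> c[1]).toArray();
    }
}
```

        With ordered input the sort is skipped entirely, which corresponds to the best case listed above; unordered input pays for one sort per row instead of shifting existing entries on every insert.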
      • convertToDoubleMatrix

        public static double[][] convertToDoubleMatrix​(MatrixBlock mb)
        Creates a two-dimensional double matrix of the input matrix block.
        Parameters:
        mb - matrix block
        Returns:
        2d double array
      • convertToBooleanVector

        public static boolean[] convertToBooleanVector​(MatrixBlock mb)
      • convertVectorToIndexList

        public static int[] convertVectorToIndexList​(MatrixBlock mb)
      • convertToIntVector

        public static int[] convertToIntVector​(MatrixBlock mb)
      • convertToLongVector

        public static long[] convertToLongVector​(MatrixBlock mb)
      • convertToDoubleVector

        public static double[] convertToDoubleVector​(MatrixBlock mb)
      • convertToDoubleVector

        public static double[] convertToDoubleVector​(MatrixBlock mb,
                                                     boolean deep)
      • convertToDoubleVector

        public static double[] convertToDoubleVector​(MatrixBlock mb,
                                                     boolean deep,
                                                     boolean allowNull)
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(double[][] data)
        Creates a dense Matrix Block and copies the given double matrix into it.
        Parameters:
        data - 2d double array
        Returns:
        matrix block
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(int[][] data)
        Converts an integer matrix to a MatrixBlock.
        Parameters:
        data - int matrix input that is converted to a double MatrixBlock
        Returns:
        the constructed MatrixBlock
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(double[] data,
                                                       boolean columnVector)
        Creates a dense Matrix Block and copies the given double vector into it.
        Parameters:
        data - double array
        columnVector - if true, create a matrix with a single column; if false, create a matrix with a single row
        Returns:
        matrix block
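        The effect of the columnVector flag can be sketched on plain arrays. VectorShapeSketch.toMatrix is a hypothetical helper that returns a double[][] instead of a MatrixBlock; it only illustrates the resulting shape and makes no claim about how SystemDS stores the data.

```java
// Hypothetical illustration on plain arrays; NOT the SystemDS implementation.
final class VectorShapeSketch {
    // columnVector == true  -> data.length x 1 matrix
    // columnVector == false -> 1 x data.length matrix
    static double[][] toMatrix(double[] data, boolean columnVector) {
        int rows = columnVector ? data.length : 1;
        int cols = columnVector ? 1 : data.length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < data.length; i++) {
            if (columnVector)
                out[i][0] = data[i];
            else
                out[0][i] = data[i];
        }
        return out;
    }
}
```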
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(HashMap<MatrixIndexes,​Double> map,
                                                       int rlen,
                                                       int clen)
        NOTE: this method also ensures the specified matrix dimensions
        Parameters:
        map - map of matrix index keys and double values
        rlen - number of rows
        clen - number of columns
        Returns:
        matrix block
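        As a rough illustration of what "ensures the specified matrix dimensions" can mean, the sketch below builds an rlen x clen result regardless of which cells the map contains. The Index key type is a hypothetical stand-in for MatrixIndexes, and both the 1-based indexing and the skipping of out-of-range entries are assumptions made for this sketch, not confirmed behavior of the method.

```java
import java.util.Map;

// Hypothetical illustration on plain arrays; NOT the SystemDS implementation.
final class MapToDenseSketch {
    // Hypothetical key type (assumed 1-based); stands in for MatrixIndexes.
    static final class Index {
        final int row, col;
        Index(int row, int col) { this.row = row; this.col = col; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Index)) return false;
            Index other = (Index) o;
            return row == other.row && col == other.col;
        }
        @Override public int hashCode() { return 31 * row + col; }
    }

    // The result always has exactly rlen rows and clen columns.
    static double[][] toDense(Map<Index, Double> map, int rlen, int clen) {
        double[][] out = new double[rlen][clen];
        for (Map.Entry<Index, Double> e : map.entrySet()) {
            Index ix = e.getKey();
            // Assumption: entries outside the requested dimensions are ignored.
            if (ix.row >= 1 && ix.row <= rlen && ix.col >= 1 && ix.col <= clen)
                out[ix.row - 1][ix.col - 1] = e.getValue();
        }
        return out;
    }
}
```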
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(CTableMap map,
                                                       int rlen,
                                                       int clen)
        NOTE: this method also ensures the specified matrix dimensions
        Parameters:
        map - ?
        rlen - number of rows
        clen - number of columns
        Returns:
        matrix block
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(FrameBlock frame)
        Converts a frame block with arbitrary schema into a matrix block. Since matrix block only supports value type double, we do a best effort conversion of non-double types which might result in errors for non-numerical data.
        Parameters:
        frame - frame block
        Returns:
        matrix block
      • convertToStringFrame

        public static String[][] convertToStringFrame​(FrameBlock frame)
        Converts a frame block with arbitrary schema into a two dimensional string array.
        Parameters:
        frame - frame block
        Returns:
        2d string array
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(String[][] data)
        Converts a two-dimensional string array into a frame block of value type string. If the given array is null or of length 0, we return an empty frame block.
        Parameters:
        data - 2d string array
        Returns:
        frame block
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb)
        Converts a matrix block into a frame block of value type double.
        Parameters:
        mb - matrix block
        Returns:
        frame block of type double
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb,
                                                     int k)
        Converts a matrix block into a frame block of value type double.
        Parameters:
        mb - matrix block
        k - parallelization degree
        Returns:
        frame block of type double
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb,
                                                     Types.ValueType vt)
        Converts a matrix block into a frame block of the given value type.
        Parameters:
        mb - matrix block
        vt - target value type
        Returns:
        frame block of the given value type
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb,
                                                     Types.ValueType vt,
                                                     int k)
        Converts a matrix block into a frame block of a given value type.
        Parameters:
        mb - matrix block
        vt - value type
        k - parallelization degree
        Returns:
        frame block of the given value type
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb,
                                                     Types.ValueType[] schema)
        Converts a matrix block into a frame block with the given schema.
        Parameters:
        mb - matrix block
        schema - schema
        Returns:
        frame block with the given schema
      • convertToFrameBlock

        public static FrameBlock convertToFrameBlock​(MatrixBlock mb,
                                                     Types.ValueType[] schema,
                                                     int k)
        Converts a matrix block into a frame block with the given schema.
        Parameters:
        mb - matrix block
        schema - schema
        k - parallelization degree
        Returns:
        frame block with the given schema
      • convertToMatrixBlockPartitions

        public static MatrixBlock[] convertToMatrixBlockPartitions​(MatrixBlock mb,
                                                                   boolean colwise)
      • convertToArray2DRowRealMatrix

        public static org.apache.commons.math3.linear.Array2DRowRealMatrix convertToArray2DRowRealMatrix​(MatrixBlock mb)
        Helper method that converts a SystemDS matrix block into an Array2DRowRealMatrix, which is useful when invoking Apache Commons Math.
        Parameters:
        mb - matrix object
        Returns:
        matrix as a commons-math3 Array2DRowRealMatrix
      • convertToBlockRealMatrix

        public static org.apache.commons.math3.linear.BlockRealMatrix convertToBlockRealMatrix​(MatrixBlock mb)
      • convertToMatrixBlock

        public static MatrixBlock convertToMatrixBlock​(org.apache.commons.math3.linear.RealMatrix rm)
      • copyToDoubleVector

        public static void copyToDoubleVector​(MatrixBlock mb,
                                              double[] dest,
                                              int destPos)
      • toString

        public static String toString​(MatrixBlock mb,
                                      boolean sparse,
                                      String separator,
                                      String lineseparator,
                                      int rowsToPrint,
                                      int colsToPrint,
                                      int decimal)
        Returns a string representation of a matrix
        Parameters:
        mb - matrix block
        sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise, it will be a rectangular string with all values of the matrix block
        separator - Separator string between each element in a row, or between the columns in sparse format
        lineseparator - Separator string between each row
        rowsToPrint - maximum number of rows to print, -1 for all
        colsToPrint - maximum number of columns to print, -1 for all
        decimal - number of decimal places to print, -1 for default
        Returns:
        matrix as a string
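        The difference between the sparse and the rectangular layout, and the meaning of -1 for rowsToPrint and colsToPrint, can be sketched on a plain double[][]. MatrixToStringSketch.format is a hypothetical helper, the decimal parameter is omitted, and the 1-based indexes in the sparse layout are an assumption; the exact output format of the SystemDS method may differ.

```java
// Hypothetical illustration on plain arrays; NOT the SystemDS implementation.
final class MatrixToStringSketch {
    static String format(double[][] m, boolean sparse, String separator,
                         String lineseparator, int rowsToPrint, int colsToPrint) {
        // -1 means "print all"; otherwise cap at the requested maximum.
        int rows = (rowsToPrint < 0) ? m.length : Math.min(rowsToPrint, m.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            int cols = (colsToPrint < 0) ? m[i].length : Math.min(colsToPrint, m[i].length);
            for (int j = 0; j < cols; j++) {
                if (sparse) {
                    // One "row col value" line per non-zero cell (indexes assumed 1-based).
                    if (m[i][j] != 0.0)
                        sb.append(i + 1).append(separator)
                          .append(j + 1).append(separator)
                          .append(m[i][j]).append(lineseparator);
                } else {
                    // Rectangular layout: every value, separated within the row.
                    sb.append(m[i][j]);
                    if (j < cols - 1)
                        sb.append(separator);
                }
            }
            if (!sparse)
                sb.append(lineseparator);
        }
        return sb.toString();
    }
}
```

        For a 2x2 matrix with only the diagonal set, the sparse layout produces two lines while the rectangular layout produces one line per row with all four values.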
      • toString

        public static String toString​(TensorBlock tb,
                                      boolean sparse,
                                      String separator,
                                      String lineseparator,
                                      String leftBorder,
                                      String rightBorder,
                                      int rowsToPrint,
                                      int colsToPrint,
                                      int decimal)
        Returns a string representation of a tensor
        Parameters:
        tb - tensor block
        sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise, it will be a rectangular string with all values of the tensor block
        separator - Separator string between each element in a row, or between the columns in sparse format
        lineseparator - Separator string between each row
        leftBorder - Characters placed at the start of a new dimension level
        rightBorder - Characters placed at the end of a new dimension level
        rowsToPrint - maximum number of rows to print, -1 for all
        colsToPrint - maximum number of columns to print, -1 for all
        decimal - number of decimal places to print, -1 for default
        Returns:
        tensor as a string
      • toString

        public static String toString​(FrameBlock fb,
                                      boolean sparse,
                                      String separator,
                                      String lineseparator,
                                      int rowsToPrint,
                                      int colsToPrint,
                                      int decimal)
      • toString

        public static String toString​(ListObject list,
                                      int rows,
                                      int cols,
                                      boolean sparse,
                                      String separator,
                                      String lineSeparator,
                                      int rowsToPrint,
                                      int colsToPrint,
                                      int decimal)
      • toDouble

        public static double[] toDouble​(float[] data)
      • toDouble

        public static double[] toDouble​(long[] data)
      • toDouble

        public static double[] toDouble​(int[] data)
      • toDouble

        public static double[] toDouble​(BitSet data,
                                        int len)
      • toDouble

        public static double[] toDouble​(String[] data)
      • toFloat

        public static float[] toFloat​(double[] data)
      • toInt

        public static int[] toInt​(double[] data)
      • toLong

        public static long[] toLong​(double[] data)
      • toBitSet

        public static BitSet toBitSet​(double[] data)
      • toString

        public static String[] toString​(double[] data)
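
        The toDouble(BitSet, int) and toBitSet(double[]) signatures above are undocumented, so the following is only a sketch of one plausible reading: set bits map to 1.0, clear bits to 0.0, and non-zero doubles map back to set bits. These semantics are an assumption, and BitSetConversionSketch is a hypothetical class. The sketch also shows why an explicit len is useful: java.util.BitSet.length() only reports the highest set bit plus one, so trailing zeros could not be recovered from the BitSet alone.

```java
import java.util.BitSet;

// Hypothetical illustration; the semantics are assumed, NOT taken from SystemDS.
final class BitSetConversionSketch {
    // Assumption: set bit -> 1.0, clear bit -> 0.0, output has exactly len entries.
    static double[] toDouble(BitSet data, int len) {
        double[] out = new double[len];
        for (int i = data.nextSetBit(0); i >= 0 && i < len; i = data.nextSetBit(i + 1))
            out[i] = 1.0;
        return out;
    }

    // Assumption: any non-zero value -> set bit.
    static BitSet toBitSet(double[] data) {
        BitSet out = new BitSet(data.length);
        for (int i = 0; i < data.length; i++)
            if (data[i] != 0.0)
                out.set(i);
        return out;
    }
}
```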