Class MLContextConversionUtil


  • public class MLContextConversionUtil
    extends Object
    Utility class containing methods to perform data conversions.
    • Constructor Detail

      • MLContextConversionUtil

        public MLContextConversionUtil()
    • Method Detail

      • doubleMatrixToMatrixObject

        public static MatrixObject doubleMatrixToMatrixObject​(String variableName,
                                                              double[][] doubleMatrix)
        Convert a two-dimensional double array to a MatrixObject.
        Parameters:
        variableName - name of the variable associated with the matrix
        doubleMatrix - matrix of double values
        Returns:
        the two-dimensional double matrix converted to a MatrixObject
      • doubleMatrixToMatrixObject

        public static MatrixObject doubleMatrixToMatrixObject​(String variableName,
                                                              double[][] doubleMatrix,
                                                              MatrixMetadata matrixMetadata)
        Convert a two-dimensional double array to a MatrixObject.
        Parameters:
        variableName - name of the variable associated with the matrix
        doubleMatrix - matrix of double values
        matrixMetadata - the matrix metadata
        Returns:
        the two-dimensional double matrix converted to a MatrixObject
      • urlToMatrixObject

        public static MatrixObject urlToMatrixObject​(URL url,
                                                     MatrixMetadata matrixMetadata)
        Convert a matrix at a URL to a MatrixObject.
        Parameters:
        url - the URL to a matrix (in CSV or IJV format)
        matrixMetadata - the matrix metadata
        Returns:
        the matrix at a URL converted to a MatrixObject
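        The two text formats named above can be compared on a small matrix. The following is a minimal plain-Java sketch, not this class's implementation: the class and method names are hypothetical, and it assumes the common convention that IJV lines are space-separated "row column value" triples with 1-based indices that list only non-zero cells.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of MLContextConversionUtil.
class MatrixTextFormats {

    // CSV: one line per matrix row, values separated by commas.
    public static List<String> toCsv(double[][] m) {
        List<String> lines = new ArrayList<>();
        for (double[] row : m) {
            StringJoiner sj = new StringJoiner(",");
            for (double v : row) {
                sj.add(Double.toString(v));
            }
            lines.add(sj.toString());
        }
        return lines;
    }

    // IJV (assumed convention): one "row col value" line per non-zero cell,
    // with 1-based row and column indices.
    public static List<String> toIjv(double[][] m) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                if (m[i][j] != 0.0) {
                    lines.add((i + 1) + " " + (j + 1) + " " + m[i][j]);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        double[][] m = { {1.0, 0.0}, {0.0, 4.0} };
        System.out.println("CSV:");
        toCsv(m).forEach(System.out::println);
        System.out.println("IJV:");
        toIjv(m).forEach(System.out::println);
    }
}
```

        CSV spells out every cell, including zeros, while IJV names only the cells that hold a value, which makes it the more compact choice for sparse matrices.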
      • matrixBlockToMatrixObject

        public static MatrixObject matrixBlockToMatrixObject​(String variableName,
                                                             MatrixBlock matrixBlock,
                                                             MatrixMetadata matrixMetadata)
        Convert a MatrixBlock to a MatrixObject.
        Parameters:
        variableName - name of the variable associated with the matrix
        matrixBlock - matrix as a MatrixBlock
        matrixMetadata - the matrix metadata
        Returns:
        the MatrixBlock converted to a MatrixObject
      • frameBlockToFrameObject

        public static FrameObject frameBlockToFrameObject​(String variableName,
                                                          FrameBlock frameBlock,
                                                          FrameMetadata frameMetadata)
        Convert a FrameBlock to a FrameObject.
        Parameters:
        variableName - name of the variable associated with the frame
        frameBlock - frame as a FrameBlock
        frameMetadata - the frame metadata
        Returns:
        the FrameBlock converted to a FrameObject
      • binaryBlocksToMatrixObject

        public static MatrixObject binaryBlocksToMatrixObject​(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> binaryBlocks)
        Convert a JavaPairRDD<MatrixIndexes, MatrixBlock> to a MatrixObject.
        Parameters:
        binaryBlocks - JavaPairRDD<MatrixIndexes, MatrixBlock> representation of a binary-block matrix
        Returns:
        the JavaPairRDD<MatrixIndexes, MatrixBlock> matrix converted to a MatrixObject
      • binaryBlocksToMatrixObject

        public static MatrixObject binaryBlocksToMatrixObject​(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> binaryBlocks,
                                                              MatrixMetadata matrixMetadata)
        Convert a JavaPairRDD<MatrixIndexes, MatrixBlock> to a MatrixObject.
        Parameters:
        binaryBlocks - JavaPairRDD<MatrixIndexes, MatrixBlock> representation of a binary-block matrix
        matrixMetadata - the matrix metadata
        Returns:
        the JavaPairRDD<MatrixIndexes, MatrixBlock> matrix converted to a MatrixObject
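        A binary-block matrix stores the data as fixed-size tiles, each keyed by its block position. The index arithmetic behind such a layout can be sketched in plain Java. This is an illustration under stated assumptions, not the library's code: the class and method names are hypothetical, block indices are assumed to be 1-based, and the block size of 1000 is only an example value.

```java
// Hypothetical helper, for illustration only; not part of MLContextConversionUtil.
class BlockIndexSketch {

    // 1-based index of the block that holds a 0-based cell coordinate.
    public static long blockIndex(long cell, int blockSize) {
        return cell / blockSize + 1;
    }

    // 0-based offset of the cell inside its block.
    public static int offsetInBlock(long cell, int blockSize) {
        return (int) (cell % blockSize);
    }

    // Number of blocks needed to cover `length` cells along one dimension.
    public static long numBlocks(long length, int blockSize) {
        return (length + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        int blockSize = 1000; // assumed example value
        long row = 2500;      // 0-based cell coordinates
        long col = 999;
        System.out.println("block key: (" + blockIndex(row, blockSize)
                + ", " + blockIndex(col, blockSize) + ")");
        System.out.println("offset in block: (" + offsetInBlock(row, blockSize)
                + ", " + offsetInBlock(col, blockSize) + ")");
        System.out.println("row blocks for 2501 rows: " + numBlocks(2501, blockSize));
    }
}
```

        Because the blocks alone do not say how many rows and columns the last, possibly partial, blocks cover, passing a MatrixMetadata with the dimensions is the safer choice when they are known.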
      • binaryBlocksToMatrixBlock

        public static MatrixBlock binaryBlocksToMatrixBlock​(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> binaryBlocks,
                                                            MatrixMetadata matrixMetadata)
        Convert a JavaPairRDD<MatrixIndexes, MatrixBlock> to a MatrixBlock.
        Parameters:
        binaryBlocks - JavaPairRDD<MatrixIndexes, MatrixBlock> representation of a binary-block matrix
        matrixMetadata - the matrix metadata
        Returns:
        the JavaPairRDD<MatrixIndexes, MatrixBlock> matrix converted to a MatrixBlock
      • binaryBlocksToFrameObject

        public static FrameObject binaryBlocksToFrameObject​(org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> binaryBlocks)
        Convert a JavaPairRDD<Long, FrameBlock> to a FrameObject.
        Parameters:
        binaryBlocks - JavaPairRDD<Long, FrameBlock> representation of a binary-block frame
        Returns:
        the JavaPairRDD<Long, FrameBlock> frame converted to a FrameObject
      • binaryBlocksToFrameObject

        public static FrameObject binaryBlocksToFrameObject​(org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> binaryBlocks,
                                                            FrameMetadata frameMetadata)
        Convert a JavaPairRDD<Long, FrameBlock> to a FrameObject.
        Parameters:
        binaryBlocks - JavaPairRDD<Long, FrameBlock> representation of a binary-block frame
        frameMetadata - the frame metadata
        Returns:
        the JavaPairRDD<Long, FrameBlock> frame converted to a FrameObject
      • dataFrameToMatrixObject

        public static MatrixObject dataFrameToMatrixObject​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
        Convert a DataFrame to a MatrixObject.
        Parameters:
        dataFrame - the Spark DataFrame
        Returns:
        the DataFrame matrix converted to a MatrixObject
      • dataFrameToMatrixObject

        public static MatrixObject dataFrameToMatrixObject​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                           MatrixMetadata matrixMetadata)
        Convert a DataFrame to a MatrixObject.
        Parameters:
        dataFrame - the Spark DataFrame
        matrixMetadata - the matrix metadata
        Returns:
        the DataFrame matrix converted to a MatrixObject
      • dataFrameToFrameObject

        public static FrameObject dataFrameToFrameObject​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
        Convert a DataFrame to a FrameObject.
        Parameters:
        dataFrame - the Spark DataFrame
        Returns:
        the DataFrame frame converted to a FrameObject
      • dataFrameToFrameObject

        public static FrameObject dataFrameToFrameObject​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                         FrameMetadata frameMetadata)
        Convert a DataFrame to a FrameObject.
        Parameters:
        dataFrame - the Spark DataFrame
        frameMetadata - the frame metadata
        Returns:
        the DataFrame frame converted to a FrameObject
      • dataFrameToMatrixBinaryBlocks

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> dataFrameToMatrixBinaryBlocks​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
        Convert a DataFrame to a JavaPairRDD<MatrixIndexes, MatrixBlock> binary-block matrix.
        Parameters:
        dataFrame - the Spark DataFrame
        Returns:
        the DataFrame matrix converted to a JavaPairRDD<MatrixIndexes, MatrixBlock> binary-block matrix
      • dataFrameToMatrixBinaryBlocks

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> dataFrameToMatrixBinaryBlocks​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                                                                                           MatrixMetadata matrixMetadata)
        Convert a DataFrame to a JavaPairRDD<MatrixIndexes, MatrixBlock> binary-block matrix.
        Parameters:
        dataFrame - the Spark DataFrame
        matrixMetadata - the matrix metadata
        Returns:
        the DataFrame matrix converted to a JavaPairRDD<MatrixIndexes, MatrixBlock> binary-block matrix
      • dataFrameToFrameBinaryBlocks

        public static org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> dataFrameToFrameBinaryBlocks​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                                                                                FrameMetadata frameMetadata)
        Convert a DataFrame to a JavaPairRDD<Long, FrameBlock> binary-block frame.
        Parameters:
        dataFrame - the Spark DataFrame
        frameMetadata - the frame metadata
        Returns:
        the DataFrame frame converted to a JavaPairRDD<Long, FrameBlock> binary-block frame
      • determineMatrixFormatIfNeeded

        public static void determineMatrixFormatIfNeeded​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                         MatrixMetadata matrixMetadata)
        If the MatrixFormat of the DataFrame has not been explicitly specified, attempt to determine the proper MatrixFormat.
        Parameters:
        dataFrame - the Spark DataFrame
        matrixMetadata - the matrix metadata, if available
      • determineFrameFormatIfNeeded

        public static void determineFrameFormatIfNeeded​(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame,
                                                        FrameMetadata frameMetadata)
        If the FrameFormat of the DataFrame has not been explicitly specified, attempt to determine the proper FrameFormat.
        Parameters:
        dataFrame - the Spark DataFrame
        frameMetadata - the frame metadata, if available
      • isDataFrameWithIDColumn

        public static boolean isDataFrameWithIDColumn​(MatrixMetadata matrixMetadata)
        Return whether or not the DataFrame has an ID column.
        Parameters:
        matrixMetadata - the matrix metadata
        Returns:
        true if the DataFrame has an ID column, false otherwise.
      • isDataFrameWithIDColumn

        public static boolean isDataFrameWithIDColumn​(FrameMetadata frameMetadata)
        Return whether or not the DataFrame has an ID column.
        Parameters:
        frameMetadata - the frame metadata
        Returns:
        true if the DataFrame has an ID column, false otherwise.
      • isVectorBasedDataFrame

        public static boolean isVectorBasedDataFrame​(MatrixMetadata matrixMetadata)
        Return whether or not the DataFrame is vector-based.
        Parameters:
        matrixMetadata - the matrix metadata
        Returns:
        true if the DataFrame is vector-based, false otherwise.
      • javaRDDStringCSVToMatrixObject

        public static MatrixObject javaRDDStringCSVToMatrixObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD)
        Convert a JavaRDD<String> in CSV format to a MatrixObject.
        Parameters:
        javaRDD - the Java RDD of strings
        Returns:
        the JavaRDD<String> converted to a MatrixObject
      • javaRDDStringCSVToMatrixObject

        public static MatrixObject javaRDDStringCSVToMatrixObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD,
                                                                  MatrixMetadata matrixMetadata)
        Convert a JavaRDD<String> in CSV format to a MatrixObject.
        Parameters:
        javaRDD - the Java RDD of strings
        matrixMetadata - matrix metadata
        Returns:
        the JavaRDD<String> converted to a MatrixObject
      • javaRDDStringCSVToFrameObject

        public static FrameObject javaRDDStringCSVToFrameObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD)
        Convert a JavaRDD<String> in CSV format to a FrameObject.
        Parameters:
        javaRDD - the Java RDD of strings
        Returns:
        the JavaRDD<String> converted to a FrameObject
      • javaRDDStringCSVToFrameObject

        public static FrameObject javaRDDStringCSVToFrameObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD,
                                                                FrameMetadata frameMetadata)
        Convert a JavaRDD<String> in CSV format to a FrameObject.
        Parameters:
        javaRDD - the Java RDD of strings
        frameMetadata - frame metadata
        Returns:
        the JavaRDD<String> converted to a FrameObject
      • javaRDDStringIJVToMatrixObject

        public static MatrixObject javaRDDStringIJVToMatrixObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD,
                                                                  MatrixMetadata matrixMetadata)
        Convert a JavaRDD<String> in IJV format to a MatrixObject. Note that metadata is required for IJV format.
        Parameters:
        javaRDD - the Java RDD of strings
        matrixMetadata - matrix metadata
        Returns:
        the JavaRDD<String> converted to a MatrixObject
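        One likely reason metadata is required here is that IJV lines alone do not determine the matrix dimensions: trailing rows or columns that contain only zeros never appear in the data. The plain-Java sketch below illustrates this. It is not the library's parser; the class and method names are hypothetical, and it assumes space-separated "row column value" lines with 1-based indices.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of MLContextConversionUtil.
class IjvDimensions {

    // Fill a dense rows x cols array from "row col value" lines (1-based indices).
    // Cells that are not listed keep the default value 0.0.
    public static double[][] fromIjv(List<String> lines, int rows, int cols) {
        double[][] m = new double[rows][cols];
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            int i = Integer.parseInt(parts[0]) - 1;
            int j = Integer.parseInt(parts[1]) - 1;
            m[i][j] = Double.parseDouble(parts[2]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<String> ijv = Arrays.asList("1 1 1.0", "2 2 4.0");
        // The same two lines are consistent with a 2x2 matrix and with a 3x4 matrix;
        // only the externally supplied dimensions tell the two apart.
        System.out.println(Arrays.deepToString(fromIjv(ijv, 2, 2)));
        System.out.println(Arrays.deepToString(fromIjv(ijv, 3, 4)));
    }
}
```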
      • javaRDDStringIJVToFrameObject

        public static FrameObject javaRDDStringIJVToFrameObject​(org.apache.spark.api.java.JavaRDD<String> javaRDD,
                                                                FrameMetadata frameMetadata)
        Convert a JavaRDD<String> in IJV format to a FrameObject. Note that metadata is required for IJV format.
        Parameters:
        javaRDD - the Java RDD of strings
        frameMetadata - frame metadata
        Returns:
        the JavaRDD<String> converted to a FrameObject
      • rddStringCSVToMatrixObject

        public static MatrixObject rddStringCSVToMatrixObject​(org.apache.spark.rdd.RDD<String> rdd)
        Convert an RDD<String> in CSV format to a MatrixObject.
        Parameters:
        rdd - the RDD of strings
        Returns:
        the RDD<String> converted to a MatrixObject
      • rddStringCSVToMatrixObject

        public static MatrixObject rddStringCSVToMatrixObject​(org.apache.spark.rdd.RDD<String> rdd,
                                                              MatrixMetadata matrixMetadata)
        Convert an RDD<String> in CSV format to a MatrixObject.
        Parameters:
        rdd - the RDD of strings
        matrixMetadata - matrix metadata
        Returns:
        the RDD<String> converted to a MatrixObject
      • rddStringCSVToFrameObject

        public static FrameObject rddStringCSVToFrameObject​(org.apache.spark.rdd.RDD<String> rdd)
        Convert an RDD<String> in CSV format to a FrameObject.
        Parameters:
        rdd - the RDD of strings
        Returns:
        the RDD<String> converted to a FrameObject
      • rddStringCSVToFrameObject

        public static FrameObject rddStringCSVToFrameObject​(org.apache.spark.rdd.RDD<String> rdd,
                                                            FrameMetadata frameMetadata)
        Convert an RDD<String> in CSV format to a FrameObject.
        Parameters:
        rdd - the RDD of strings
        frameMetadata - frame metadata
        Returns:
        the RDD<String> converted to a FrameObject
      • rddStringIJVToMatrixObject

        public static MatrixObject rddStringIJVToMatrixObject​(org.apache.spark.rdd.RDD<String> rdd,
                                                              MatrixMetadata matrixMetadata)
        Convert an RDD<String> in IJV format to a MatrixObject. Note that metadata is required for IJV format.
        Parameters:
        rdd - the RDD of strings
        matrixMetadata - matrix metadata
        Returns:
        the RDD<String> converted to a MatrixObject
      • rddStringIJVToFrameObject

        public static FrameObject rddStringIJVToFrameObject​(org.apache.spark.rdd.RDD<String> rdd,
                                                            FrameMetadata frameMetadata)
        Convert an RDD<String> in IJV format to a FrameObject. Note that metadata is required for IJV format.
        Parameters:
        rdd - the RDD of strings
        frameMetadata - frame metadata
        Returns:
        the RDD<String> converted to a FrameObject
      • matrixObjectToJavaRDDStringCSV

        public static org.apache.spark.api.java.JavaRDD<String> matrixObjectToJavaRDDStringCSV​(MatrixObject matrixObject)
        Convert a MatrixObject to a JavaRDD<String> in CSV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to a JavaRDD<String>
      • frameObjectToJavaRDDStringCSV

        public static org.apache.spark.api.java.JavaRDD<String> frameObjectToJavaRDDStringCSV​(FrameObject frameObject,
                                                                                              String delimiter)
        Convert a FrameObject to a JavaRDD<String> in CSV format.
        Parameters:
        frameObject - the FrameObject
        delimiter - the delimiter
        Returns:
        the FrameObject converted to a JavaRDD<String>
      • matrixObjectToJavaRDDStringIJV

        public static org.apache.spark.api.java.JavaRDD<String> matrixObjectToJavaRDDStringIJV​(MatrixObject matrixObject)
        Convert a MatrixObject to a JavaRDD<String> in IJV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to a JavaRDD<String>
      • frameObjectToJavaRDDStringIJV

        public static org.apache.spark.api.java.JavaRDD<String> frameObjectToJavaRDDStringIJV​(FrameObject frameObject)
        Convert a FrameObject to a JavaRDD<String> in IJV format.
        Parameters:
        frameObject - the FrameObject
        Returns:
        the FrameObject converted to a JavaRDD<String>
      • matrixObjectToRDDStringIJV

        public static org.apache.spark.rdd.RDD<String> matrixObjectToRDDStringIJV​(MatrixObject matrixObject)
        Convert a MatrixObject to an RDD<String> in IJV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to an RDD<String>
      • frameObjectToRDDStringIJV

        public static org.apache.spark.rdd.RDD<String> frameObjectToRDDStringIJV​(FrameObject frameObject)
        Convert a FrameObject to an RDD<String> in IJV format.
        Parameters:
        frameObject - the FrameObject
        Returns:
        the FrameObject converted to an RDD<String>
      • matrixObjectToRDDStringCSV

        public static org.apache.spark.rdd.RDD<String> matrixObjectToRDDStringCSV​(MatrixObject matrixObject)
        Convert a MatrixObject to an RDD<String> in CSV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to an RDD<String>
      • frameObjectToRDDStringCSV

        public static org.apache.spark.rdd.RDD<String> frameObjectToRDDStringCSV​(FrameObject frameObject,
                                                                                 String delimiter)
        Convert a FrameObject to an RDD<String> in CSV format.
        Parameters:
        frameObject - the FrameObject
        delimiter - the delimiter
        Returns:
        the FrameObject converted to an RDD<String>
      • matrixObjectToListStringCSV

        public static List<String> matrixObjectToListStringCSV​(MatrixObject matrixObject)
        Convert a MatrixObject to a List<String> in CSV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to a List<String>
      • frameObjectToListStringCSV

        public static List<String> frameObjectToListStringCSV​(FrameObject frameObject,
                                                              String delimiter)
        Convert a FrameObject to a List<String> in CSV format.
        Parameters:
        frameObject - the FrameObject
        delimiter - the delimiter
        Returns:
        the FrameObject converted to a List<String>
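        The role of the delimiter parameter can be pictured with a plain-Java sketch that joins the cells of each row with a caller-supplied separator. This is only an illustration of the output shape, assuming one line per frame row. The class and method names are hypothetical, and the real method may treat quoting, headers, or null cells differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of MLContextConversionUtil.
class FrameCsvSketch {

    // Produce one delimited line per row of a frame held as a String[][].
    public static List<String> toDelimited(String[][] frame, String delimiter) {
        List<String> lines = new ArrayList<>();
        for (String[] row : frame) {
            lines.add(String.join(delimiter, row));
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] frame = { {"1", "apple"}, {"2", "pear"} };
        // A tab delimiter is handy when cell values may themselves contain commas.
        toDelimited(frame, "\t").forEach(System.out::println);
    }
}
```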
      • matrixObjectToListStringIJV

        public static List<String> matrixObjectToListStringIJV​(MatrixObject matrixObject)
        Convert a MatrixObject to a List<String> in IJV format.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to a List<String>
      • frameObjectToListStringIJV

        public static List<String> frameObjectToListStringIJV​(FrameObject frameObject)
        Convert a FrameObject to a List<String> in IJV format.
        Parameters:
        frameObject - the FrameObject
        Returns:
        the FrameObject converted to a List<String>
      • matrixObjectTo2DDoubleArray

        public static double[][] matrixObjectTo2DDoubleArray​(MatrixObject matrixObject)
        Convert a MatrixObject to a two-dimensional double array.
        Parameters:
        matrixObject - the MatrixObject
        Returns:
        the MatrixObject converted to a double[][]
      • matrixObjectToDataFrame

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> matrixObjectToDataFrame​(MatrixObject matrixObject,
                                                                                                     SparkExecutionContext sparkExecutionContext,
                                                                                                     boolean isVectorDF)
        Convert a MatrixObject to a DataFrame.
        Parameters:
        matrixObject - the MatrixObject
        sparkExecutionContext - the Spark execution context
        isVectorDF - true if the resulting DataFrame should be vector-based, false otherwise
        Returns:
        the MatrixObject converted to a DataFrame
      • frameObjectToDataFrame

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> frameObjectToDataFrame​(FrameObject frameObject,
                                                                                                    SparkExecutionContext sparkExecutionContext)
        Convert a FrameObject to a DataFrame.
        Parameters:
        frameObject - the FrameObject
        sparkExecutionContext - the Spark execution context
        Returns:
        the FrameObject converted to a DataFrame
      • matrixObjectToBinaryBlocks

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,​MatrixBlock> matrixObjectToBinaryBlocks​(MatrixObject matrixObject,
                                                                                                                        SparkExecutionContext sparkExecutionContext)
        Convert a MatrixObject to a JavaPairRDD<MatrixIndexes, MatrixBlock>.
        Parameters:
        matrixObject - the MatrixObject
        sparkExecutionContext - the Spark execution context
        Returns:
        the MatrixObject converted to a JavaPairRDD<MatrixIndexes, MatrixBlock>
      • frameObjectToBinaryBlocks

        public static org.apache.spark.api.java.JavaPairRDD<Long,​FrameBlock> frameObjectToBinaryBlocks​(FrameObject frameObject,
                                                                                                             SparkExecutionContext sparkExecutionContext)
        Convert a FrameObject to a JavaPairRDD<Long, FrameBlock>.
        Parameters:
        frameObject - the FrameObject
        sparkExecutionContext - the Spark execution context
        Returns:
        the FrameObject converted to a JavaPairRDD<Long, FrameBlock>
      • frameObjectTo2DStringArray

        public static String[][] frameObjectTo2DStringArray​(FrameObject frameObject)
        Convert a FrameObject to a two-dimensional string array.
        Parameters:
        frameObject - the FrameObject
        Returns:
        the FrameObject converted to a String[][]
      • jsc

        public static org.apache.spark.api.java.JavaSparkContext jsc()
        Obtain JavaSparkContext from MLContextProxy.
        Returns:
        the Java Spark Context
      • sc

        public static org.apache.spark.SparkContext sc()
        Obtain SparkContext from MLContextProxy.
        Returns:
        the Spark Context
      • spark

        public static org.apache.spark.sql.SparkSession spark()
        Obtain SparkSession from MLContextProxy.
        Returns:
        the Spark Session