Class RDDConverterUtilsExt


  • public class RDDConverterUtilsExt
    extends Object
    NOTE: These are experimental converter utils. Once thoroughly tested, they can be moved to RDDConverterUtils.
    • Constructor Detail

      • RDDConverterUtilsExt

        public RDDConverterUtilsExt()
    • Method Detail

      • coordinateMatrixToBinaryBlock

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                                           org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input,
                                                                                                                           DataCharacteristics mcIn,
                                                                                                                           boolean outputEmptyBlocks)
      • coordinateMatrixToBinaryBlock

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.SparkContext sc,
                                                                                                                           org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input,
                                                                                                                           DataCharacteristics mcIn,
                                                                                                                           boolean outputEmptyBlocks)
      • projectColumns

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> projectColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
                                                                                            ArrayList<String> columns)
      • copyRowBlocks

        public static void copyRowBlocks(MatrixBlock mb,
                                         int rowIndex,
                                         MatrixBlock ret,
                                         int numRowsPerBlock,
                                         int rlen,
                                         int clen)
      • copyRowBlocks

        public static void copyRowBlocks(MatrixBlock mb,
                                         long rowIndex,
                                         MatrixBlock ret,
                                         int numRowsPerBlock,
                                         int rlen,
                                         int clen)
      • copyRowBlocks

        public static void copyRowBlocks(MatrixBlock mb,
                                         int rowIndex,
                                         MatrixBlock ret,
                                         long numRowsPerBlock,
                                         long rlen,
                                         long clen)
      • copyRowBlocks

        public static void copyRowBlocks(MatrixBlock mb,
                                         long rowIndex,
                                         MatrixBlock ret,
                                         long numRowsPerBlock,
                                         long rlen,
                                         long clen)
      • postProcessAfterCopying

        public static void postProcessAfterCopying(MatrixBlock ret)
      • addIDToDataFrame

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df,
                                                                                              org.apache.spark.sql.SparkSession sparkSession,
                                                                                              String nameOfCol)
        Adds element indices as a new column to the DataFrame.
        Parameters:
        df - input data frame
        sparkSession - the Spark Session
        nameOfCol - name of index column
        Returns:
        new data frame
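        The idea of appending an index column can be illustrated without Spark. The following is a minimal plain-Java sketch, not the library's implementation: the class RowIndexSketch and the method appendRowIndex are hypothetical names, rows are modeled as Object[] instead of Spark Row objects, and the choice of 1-based double indices is an assumption that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it does not exist in the documented library.
final class RowIndexSketch {

    // Returns new rows in which each input row gains one trailing field
    // holding its position. The input rows are left unmodified.
    // Assumption: positions are 1-based doubles; the real method may differ.
    static List<Object[]> appendRowIndex(List<Object[]> rows) {
        List<Object[]> out = new ArrayList<>(rows.size());
        long index = 1;
        for (Object[] row : rows) {
            // Copy into a fresh Object[] with one extra slot for the index.
            Object[] extended = Arrays.copyOf(row, row.length + 1, Object[].class);
            extended[row.length] = (double) index;
            index++;
            out.add(extended);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"a", 10});
        rows.add(new Object[] {"b", 20});
        for (Object[] row : appendRowIndex(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

        In a distributed setting the position has to be derived from the partitioned data rather than from a local counter, which is why the real method also takes a SparkSession.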
      • stringDataFrameToVectorDataFrame

        public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> stringDataFrameToVectorDataFrame(org.apache.spark.sql.SparkSession sparkSession,
                                                                                                              org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF)
        Convert a dataframe of comma-separated string rows to a dataframe of ml.linalg.Vector rows.

        Example input rows:
        ((1.2, 4.3, 3.4))
        (1.2, 3.4, 2.2)
        [[1.2, 34.3, 1.2, 1.25]]
        [1.2, 3.4]

        Parameters:
        sparkSession - Spark Session
        inputDF - dataframe of comma-separated row strings to convert to dataframe of ml.linalg.Vector rows
        Returns:
        dataframe of ml.linalg.Vector rows
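
        The example input rows above all reduce to the same list of numbers once the surrounding brackets are removed. The following plain-Java sketch shows one way such a row string could be parsed. It is not the library's implementation: the class VectorStringParser and the method parseRow are hypothetical names, and the result is a double[] rather than an ml.linalg.Vector so that the sketch runs without Spark.

```java
import java.util.Arrays;

// Hypothetical illustration class; it does not exist in the documented library.
final class VectorStringParser {

    // Strips any leading '(' or '[' and any trailing ')' or ']' characters,
    // then splits the remainder on commas and parses each token as a double.
    static double[] parseRow(String row) {
        String s = row.trim();
        int start = 0;
        int end = s.length();
        while (start < end && (s.charAt(start) == '(' || s.charAt(start) == '[')) {
            start++;
        }
        while (end > start && (s.charAt(end - 1) == ')' || s.charAt(end - 1) == ']')) {
            end--;
        }
        String body = s.substring(start, end).trim();
        if (body.isEmpty()) {
            return new double[0];
        }
        String[] tokens = body.split(",");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        String[] rows = {
            "((1.2, 4.3, 3.4))",
            "(1.2, 3.4, 2.2)",
            "[[1.2, 34.3, 1.2, 1.25]]",
            "[1.2, 3.4]"
        };
        for (String row : rows) {
            System.out.println(Arrays.toString(parseRow(row)));
        }
    }
}
```

        With Spark on the classpath, an array parsed this way could be wrapped with org.apache.spark.ml.linalg.Vectors.dense(double[]) to obtain a dense vector.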