Class RDDConverterUtilsExt
- java.lang.Object
-
- org.apache.sysds.runtime.instructions.spark.utils.RDDConverterUtilsExt
-
public class RDDConverterUtilsExt extends Object
NOTE: These are experimental converter utils. Once thoroughly tested, they can be moved to RDDConverterUtils.
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class
RDDConverterUtilsExt.AddRowID
static class
RDDConverterUtilsExt.RDDConverterTypes
-
Constructor Summary
Constructors
Constructor    Description
RDDConverterUtilsExt()
-
Method Summary
Modifier and Type    Method    Description
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, org.apache.spark.sql.SparkSession sparkSession, String nameOfCol)
Add element indices as new column to DataFrame
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
coordinateMatrixToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, DataCharacteristics mcIn, boolean outputEmptyBlocks)
static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock>
coordinateMatrixToBinaryBlock(org.apache.spark.SparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, DataCharacteristics mcIn, boolean outputEmptyBlocks)
static void
copyRowBlocks(MatrixBlock mb, int rowIndex, MatrixBlock ret, int numRowsPerBlock, int rlen, int clen)
static void
copyRowBlocks(MatrixBlock mb, int rowIndex, MatrixBlock ret, long numRowsPerBlock, long rlen, long clen)
static void
copyRowBlocks(MatrixBlock mb, long rowIndex, MatrixBlock ret, int numRowsPerBlock, int rlen, int clen)
static void
copyRowBlocks(MatrixBlock mb, long rowIndex, MatrixBlock ret, long numRowsPerBlock, long rlen, long clen)
static void
postProcessAfterCopying(MatrixBlock ret)
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
projectColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, ArrayList<String> columns)
static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>
stringDataFrameToVectorDataFrame(org.apache.spark.sql.SparkSession sparkSession, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF)
Convert a dataframe of comma-separated string rows to a dataframe of ml.linalg.Vector rows.
-
Method Detail
-
coordinateMatrixToBinaryBlock
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.api.java.JavaSparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, DataCharacteristics mcIn, boolean outputEmptyBlocks)
-
coordinateMatrixToBinaryBlock
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> coordinateMatrixToBinaryBlock(org.apache.spark.SparkContext sc, org.apache.spark.mllib.linalg.distributed.CoordinateMatrix input, DataCharacteristics mcIn, boolean outputEmptyBlocks)
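The page gives no description for these two overloads. Going only by the name and the JavaPairRDD<MatrixIndexes,MatrixBlock> return type, the conversion presumably groups coordinate entries (row, column, value) into fixed-size blocks. The sketch below shows just the index arithmetic such a grouping needs. It assumes 0-based cell coordinates and 1-based block indexes, and BlockIndexSketch and its methods are hypothetical names, not part of this class.

```java
/** Hypothetical helper (not part of RDDConverterUtilsExt): maps a cell coordinate to a block. */
final class BlockIndexSketch {

    /** Returns the 1-based index of the block that holds the 0-based cell position. */
    static long blockIndex(long cellPos, int blockSize) {
        return cellPos / blockSize + 1;
    }

    /** Returns the 0-based offset of the cell inside its block. */
    static int offsetInBlock(long cellPos, int blockSize) {
        return (int) (cellPos % blockSize);
    }
}
```

With a block size of 1000, the cell at row 2500 would land in block row 3 at offset 500. The same arithmetic applies independently to the column coordinate.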
-
projectColumns
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> projectColumns(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, ArrayList<String> columns)
-
copyRowBlocks
public static void copyRowBlocks(MatrixBlock mb, int rowIndex, MatrixBlock ret, int numRowsPerBlock, int rlen, int clen)
-
copyRowBlocks
public static void copyRowBlocks(MatrixBlock mb, long rowIndex, MatrixBlock ret, int numRowsPerBlock, int rlen, int clen)
-
copyRowBlocks
public static void copyRowBlocks(MatrixBlock mb, int rowIndex, MatrixBlock ret, long numRowsPerBlock, long rlen, long clen)
-
copyRowBlocks
public static void copyRowBlocks(MatrixBlock mb, long rowIndex, MatrixBlock ret, long numRowsPerBlock, long rlen, long clen)
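The four copyRowBlocks overloads differ only in int versus long parameter types and are listed without a description. The sketch below shows one plausible reading of the parameters, using plain double[][] arrays in place of MatrixBlock. It assumes that rowIndex is a 0-based block-row index, that each block covers numRowsPerBlock rows of the result, and that the last block may be shorter. RowBlockCopySketch is a hypothetical name, and the actual method may behave differently.

```java
/** Hypothetical illustration (not the library code): copies one row block into a larger result. */
final class RowBlockCopySketch {

    /**
     * Copies the rows of block into ret, starting at row rowIndex * numRowsPerBlock.
     * The destination range is clipped to rlen rows, so a short final block is handled.
     */
    static void copyRowBlock(double[][] block, long rowIndex, double[][] ret,
                             long numRowsPerBlock, long rlen, long clen) {
        int firstRow = (int) (rowIndex * numRowsPerBlock);
        int lastRow = (int) Math.min((rowIndex + 1) * numRowsPerBlock - 1, rlen - 1);
        for (int r = firstRow; r <= lastRow; r++) {
            // Row r of the result corresponds to row (r - firstRow) of the block.
            System.arraycopy(block[r - firstRow], 0, ret[r], 0, (int) clen);
        }
    }
}
```

For a 5 x 2 result with 2 rows per block, block row 1 would fill result rows 2 and 3, and block row 2 would fill only row 4.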
-
postProcessAfterCopying
public static void postProcessAfterCopying(MatrixBlock ret)
-
addIDToDataFrame
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> addIDToDataFrame(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df, org.apache.spark.sql.SparkSession sparkSession, String nameOfCol)
Add element indices as new column to DataFrame
Parameters:
df - input data frame
sparkSession - the Spark Session
nameOfCol - name of index column
Returns:
new data frame
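To make "element indices as new column" concrete, the sketch below appends a running index to each row of an in-memory list, with no Spark involved. The page does not say whether indices start at 0 or 1, nor which type the index column has. The sketch assumes 1-based long values, and AddRowIdSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not the library code): appends a row index to every row. */
final class AddRowIdSketch {

    /** Returns new rows, each extended by one trailing field holding its position. */
    static List<Object[]> addRowId(List<Object[]> rows) {
        List<Object[]> out = new ArrayList<>(rows.size());
        long id = 1; // assumption: indices start at 1
        for (Object[] row : rows) {
            Object[] extended = Arrays.copyOf(row, row.length + 1);
            extended[row.length] = id++; // boxed as Long
            out.add(extended);
        }
        return out;
    }
}
```

The input rows are left unmodified, which mirrors the documented contract that a new data frame is returned.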
-
stringDataFrameToVectorDataFrame
public static org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> stringDataFrameToVectorDataFrame(org.apache.spark.sql.SparkSession sparkSession, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> inputDF)
Convert a dataframe of comma-separated string rows to a dataframe of ml.linalg.Vector rows.
Example input rows:
((1.2, 4.3, 3.4))
(1.2, 3.4, 2.2)
[[1.2, 34.3, 1.2, 1.25]]
[1.2, 3.4]
Parameters:
sparkSession - Spark Session
inputDF - dataframe of comma-separated row strings to convert to dataframe of ml.linalg.Vector rows
Returns:
dataframe of ml.linalg.Vector rows
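The example input rows show values wrapped in one or two levels of parentheses or square brackets. The sketch below shows one way to parse such a string into numeric values, using a plain double[] in place of ml.linalg.Vector. It assumes that brackets and whitespace carry no meaning and can simply be dropped. VectorRowParserSketch is a hypothetical name, and the actual method may parse its input differently.

```java
/** Hypothetical illustration (not the library code): parses one bracketed row string. */
final class VectorRowParserSketch {

    /** Strips parentheses, square brackets and whitespace, then parses the comma-separated values. */
    static double[] parseRow(String row) {
        String stripped = row.replaceAll("[\\[\\]()\\s]", "");
        if (stripped.isEmpty()) {
            return new double[0];
        }
        String[] parts = stripped.split(",");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }
}
```

Under these assumptions, each of the four example rows above parses to an array with one entry per comma-separated value.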
-