Class SparkUtils

java.lang.Object
  org.apache.sysds.runtime.instructions.spark.utils.SparkUtils

public class SparkUtils extends Object
-
Field Summary

static org.apache.spark.storage.StorageLevel DEFAULT_TMP
-
Constructor Summary

SparkUtils()
-
Method Summary

static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)

static void checkSparsity(String varname, ExecutionContext ec)

static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
    Utility to compute dimensions and non-zeros in a given RDD of binary cells.

static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
    Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.

static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
    Creates a partitioning-preserving copy of the input matrix RDD.

static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
    Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.

static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
    Creates a partitioning-preserving copy of the input tensor RDD.

static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)

static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)

static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)

static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)

static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)

static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)

static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
    Creates an RDD of empty blocks according to the given matrix characteristics.

static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)

static long getNonZeros(MatrixObject mo)

static int getNumPreferredPartitions(DataCharacteristics dc)

static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)

static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)

static String getPrefixFromSparkDebugInfo(String line)

static String getStartLineFromSparkDebugInfo(String line)

static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
    Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.

static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)

static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)

static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)

static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)

static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)

static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)

static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
-
Method Detail
-
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(scala.Tuple2<MatrixIndexes,MatrixBlock> in)
-
toIndexedMatrixBlock
public static IndexedMatrixValue toIndexedMatrixBlock(MatrixIndexes ix, MatrixBlock mb)
-
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(scala.Tuple2<TensorIndexes,TensorBlock> in)
-
toIndexedTensorBlock
public static IndexedTensorBlock toIndexedTensorBlock(TensorIndexes ix, TensorBlock mb)
-
fromIndexedMatrixBlock
public static scala.Tuple2<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlock(IndexedMatrixValue in)
-
fromIndexedMatrixBlock
public static List<scala.Tuple2<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlock(List<IndexedMatrixValue> in)
-
fromIndexedMatrixBlockToPair
public static Pair<MatrixIndexes,MatrixBlock> fromIndexedMatrixBlockToPair(IndexedMatrixValue in)
-
fromIndexedMatrixBlockToPair
public static List<Pair<MatrixIndexes,MatrixBlock>> fromIndexedMatrixBlockToPair(List<IndexedMatrixValue> in)
-
fromIndexedFrameBlock
public static scala.Tuple2<Long,FrameBlock> fromIndexedFrameBlock(Pair<Long,FrameBlock> in)
-
fromIndexedFrameBlock
public static List<scala.Tuple2<Long,FrameBlock>> fromIndexedFrameBlock(List<Pair<Long,FrameBlock>> in)
-
toIndexedLong
public static List<Pair<Long,Long>> toIndexedLong(List<scala.Tuple2<Long,Long>> in)
-
toIndexedFrameBlock
public static Pair<Long,FrameBlock> toIndexedFrameBlock(scala.Tuple2<Long,FrameBlock> in)
-
isHashPartitioned
public static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
Indicates if the input RDD is hash partitioned, i.e., it has a partitioner of type org.apache.spark.HashPartitioner.
Parameters:
in - input JavaPairRDD
Returns:
true if input is hash partitioned
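For intuition, hash partitioning assigns each key to a partition via a non-negative modulus of its hash code. The following stdlib-only sketch mirrors that assignment rule without any Spark dependency; it is an illustration, not the Spark or SystemDS implementation:

```java
// Sketch of the partition-assignment rule behind hash partitioning:
// partition = nonNegativeMod(key.hashCode(), numPartitions).
public class HashPartitionSketch {
    // Java's % can return negative values for negative hash codes,
    // so the result is shifted back into the range [0, numPartitions).
    public static int getPartition(Object key, int numPartitions) {
        int mod = key.hashCode() % numPartitions;
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        // Integer.hashCode(-7) == -7, so the shift into range is exercised here.
        System.out.println(getPartition(-7, 4));
        System.out.println(getPartition(8, 4));
    }
}
```

Operations that preserve such a partitioner (e.g., mapValues) can avoid a reshuffle, which is why checking for hash partitioning is useful.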
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, org.apache.spark.api.java.JavaPairRDD<?,?> in)
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc)
-
getNumPreferredPartitions
public static int getNumPreferredPartitions(DataCharacteristics dc, boolean outputEmptyBlocks)
-
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
Returns:
matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
-
copyBinaryBlockMatrix
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input matrix RDD. If a deep copy is requested, indexes and values are copied; otherwise they are simply passed through.
Parameters:
in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
deep - if true, perform deep copy
Returns:
matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
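The deep/shallow distinction matters when downstream code mutates blocks in place. A stdlib-only sketch of the semantics, with plain Java lists of arrays standing in for the RDD of blocks (this is not SystemDS code, just an illustration of the copy behavior):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates deep vs. shallow copy semantics: a shallow copy shares the
// underlying value objects, so mutations through one reference are visible
// through the other; a deep copy clones them.
public class CopySemanticsSketch {
    public static List<double[]> copy(List<double[]> in, boolean deep) {
        List<double[]> out = new ArrayList<>();
        for (double[] block : in)
            out.add(deep ? block.clone() : block); // clone only on deep copy
        return out;
    }

    public static void main(String[] args) {
        List<double[]> in = new ArrayList<>();
        in.add(new double[]{1, 2, 3});

        List<double[]> shallow = copy(in, false);
        List<double[]> deep = copy(in, true);

        in.get(0)[0] = 99; // mutate the original block
        System.out.println(shallow.get(0)[0]); // shared block: sees the mutation
        System.out.println(deep.get(0)[0]);    // independent block: unchanged
    }
}
```

A shallow (pass-through) copy is cheap and safe when blocks are treated as immutable; a deep copy protects the original when in-place updates follow.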
-
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.
Parameters:
in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
Returns:
tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
-
copyBinaryBlockTensor
public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in, boolean deep)
Creates a partitioning-preserving copy of the input tensor RDD. If a deep copy is requested, indexes and values are copied; otherwise they are simply passed through.
Parameters:
in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
deep - if true, perform deep copy
Returns:
tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
-
checkSparsity
public static void checkSparsity(String varname, ExecutionContext ec)
-
getEmptyBlockRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc, DataCharacteristics mc)
Creates an RDD of empty blocks according to the given matrix characteristics. This is done in a scalable manner by parallelizing block ranges and generating empty blocks in a distributed manner, under awareness of preferred output partition sizes.
Parameters:
sc - spark context
mc - matrix characteristics
Returns:
pair rdd of empty matrix blocks
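To see what "parallelizing block ranges" amounts to, the block grid follows directly from the matrix dimensions and the block size. A stdlib-only sketch of that arithmetic (the class and method names are illustrative, not the SystemDS implementation, which maps each index range to empty blocks on the cluster):

```java
// Computes the size of the block grid for a rows x cols matrix tiled into
// square blocks of side blen. Each grid cell corresponds to one (empty)
// block that the distributed generation would emit.
public class EmptyBlockSketch {
    public static long numBlocks(long rows, long cols, int blen) {
        long rblocks = (rows + blen - 1) / blen; // ceil(rows / blen)
        long cblocks = (cols + blen - 1) / blen; // ceil(cols / blen)
        return rblocks * cblocks;
    }

    public static void main(String[] args) {
        // a 2500 x 1700 matrix with 1000 x 1000 blocks -> 3 x 2 = 6 blocks
        System.out.println(numBlocks(2500, 1700, 1000));
    }
}
```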
-
cacheBinaryCellRDD
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> cacheBinaryCellRDD(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
-
computeDataCharacteristics
public static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
Utility to compute dimensions and non-zeros in a given RDD of binary cells.
Parameters:
input - matrix as JavaPairRDD<MatrixIndexes,MatrixCell>
Returns:
matrix characteristics
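The computation is essentially a single aggregate over the cells: the dimensions are the maximum row and column indexes, and non-zeros are counted from the cell values. A stdlib-only sketch of a local analogue over a cell list (assumed 1-based indexes, as in MatrixIndexes; this is an illustration, not the SystemDS RDD aggregate):

```java
import java.util.List;

// Aggregates dimensions and non-zero count from (row, col, value) cells.
public class CellStatsSketch {
    // One binary cell: 1-based row/column index plus a value.
    public static final class Cell {
        final long i, j; final double v;
        public Cell(long i, long j, double v) { this.i = i; this.j = j; this.v = v; }
    }

    // Returns {maxRow, maxCol, nnz}: a local stand-in for the RDD aggregate.
    public static long[] characteristics(List<Cell> cells) {
        long rows = 0, cols = 0, nnz = 0;
        for (Cell c : cells) {
            rows = Math.max(rows, c.i);
            cols = Math.max(cols, c.j);
            if (c.v != 0) nnz++;
        }
        return new long[]{rows, cols, nnz};
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell(1, 1, 7.0), new Cell(3, 2, 0.0), new Cell(2, 5, -1.5));
        long[] mc = characteristics(cells);
        System.out.println(mc[0] + " x " + mc[1] + ", nnz=" + mc[2]); // 3 x 5, nnz=2
    }
}
```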
-
getNonZeros
public static long getNonZeros(MatrixObject mo)
-
getNonZeros
public static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)
-
postprocessUltraSparseOutput
public static void postprocessUltraSparseOutput(MatrixObject mo, DataCharacteristics mcOut)
-
-