Class SparkUtils


  • public class SparkUtils
    extends Object
    • Field Detail

      • DEFAULT_TMP

        public static final org.apache.spark.storage.StorageLevel DEFAULT_TMP
        Default storage level for temporary (intermediate) RDDs.
    • Constructor Detail

      • SparkUtils

        public SparkUtils()
    • Method Detail

      • isHashPartitioned

        public static boolean isHashPartitioned(org.apache.spark.api.java.JavaPairRDD<?,?> in)
        Indicates whether the input RDD is hash partitioned, i.e., whether it has a partitioner of type org.apache.spark.HashPartitioner.
        Parameters:
        in - input JavaPairRDD
        Returns:
        true if input is hash partitioned
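Since a live Spark context is not assumed here, the same check can be illustrated stand-alone with stub stand-ins for Spark's Partitioner classes; everything below besides the instanceof pattern itself is illustrative.

```java
import java.util.Optional;

// Stub stand-ins for org.apache.spark.Partitioner and
// org.apache.spark.HashPartitioner, used here only for illustration.
class Partitioner {}
class HashPartitioner extends Partitioner {
    final int numPartitions;
    HashPartitioner(int numPartitions) { this.numPartitions = numPartitions; }
}

public class HashPartitionCheck {
    // Mirrors the logic of isHashPartitioned: the RDD is hash partitioned
    // iff its optional partitioner is present and is a HashPartitioner.
    public static boolean isHashPartitioned(Optional<? extends Partitioner> partitioner) {
        return partitioner.isPresent()
            && partitioner.get() instanceof HashPartitioner;
    }

    public static void main(String[] args) {
        System.out.println(isHashPartitioned(Optional.of(new HashPartitioner(4)))); // true
        System.out.println(isHashPartitioned(Optional.empty()));                    // false
    }
}
```

On a real JavaPairRDD, the partitioner is obtained via in.partitioner() (a Guava/Scala Optional depending on API version), which is what the actual method inspects.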
      • getNumPreferredPartitions

        public static int getNumPreferredPartitions(DataCharacteristics dc,
                                                    org.apache.spark.api.java.JavaPairRDD<?,?> in)
        Computes the preferred number of partitions for an RDD with the given data characteristics, taking the existing input RDD into account.
        Parameters:
        dc - data characteristics
        in - existing input RDD
        Returns:
        preferred number of partitions
      • getNumPreferredPartitions

        public static int getNumPreferredPartitions(DataCharacteristics dc)
        Computes the preferred number of partitions for an RDD with the given data characteristics.
        Parameters:
        dc - data characteristics
        Returns:
        preferred number of partitions
      • getNumPreferredPartitions

        public static int getNumPreferredPartitions(DataCharacteristics dc,
                                                    boolean outputEmptyBlocks)
        Computes the preferred number of partitions for an RDD with the given data characteristics, optionally accounting for empty blocks in the output.
        Parameters:
        dc - data characteristics
        outputEmptyBlocks - if true, empty blocks are part of the output
        Returns:
        preferred number of partitions
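The javadoc does not state how the preferred partition count is derived; a plausible stand-alone sketch (an assumption for illustration, not the actual SystemDS heuristic) divides the estimated data size by a target partition size:

```java
public class PartitionHeuristic {
    // Illustrative only: preferred partitions as the estimated serialized
    // size divided by a target partition size, with a floor of one
    // partition. The actual SystemDS heuristic may differ.
    public static int getNumPreferredPartitions(long estSizeBytes, long targetPartitionBytes) {
        return (int) Math.max(1,
            (estSizeBytes + targetPartitionBytes - 1) / targetPartitionBytes); // ceiling division
    }

    public static void main(String[] args) {
        // 1 GB of data with 128 MB target partitions -> 8 partitions
        System.out.println(getNumPreferredPartitions(1L << 30, 128L << 20)); // 8
    }
}
```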
      • copyBinaryBlockMatrix

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
        Creates a partitioning-preserving deep copy of the input matrix RDD, where the indexes and values are copied.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
        Returns:
        matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
      • copyBinaryBlockMatrix

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> copyBinaryBlockMatrix(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in,
                                                                                                             boolean deep)
        Creates a partitioning-preserving copy of the input matrix RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.
        Parameters:
        in - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
        deep - if true, perform deep copy
        Returns:
        matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
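The deep/shallow contract above can be illustrated without Spark; the Block class below is a hypothetical stand-in for MatrixBlock, and copy mirrors only the pass-through-vs-deep-copy behavior:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

public class CopySemantics {
    // Minimal stand-in for a value block (MatrixBlock itself is not assumed here).
    static class Block {
        double[] values;
        Block(double[] values) { this.values = values; }
        Block deepCopy() { return new Block(values.clone()); }
    }

    // Mirrors the deep/shallow contract of copyBinaryBlockMatrix(in, deep):
    // deep -> keys and values are copied; shallow -> the originals pass through.
    static List<Entry<Long, Block>> copy(List<Entry<Long, Block>> in, boolean deep) {
        if (!deep)
            return in; // pass-through: same list, same objects
        List<Entry<Long, Block>> out = new ArrayList<>();
        for (Entry<Long, Block> e : in)
            out.add(new SimpleEntry<>(e.getKey(), e.getValue().deepCopy()));
        return out;
    }

    public static void main(String[] args) {
        Block b = new Block(new double[]{1.0});
        List<Entry<Long, Block>> in = new ArrayList<>();
        in.add(new SimpleEntry<>(1L, b));
        List<Entry<Long, Block>> deep = copy(in, true);
        b.values[0] = 99.0; // mutate the original block
        System.out.println(deep.get(0).getValue().values[0]); // 1.0: deep copy unaffected
        System.out.println(copy(in, false) == in);            // true: shallow pass-through
    }
}
```

A deep copy matters when downstream operators mutate blocks in place; the shallow variant avoids that cost when the input is read-only.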
      • copyBinaryBlockTensor

        public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in)
        Creates a partitioning-preserving deep copy of the input tensor RDD, where the indexes and values are copied.
        Parameters:
        in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
        Returns:
        tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
      • copyBinaryBlockTensor

        public static org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> copyBinaryBlockTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,BasicTensorBlock> in,
                                                                                                                  boolean deep)
        Creates a partitioning-preserving copy of the input tensor RDD. If a deep copy is requested, indexes and values are copied, otherwise they are simply passed through.
        Parameters:
        in - tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
        deep - if true, perform deep copy
        Returns:
        tensor as JavaPairRDD<TensorIndexes,BasicTensorBlock>
      • getStartLineFromSparkDebugInfo

        public static String getStartLineFromSparkDebugInfo​(String line)
        Parameters:
        line - line of a Spark RDD debug string
        Returns:
        start line extracted from the given debug-info line
      • getPrefixFromSparkDebugInfo

        public static String getPrefixFromSparkDebugInfo​(String line)
        Parameters:
        line - line of a Spark RDD debug string
        Returns:
        prefix extracted from the given debug-info line
      • getEmptyBlockRDD

        public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> getEmptyBlockRDD(org.apache.spark.api.java.JavaSparkContext sc,
                                                                                                        DataCharacteristics mc)
        Creates an RDD of empty blocks according to the given matrix characteristics. This is done in a scalable manner by parallelizing block ranges and generating the empty blocks in a distributed fashion, while respecting preferred output partition sizes.
        Parameters:
        sc - spark context
        mc - matrix characteristics
        Returns:
        pair rdd of empty matrix blocks
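The "parallelizing block ranges" step can be sketched stand-alone: for given dimensions and block size, enumerate the 1-based block indexes that would each be paired with an empty block. The helper below is illustrative, not the SystemDS implementation, and it builds the index list locally rather than in a distributed manner:

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyBlockIndexes {
    // Enumerates the (rowIndex, colIndex) pairs (1-based, as in MatrixIndexes)
    // covering a rows x cols matrix tiled with square blocks of side blen.
    // In the actual RDD version, these ranges are parallelized and each index
    // pair is mapped to an empty MatrixBlock; here we only build the indexes.
    public static List<long[]> blockIndexes(long rows, long cols, int blen) {
        long nrb = (rows + blen - 1) / blen; // number of row blocks (ceiling)
        long ncb = (cols + blen - 1) / blen; // number of column blocks (ceiling)
        List<long[]> out = new ArrayList<>();
        for (long i = 1; i <= nrb; i++)
            for (long j = 1; j <= ncb; j++)
                out.add(new long[]{i, j});
        return out;
    }

    public static void main(String[] args) {
        // 2500 x 1500 matrix with 1000 x 1000 blocks -> 3 x 2 = 6 blocks
        System.out.println(blockIndexes(2500, 1500, 1000).size()); // 6
    }
}
```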
      • computeDataCharacteristics

        public static DataCharacteristics computeDataCharacteristics(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixCell> input)
        Utility to compute dimensions and non-zeros in a given RDD of binary cells.
        Parameters:
        input - matrix as JavaPairRDD<MatrixIndexes, MatrixCell>
        Returns:
        matrix characteristics
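The aggregation behind this utility can be sketched without Spark; the Cell class below is a hypothetical stand-in for a (MatrixIndexes, MatrixCell) pair, and only the local fold is shown (the real method aggregates across the RDD):

```java
import java.util.Arrays;
import java.util.List;

public class CellStats {
    // Minimal stand-in for a binary matrix cell: 1-based row/column index and value.
    static class Cell {
        final long row, col;
        final double value;
        Cell(long row, long col, double value) { this.row = row; this.col = col; this.value = value; }
    }

    // Mirrors the aggregation behind computeDataCharacteristics: dimensions
    // are the maximum row/column indexes seen, non-zeros is the number of
    // cells with a non-zero value. Returns {rows, cols, nnz}.
    public static long[] compute(List<Cell> cells) {
        long rows = 0, cols = 0, nnz = 0;
        for (Cell c : cells) {
            rows = Math.max(rows, c.row);
            cols = Math.max(cols, c.col);
            if (c.value != 0)
                nnz++;
        }
        return new long[]{rows, cols, nnz};
    }

    public static void main(String[] args) {
        long[] dc = compute(Arrays.asList(
            new Cell(1, 1, 1.0), new Cell(3, 2, 0.0), new Cell(2, 5, 2.0)));
        System.out.println(Arrays.toString(dc)); // [3, 5, 2]
    }
}
```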
      • getNonZeros

        public static long getNonZeros​(MatrixObject mo)
        Parameters:
        mo - matrix object
        Returns:
        number of non-zero values
      • getNonZeros

        public static long getNonZeros(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> input)
        Parameters:
        input - matrix as JavaPairRDD<MatrixIndexes,MatrixBlock>
        Returns:
        number of non-zero values