Class PartitionedBroadcast<T extends CacheBlock>

  • All Implemented Interfaces:
    Serializable

    public class PartitionedBroadcast<T extends CacheBlock>
    extends Object
    implements Serializable
    This class is a wrapper around an array of broadcasts of partitioned matrix/frame blocks, which is required due to the 2GB limitation of Spark's broadcast handling. Without partitioning Broadcast&lt;PartitionedBlock&gt; into Broadcast&lt;PartitionedBlock&gt;[], broadcasting large objects failed with java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE. Despite various JIRAs, this issue still showed up in Spark 2.1. A brief usage sketch is given below.
    See Also:
    Serialized Form
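    The sketch below shows how code might consume such a partitioned broadcast once it has been created; the package names, the MatrixBlock element type, and the 1-based block indexing are illustrative assumptions, not guarantees of this class's API.

        // Assumed package layout; adjust to the actual SystemML/SystemDS build.
        import org.apache.sysds.runtime.instructions.spark.data.PartitionedBroadcast;
        import org.apache.sysds.runtime.matrix.data.MatrixBlock;

        public class PartitionedBroadcastExample {
          // Iterates over all blocks of an already-created partitioned broadcast
          // and sums their non-zero counts; how the broadcast is obtained
          // (e.g., from the Spark execution context) is outside this sketch.
          public static long countNonZeros(PartitionedBroadcast<MatrixBlock> pb) {
            long nnz = 0;
            for (int bi = 1; bi <= pb.getNumRowBlocks(); bi++)       // assumed 1-based
              for (int bj = 1; bj <= pb.getNumColumnBlocks(); bj++)  // block indices
                nnz += pb.getBlock(bi, bj).getNonZeros();
            return nnz;
          }
        }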
    • Constructor Detail

      • PartitionedBroadcast

        public PartitionedBroadcast()
    • Method Detail

      • getBroadcasts

        public org.apache.spark.broadcast.Broadcast<PartitionedBlock<T>>[] getBroadcasts()
      • getNumRows

        public long getNumRows()
      • getNumCols

        public long getNumCols()
      • getNumRowBlocks

        public int getNumRowBlocks()
      • getNumColumnBlocks

        public int getNumColumnBlocks()
      • computeBlocksPerPartition

        public static int computeBlocksPerPartition(long rlen,
                                                    long clen,
                                                    long blen)
      • computeBlocksPerPartition

        public static int computeBlocksPerPartition(long[] dims,
                                                    int blen)
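        As a rough illustration of how this sizing relates to the broadcast array described in the class comment, a caller could estimate the number of underlying broadcasts as sketched here; the ceil-based block and partition counts are assumptions based on the class description, not the documented contract.

          // Hypothetical sizing helper (uses the imports from the sketch above):
          // estimates how many underlying Spark broadcasts a matrix of the given
          // dimensions would require, assuming one broadcast per group of blocks.
          public static long estimateNumBroadcasts(long rlen, long clen, long blen) {
            int blksPerPart = PartitionedBroadcast.computeBlocksPerPartition(rlen, clen, blen);
            long numBlks = (long) Math.ceil((double) rlen / blen)
                         * (long) Math.ceil((double) clen / blen);
            return (long) Math.ceil((double) numBlks / blksPerPart);
          }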
      • getBlock

        public T getBlock(int rowIndex,
                          int colIndex)
      • getBlock

        public T getBlock(int[] ix)
      • slice

        public T slice(long rl,
                       long ru,
                       long cl,
                       long cu,
                       T block)
        Utility for slice operations over partitioned matrices, where the index range can cover multiple blocks. The result is always a single matrix block. All semantics are equivalent to the core matrix block slice operations.
        Parameters:
        rl - row lower bound
        ru - row upper bound
        cl - column lower bound
        cu - column upper bound
        block - block object
        Returns:
        block object
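        A minimal usage sketch, assuming the MatrixBlock element type from the example above and that a fresh MatrixBlock may be passed as the target block; the indexing convention follows the core matrix block slice operations and is not restated here.

          // Assumed helper: extracts the given index range from the partitioned
          // broadcast into a single MatrixBlock (bounds follow the core matrix
          // block slice semantics referenced above).
          public static MatrixBlock sliceRange(PartitionedBroadcast<MatrixBlock> pb,
              long rl, long ru, long cl, long cu) {
            return pb.slice(rl, ru, cl, cu, new MatrixBlock());
          }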
      • destroy

        public void destroy()
        This method cleans up all underlying broadcasts of a partitioned broadcast by forwarding the calls to SparkExecutionContext.cleanupBroadcastVariable.
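        For example, driver-side code might release the broadcast once no pending job needs it; the null check and the placement of the call are illustrative assumptions.

          // Hypothetical cleanup: release all underlying Spark broadcast
          // variables once the partitioned data is no longer needed.
          public static void release(PartitionedBroadcast<MatrixBlock> pb) {
            if (pb != null)
              pb.destroy();
          }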