Class DataPartitionerLocal


  • public class DataPartitionerLocal
    extends DataPartitioner
    Partitions a given matrix into row or column partitions using a two-pass approach. In the first pass, the input matrix is read from HDFS and sorted into block partitions in a staging area on the local file system, according to the partition format. To keep partitioning scalable, one block is processed at a time. In the second pass, all blocks of a partition are appended to a sequence file on HDFS. The indirection over the local staging area is required because partitioning is block-wise and sequence files have write-once semantics. To keep this pass scalable, one sequence file is processed at a time.
    NOTE: In the resulting partitioned matrix, block and cell indexes are stored relative to partition boundaries. The partitioned matrix therefore CANNOT be read as a traditional matrix because, for example, multiple blocks share the same index (the actual index is encoded in the path). To enable a full read of partitioned matrices, the data converter would need to handle negative row/column offsets for partitioned reads. This is currently not done, in order to avoid overhead for normal reads and because partitioning is applied only when the matrix is accessed exclusively through indexed access.
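The two-pass scheme described above can be illustrated with a minimal, self-contained sketch. This is not the SystemDS implementation: it assumes row-wise partitioning only, uses plain local directories in place of HDFS and sequence files, and stores text instead of binary blocks. All names in it (`TwoPassPartitionSketch`, `stageBlock`, `mergePartition`, and the `row_<i>` / `block_<j>` file names) are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative two-pass row partitioner (hypothetical, not SystemDS code). */
final class TwoPassPartitionSketch {

    /** Pass 1: split ONE input block into per-row pieces in the staging area. */
    public static void stageBlock(double[][] block, int rowOffset, int colBlock, Path staging)
            throws IOException {
        for (int i = 0; i < block.length; i++) {
            // The global row index is encoded only in the directory name.
            Path partDir = Files.createDirectories(staging.resolve("row_" + (rowOffset + i)));
            String cells = Arrays.stream(block[i])
                    .mapToObj(v -> Double.toString(v))
                    .collect(Collectors.joining(" "));
            Files.write(partDir.resolve("block_" + colBlock), cells.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Pass 2: write all staged pieces of ONE partition to its output file in a single write. */
    public static void mergePartition(Path partDir, Path out) throws IOException {
        List<Path> pieces;
        try (Stream<Path> s = Files.list(partDir)) {
            pieces = s.sorted(Comparator.comparingInt(TwoPassPartitionSketch::blockIndex))
                      .collect(Collectors.toList());
        }
        List<String> lines = new ArrayList<>();
        for (Path piece : pieces) {
            String cells = new String(Files.readAllBytes(piece), StandardCharsets.UTF_8);
            // Leading 0 is the row index relative to the partition boundary.
            lines.add("0 " + blockIndex(piece) + " " + cells);
        }
        Files.write(out.resolve(partDir.getFileName().toString()), lines);
    }

    /** Parses the numeric suffix of a staged file name such as "block_2". */
    public static int blockIndex(Path p) {
        String name = p.getFileName().toString();
        return Integer.parseInt(name.substring(name.indexOf('_') + 1));
    }

    /** Copies the block starting at (r0, c0), clipped at the matrix boundary. */
    public static double[][] subBlock(double[][] m, int r0, int c0, int bs) {
        int rows = Math.min(bs, m.length - r0);
        int cols = Math.min(bs, m[0].length - c0);
        double[][] b = new double[rows][];
        for (int i = 0; i < rows; i++) {
            b[i] = Arrays.copyOfRange(m[r0 + i], c0, c0 + cols);
        }
        return b;
    }

    /** Runs both passes: stage block by block, then merge partition by partition. */
    public static void partition(double[][] m, int bs, Path staging, Path out) throws IOException {
        Files.createDirectories(out);
        for (int r0 = 0; r0 < m.length; r0 += bs) {
            for (int c0 = 0; c0 < m[0].length; c0 += bs) {
                stageBlock(subBlock(m, r0, c0, bs), r0, c0 / bs, staging);
            }
        }
        List<Path> parts;
        try (Stream<Path> s = Files.list(staging)) {
            parts = s.collect(Collectors.toList());
        }
        for (Path p : parts) {
            mergePartition(p, out);
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] m = {{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}, {11, 12, 13, 14, 15}};
        Path staging = Files.createTempDirectory("staging");
        Path out = Files.createTempDirectory("partitions");
        partition(m, 2, staging, out);
        for (String line : Files.readAllLines(out.resolve("row_1"))) {
            System.out.println(line);
        }
    }
}
```

In this sketch every output file starts its lines with row index 0, so the files cannot be concatenated and read back as one matrix without recovering the row from the file name. This mirrors the NOTE above about indexes being stored relative to partition boundaries.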
    • Constructor Detail

      • DataPartitionerLocal

        public DataPartitionerLocal(ParForProgramBlock.PartitionFormat dpf,
                                    int par)
        DataPartitionerLocal constructor.
        Parameters:
        dpf - data partition format
        par - -1 for serial execution, otherwise the number of threads; implementations may ignore this value
    • Method Detail

      • writeBinaryBlockSequenceFileToHDFS

        public void writeBinaryBlockSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job,
                                                       String dir,
                                                       String lpdir,
                                                       boolean threadsafe)
                                                throws IOException
        Throws:
        IOException
      • writeBinaryCellSequenceFileToHDFS

        public void writeBinaryCellSequenceFileToHDFS(org.apache.hadoop.mapred.JobConf job,
                                                      String dir,
                                                      String lpdir)
                                               throws IOException
        Throws:
        IOException
      • writeTextCellFileToHDFS

        public void writeTextCellFileToHDFS(org.apache.hadoop.mapred.JobConf job,
                                            String dir,
                                            String lpdir)
                                     throws IOException
        Throws:
        IOException