Package org.apache.sysds.runtime.io
Class ReaderTextCellParallel
- java.lang.Object
- org.apache.sysds.runtime.io.MatrixReader
- org.apache.sysds.runtime.io.ReaderTextCell
- org.apache.sysds.runtime.io.ReaderTextCellParallel
public class ReaderTextCellParallel extends ReaderTextCell
Parallel version of ReaderTextCell.java. In summary, we create one read task per split and use a fixed-size thread pool to execute these tasks. If the target matrix is dense, the inserts are done lock-free. If the matrix is sparse, we use a buffer to collect unordered input cells, lock the target sparse matrix once, and append all buffered values.
Notes on MatrixMarket:
1) For MatrixMarket files, each read task probes for comments until it finds data, because with very small tasks or large comments any split might encounter % or %%. Hence, the parallel reader does not perform the validity check.
2) In extreme scenarios, the last comment might be in one split and the following metadata in the subsequent split. This would produce incorrect results or errors. However, this scenario is extremely unlikely (num threads > num lines if 1 comment line) and is hence ignored, similar to our parallel MR setting (but there we have a 128MB guarantee).
3) However, we use MIN_FILESIZE_MM (8KB) to give guarantees for the common case of small headers, in order to avoid the issue described in (2).
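The task-per-split strategy described above can be illustrated with a minimal, self-contained sketch. It is not the SystemDS implementation: the class ParallelTextCellSketch, its methods, the use of in-memory string lists as "splits", the plain double[][] and Map targets, and the assumed "row col value" line layout with 1-based indices are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; names and data structures are assumptions.
public class ParallelTextCellSketch {

    /** Runs one task per split on a fixed-size thread pool and waits for all. */
    static void runTasks(List<Callable<Void>> tasks, int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows a failure of the corresponding task
            }
        } finally {
            pool.shutdown();
        }
    }

    /** Dense target: tasks write their cells directly, without any lock. */
    static double[][] readDense(List<List<String>> splits, int rows, int cols,
            int numThreads) throws Exception {
        double[][] target = new double[rows][cols];
        List<Callable<Void>> tasks = new ArrayList<>();
        for (List<String> split : splits) {
            tasks.add(() -> {
                for (String line : split) {
                    String[] p = line.trim().split("\\s+");
                    int r = Integer.parseInt(p[0]) - 1; // assumed 1-based input
                    int c = Integer.parseInt(p[1]) - 1;
                    target[r][c] = Double.parseDouble(p[2]);
                }
                return null;
            });
        }
        runTasks(tasks, numThreads);
        return target;
    }

    /** Sparse target: each task buffers its cells, then locks the target once. */
    static Map<Long, Double> readSparse(List<List<String>> splits, int cols,
            int numThreads) throws Exception {
        Map<Long, Double> target = new HashMap<>(); // key: r * cols + c
        List<Callable<Void>> tasks = new ArrayList<>();
        for (List<String> split : splits) {
            tasks.add(() -> {
                List<double[]> buffer = new ArrayList<>(); // task-local, unordered
                for (String line : split) {
                    String[] p = line.trim().split("\\s+");
                    buffer.add(new double[] {
                        Integer.parseInt(p[0]) - 1,
                        Integer.parseInt(p[1]) - 1,
                        Double.parseDouble(p[2]) });
                }
                synchronized (target) { // single lock acquisition per task
                    for (double[] cell : buffer) {
                        target.put((long) cell[0] * cols + (long) cell[1], cell[2]);
                    }
                }
                return null;
            });
        }
        runTasks(tasks, numThreads);
        return target;
    }
}
```

In this sketch the dense path is safe without locking only because each input cell is assumed to appear once, so no two tasks write the same array element. The sparse path takes the lock once per task rather than once per cell, which keeps contention low at the cost of a task-local buffer.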
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    ReaderTextCellParallel.CellBuffer    Useful class for buffering unordered cells before locking the target once and appending all buffered cells.
static class    ReaderTextCellParallel.CountNnzTask
static class    ReaderTextCellParallel.ReadTask
Constructor Summary
Constructors
Constructor    Description
ReaderTextCellParallel(Types.FileFormat fmt)
Method Summary
Methods inherited from class org.apache.sysds.runtime.io.ReaderTextCell
readMatrixFromHDFS, readMatrixFromInputStream
Constructor Detail
ReaderTextCellParallel
public ReaderTextCellParallel(Types.FileFormat fmt)