Using SystemML with GPU


User Guide

To use SystemML on GPUs, please ensure that CUDA 9 and CuDNN 7 are installed on your system.

Python users

Please install SystemML using pip:

pip install systemml

Then you can use the setGPU(True) method of the MLContext and MLLearn APIs to enable GPU usage.

from systemml.mllearn import Caffe2DML
lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
lenet.setGPU(True)

To skip the memory check and force all GPU-enabled operations onto the GPU, call setForceGPU(True) after setGPU(True).

from systemml.mllearn import Caffe2DML
lenet = Caffe2DML(spark, solver='lenet_solver.proto', input_shape=(1, 28, 28))
lenet.setGPU(True).setForceGPU(True)
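
The GPU backend can be enabled on the MLContext API in the same way. A minimal sketch (it assumes an existing SparkSession named spark and requires a SystemML-enabled Spark environment to run):

```python
from systemml import MLContext, dml

ml = MLContext(spark)   # 'spark' is an existing SparkSession
ml.setGPU(True)         # enable the GPU backend
# ml.setForceGPU(True)  # optionally force all GPU-enabled operations onto the GPU

# Run a small DML script; GPU-enabled operations in it can now use the GPU
script = dml('s = sum(rand(rows=1000, cols=1000))').output('s')
s = ml.execute(script).get('s')
```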

Command-line users

To enable the GPU backend from the command line, include systemml-1.*-extra.jar in the classpath and pass the -gpu flag.

spark-submit --jars systemml-1.*-extra.jar SystemML.jar -f myDML.dml -gpu

To skip the memory check and force all GPU-enabled operations onto the GPU, pass the force option to the -gpu flag.

spark-submit --jars systemml-1.*-extra.jar SystemML.jar -f myDML.dml -gpu force

Scala users

To enable the GPU backend from the Spark shell, include systemml-1.*-extra.jar in the classpath and use the setGPU(true) method of the MLContext API to enable GPU usage.

spark-shell --jars systemml-1.*-extra.jar,SystemML.jar
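
Inside the shell, the MLContext API can then be used as follows. A minimal sketch (it assumes the spark-shell session above, with its predefined spark variable):

```scala
import org.apache.sysml.api.mlcontext._
import org.apache.sysml.api.mlcontext.ScriptFactory._

val ml = new MLContext(spark)  // 'spark' is the shell's SparkSession
ml.setGPU(true)                // enable the GPU backend
// ml.setForceGPU(true)        // optionally force all GPU-enabled operations onto the GPU

// Run a small DML script; GPU-enabled operations in it can now use the GPU
val script = dml("s = sum(rand(rows=1000, cols=1000))").out("s")
val s = ml.execute(script).getDouble("s")
```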

Troubleshooting guide

If the native GPU libraries fail to load because the installed gcc (and its libstdc++) is too old, upgrading gcc can resolve the issue. For example, the following commands build and install GCC 5.3.0 from source on a yum-based system such as CentOS or RHEL:
sudo yum install libmpc-devel mpfr-devel gmp-devel zlib-devel*
curl ftp://ftp.gnu.org/pub/gnu/gcc/gcc-5.3.0/gcc-5.3.0.tar.bz2 -O
tar xvfj gcc-5.3.0.tar.bz2
cd gcc-5.3.0
./configure --with-system-zlib --disable-multilib --enable-languages=c,c++
num_cores=`grep -c ^processor /proc/cpuinfo`
make -j $num_cores
sudo make install

Advanced Configuration

Using single precision

By default, SystemML uses double precision to store its matrices in GPU memory. To use single precision, set the configuration property ‘sysml.floating.point.precision’ to ‘single’. However, with the exception of BLAS operations, SystemML always performs its CPU operations in double precision.
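
For example, this property can be set in a SystemML-config.xml file passed to SystemML via the -config option. A minimal sketch of such a file (the 0.1 value and file layout below are illustrative assumptions):

```xml
<root>
  <!-- store GPU matrices in single precision (default: double) -->
  <sysml.floating.point.precision>single</sysml.floating.point.precision>
</root>
```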

Training very deep networks

Shadow buffer

Training a very deep network with double precision requires no additional configuration. When training a very deep network with single precision, however, the user can speed up eviction by using the shadow buffer. The fraction of the driver memory to allocate to the shadow buffer can
be set via the configuration property ‘sysml.gpu.eviction.shadow.bufferSize’. In the current version, the shadow buffer is not guarded by SystemML and can lead to an out-of-memory error if the network is both deep and wide.
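
As with the precision setting, this property can be placed in SystemML-config.xml; a minimal sketch (the 0.1 fraction is an illustrative assumption, tune it for your driver memory):

```xml
<root>
  <!-- fraction of driver memory reserved for the shadow buffer -->
  <sysml.gpu.eviction.shadow.bufferSize>0.1</sysml.gpu.eviction.shadow.bufferSize>
</root>
```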

Unified memory allocator

By default, SystemML uses CUDA’s memory allocator and performs on-demand eviction using the eviction policy set by the configuration property ‘sysml.gpu.eviction.policy’. To use CUDA’s unified memory allocator that performs page-level eviction instead, please set the configuration property ‘sysml.gpu.memory.allocator’ to ‘unified_memory’.
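
This property can likewise be set in SystemML-config.xml; a minimal sketch:

```xml
<root>
  <!-- switch from the default CUDA allocator to CUDA unified memory,
       which performs page-level eviction -->
  <sysml.gpu.memory.allocator>unified_memory</sysml.gpu.memory.allocator>
</root>
```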