Class LibMatrixCuDNN
java.lang.Object
  org.apache.sysds.runtime.matrix.data.LibMatrixCUDA
    org.apache.sysds.runtime.matrix.data.LibMatrixCuDNN
public class LibMatrixCuDNN extends LibMatrixCUDA
This class contains methods that invoke cuDNN operations.
-
-
Field Summary
-
Fields inherited from class org.apache.sysds.runtime.matrix.data.LibMatrixCUDA
cudaSupportFunctions, customKernelSuffix, sizeOfDataType
-
-
Constructor Summary
Constructors
LibMatrixCuDNN()
-
Method Summary
All Methods  Static Methods  Concrete Methods

static void batchNormalizationBackward(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject scale, MatrixObject dX, MatrixObject dScale, MatrixObject dBias, double epsilon, MatrixObject resultSaveMean, MatrixObject resultSaveInvVariance)
  Computes the backpropagation errors for image, scale, and bias of the batch normalization layer.

static void batchNormalizationForwardInference(GPUContext gCtx, String instName, MatrixObject image, MatrixObject scale, MatrixObject bias, MatrixObject runningMean, MatrixObject runningVar, MatrixObject ret, double epsilon)
  Performs the forward batch normalization computation for inference.

static void batchNormalizationForwardTraining(GPUContext gCtx, String instName, MatrixObject image, MatrixObject scale, MatrixObject bias, MatrixObject runningMean, MatrixObject runningVar, MatrixObject ret, MatrixObject retRunningMean, MatrixObject retRunningVar, double epsilon, double exponentialAverageFactor, MatrixObject resultSaveMean, MatrixObject resultSaveInvVariance)
  Performs the forward batch normalization computation for training.

static void conv2d(GPUContext gCtx, String instName, MatrixObject image, MatrixObject filter, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
  Performs a 2D convolution.

static void conv2dBackwardData(GPUContext gCtx, String instName, MatrixObject filter, MatrixObject dout, MatrixObject output, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
  Computes the backpropagation errors for the previous layer of the convolution operation.

static void conv2dBackwardFilter(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
  Computes the backpropagation errors for the filter of the convolution operation.

static void conv2dBiasAdd(GPUContext gCtx, String instName, MatrixObject image, MatrixObject bias, MatrixObject filter, MatrixObject output, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
  Performs a 2D convolution followed by a bias_add.

static jcuda.Pointer getDensePointerForCuDNN(GPUContext gCtx, MatrixObject image, String instName, int numRows, int numCols)
  Convenience method to get the jcuda dense matrix pointer.

static void lstm(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer X, jcuda.Pointer wPointer, jcuda.Pointer out0, jcuda.Pointer c0, boolean return_sequences, String outputName, String cyName, int N, int M, int D, int T)
  Computes the forward pass for an LSTM layer with M neurons.

static void lstmBackward(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer x, jcuda.Pointer hx, jcuda.Pointer cx, jcuda.Pointer wPointer, String doutName, String dcyName, String dxName, String dwName, String dbName, String dhxName, String dcxName, boolean return_sequences, int N, int M, int D, int T)

static void pooling(GPUContext gCtx, String instName, MatrixObject image, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, LibMatrixDNN.PoolingType poolingType, double intermediateMemoryBudget)
  Performs pooling on the GPU via cudnnPoolingForward(...).

static void poolingBackward(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject maxpoolOutput, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, LibMatrixDNN.PoolingType poolingType, double intermediateMemoryBudget)
  Performs pooling backward on the GPU via cudnnPoolingBackward(...); computes the backpropagation errors for the previous layer of the pooling operation.

static void relu(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName)
  Performs the relu operation on the GPU.

static void softmax(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
  Performs a softmax operation on a matrix on the GPU.
Methods inherited from class org.apache.sysds.runtime.matrix.data.LibMatrixCUDA
abs, acos, asin, atan, axpy, biasAdd, biasMultiply, cbind, ceil, channelSums, computeNNZ, cos, cosh, cumulativeScan, cumulativeSumProduct, denseTranspose, deviceCopy, double2float, exp, float2double, floor, getCudaKernels, getDenseMatrixOutputForGPUInstruction, getDenseMatrixOutputForGPUInstruction, getDensePointer, getNnz, isInSparseFormat, log, matmultTSMM, matrixMatrixArithmetic, matrixMatrixRelational, matrixScalarArithmetic, matrixScalarOp, matrixScalarRelational, one, rbind, reluBackward, resetFloatingPointPrecision, round, sigmoid, sign, sin, sinh, sliceOperations, solve, sqrt, tan, tanh, toInt, transpose, unaryAggregate, zero
-
Method Detail
-
conv2dBiasAdd
public static void conv2dBiasAdd(GPUContext gCtx, String instName, MatrixObject image, MatrixObject bias, MatrixObject filter, MatrixObject output, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
Performs a 2D convolution followed by a bias_add.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
image - input image matrix object
bias - bias matrix object
filter - filter matrix object
output - output matrix object
N - number of input images
C - number of channels
H - height of each image
W - width of each image
K - number of output "channels"
R - height of filter
S - width of filter
pad_h - padding height
pad_w - padding width
stride_h - stride height
stride_w - stride width
P - output height
Q - output width
intermediateMemoryBudget - intermediate memory budget
-
conv2d
public static void conv2d(GPUContext gCtx, String instName, MatrixObject image, MatrixObject filter, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
Performs a 2D convolution.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
image - input matrix object
filter - filter matrix object
outputBlock - output matrix object
N - number of input images
C - number of channels
H - height of each image
W - width of each image
K - number of output "channels"
R - height of filter
S - width of filter
pad_h - padding height
pad_w - padding width
stride_h - stride height
stride_w - stride width
P - output height
Q - output width
intermediateMemoryBudget - intermediate memory budget
-
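As a quick sanity check on the P and Q arguments above, here is a minimal sketch of the standard 2D-convolution output-dimension arithmetic, out = (in + 2*pad - filter)/stride + 1. The helper class and method names are hypothetical, not part of SystemDS:

```java
// Hypothetical helper: computes one output spatial dimension of a 2D convolution.
// Standard formula: out = (in + 2*pad - filter) / stride + 1 (integer division).
public class ConvDims {
    public static int outDim(int in, int filter, int pad, int stride) {
        return (in + 2 * pad - filter) / stride + 1;
    }

    public static void main(String[] args) {
        // H = W = 224, R = S = 3, pad_h = pad_w = 1, stride_h = stride_w = 1
        int P = outDim(224, 3, 1, 1); // output height
        int Q = outDim(224, 3, 1, 1); // output width
        System.out.println(P + " x " + Q); // 224 x 224: "same" padding preserves size
    }
}
```

With stride 2 and the same padding, the same helper gives 112 x 112, which is how the P and Q values passed to conv2d are typically derived.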
softmax
public static void softmax(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in1, String outputName)
Performs a softmax operation on a matrix on the GPU.
Parameters:
ec - execution context
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
in1 - input matrix
outputName - output matrix name
-
conv2dBackwardFilter
public static void conv2dBackwardFilter(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
Computes the backpropagation errors for the filter of the convolution operation.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
image - input image
dout - errors from next layer
outputBlock - output errors
N - number of images
C - number of channels
H - height
W - width
K - number of filters
R - filter height
S - filter width
pad_h - pad height
pad_w - pad width
stride_h - stride height
stride_w - stride width
P - output activation height
Q - output activation width
intermediateMemoryBudget - intermediate memory budget
-
conv2dBackwardData
public static void conv2dBackwardData(GPUContext gCtx, String instName, MatrixObject filter, MatrixObject dout, MatrixObject output, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, double intermediateMemoryBudget)
Computes the backpropagation errors for the previous layer of the convolution operation.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
filter - filter used in conv2d
dout - errors from next layer
output - output errors
N - number of images
C - number of channels
H - height
W - width
K - number of filters
R - filter height
S - filter width
pad_h - pad height
pad_w - pad width
stride_h - stride height
stride_w - stride width
P - output activation height
Q - output activation width
intermediateMemoryBudget - intermediate memory budget
-
pooling
public static void pooling(GPUContext gCtx, String instName, MatrixObject image, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, LibMatrixDNN.PoolingType poolingType, double intermediateMemoryBudget)
Performs pooling on the GPU via cudnnPoolingForward(...).
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
image - image as matrix object
outputBlock - output matrix
N - batch size
C - number of channels
H - height of image
W - width of image
K - number of filters
R - height of filter
S - width of filter
pad_h - vertical padding
pad_w - horizontal padding
stride_h - vertical stride
stride_w - horizontal stride
P - (H - R + 1 + 2*pad_h)/stride_h
Q - (W - S + 1 + 2*pad_w)/stride_w
poolingType - type of pooling
intermediateMemoryBudget - intermediate memory budget
-
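The pooling Javadoc above defines P and Q by explicit formulas rather than names. A small sketch of that arithmetic, exactly as stated there (the helper class is hypothetical, not part of SystemDS):

```java
// Computes pooling output dimensions exactly as the Javadoc above states:
//   P = (H - R + 1 + 2*pad_h) / stride_h
//   Q = (W - S + 1 + 2*pad_w) / stride_w
// (integer division throughout).
public class PoolDims {
    public static int outDim(int in, int filter, int pad, int stride) {
        return (in - filter + 1 + 2 * pad) / stride;
    }

    public static void main(String[] args) {
        // 2x2 pooling, stride 1, no padding, on a 28x28 image
        System.out.println(outDim(28, 2, 0, 1)); // 27
        // 2x2 pooling, stride 2, no padding, on a 27x27 image
        System.out.println(outDim(27, 2, 0, 2)); // 13
    }
}
```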
poolingBackward
public static void poolingBackward(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject maxpoolOutput, MatrixObject outputBlock, int N, int C, int H, int W, int K, int R, int S, int pad_h, int pad_w, int stride_h, int stride_w, int P, int Q, LibMatrixDNN.PoolingType poolingType, double intermediateMemoryBudget)
Performs pooling backward on the GPU via cudnnPoolingBackward(...); computes the backpropagation errors for the previous layer of the pooling operation.
Parameters:
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
image - image as matrix object
dout - delta matrix, output of previous layer
maxpoolOutput - (optional, may be null) output of the maxpool forward function
outputBlock - output matrix
N - batch size
C - number of channels
H - height of image
W - width of image
K - number of filters
R - height of filter
S - width of filter
pad_h - vertical padding
pad_w - horizontal padding
stride_h - vertical stride
stride_w - horizontal stride
P - (H - R + 1 + 2*pad_h)/stride_h
Q - (W - S + 1 + 2*pad_w)/stride_w
poolingType - type of pooling
intermediateMemoryBudget - intermediate memory budget
-
relu
public static void relu(ExecutionContext ec, GPUContext gCtx, String instName, MatrixObject in, String outputName)
Performs the relu operation on the GPU.
Parameters:
ec - currently active ExecutionContext
gCtx - a valid GPUContext
instName - the invoking instruction's name, for recording statistics
in - input matrix
outputName - name of the output matrix
-
lstm
public static void lstm(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer X, jcuda.Pointer wPointer, jcuda.Pointer out0, jcuda.Pointer c0, boolean return_sequences, String outputName, String cyName, int N, int M, int D, int T) throws DMLRuntimeException
Computes the forward pass for an LSTM layer with M neurons. The input data has N sequences of T examples, each with D features.
Parameters:
ec - execution context
gCtx - gpu context
instName - name of the instruction
X - input matrix pointer
wPointer - weight matrix pointer
out0 - outputs from the previous timestep
c0 - initial cell state
return_sequences - whether to return `out` at all timesteps, or just for the final timestep
outputName - name of the out variable; if `return_sequences` is true, holds outputs for all timesteps
cyName - name of the output cell state, i.e., the cell state for the final timestep
N - minibatch size
M - hidden size
D - number of features
T - sequence length
Throws:
DMLRuntimeException - if an error occurs
-
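A sketch of the shape bookkeeping implied by the lstm parameters above. The [N, T*M] row-major flattening for the return_sequences case is an assumption about how SystemDS lays out multi-timestep outputs, not something this Javadoc states; the helper class is hypothetical:

```java
// Hypothetical shape helper for the lstm forward pass described above:
// N sequences of T steps, D input features, M hidden units.
public class LstmShapes {
    // Shape of the `out` variable: all timesteps flattened per row if
    // return_sequences, otherwise only the final timestep's hidden state.
    // ASSUMPTION: [N, T*M] flattening is not stated in the Javadoc.
    public static int[] outShape(int N, int T, int M, boolean returnSequences) {
        return returnSequences ? new int[]{N, T * M} : new int[]{N, M};
    }

    // Shape of the output cell state (cyName): one M-vector per sequence.
    public static int[] cyShape(int N, int M) {
        return new int[]{N, M};
    }
}
```

Under these assumptions, a batch of N=8 sequences of length T=10 with M=16 hidden units yields out of shape [8, 160] with return_sequences, and [8, 16] without; the cell state is [8, 16] either way.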
lstmBackward
public static void lstmBackward(ExecutionContext ec, GPUContext gCtx, String instName, jcuda.Pointer x, jcuda.Pointer hx, jcuda.Pointer cx, jcuda.Pointer wPointer, String doutName, String dcyName, String dxName, String dwName, String dbName, String dhxName, String dcxName, boolean return_sequences, int N, int M, int D, int T) throws DMLRuntimeException
Throws:
DMLRuntimeException - if an error occurs
-
batchNormalizationForwardTraining
public static void batchNormalizationForwardTraining(GPUContext gCtx, String instName, MatrixObject image, MatrixObject scale, MatrixObject bias, MatrixObject runningMean, MatrixObject runningVar, MatrixObject ret, MatrixObject retRunningMean, MatrixObject retRunningVar, double epsilon, double exponentialAverageFactor, MatrixObject resultSaveMean, MatrixObject resultSaveInvVariance) throws DMLRuntimeException
Performs the forward batch normalization computation for training.
Parameters:
gCtx - a valid GPUContext
instName - name of the instruction
image - input image
scale - scale (as per cuDNN), i.e., gamma in the original paper: shape [1, C, 1, 1]
bias - bias (as per cuDNN), i.e., beta in the original paper: shape [1, C, 1, 1]
runningMean - running mean accumulated during the training phase: shape [1, C, 1, 1]
runningVar - running variance accumulated during the training phase: shape [1, C, 1, 1]
ret - (output) normalized input
retRunningMean - (output) running mean accumulated during the training phase: shape [1, C, 1, 1]
retRunningVar - (output) running variance accumulated during the training phase: shape [1, C, 1, 1]
epsilon - epsilon value used in the batch normalization formula
exponentialAverageFactor - factor used in the moving average computation
resultSaveMean - (output) running mean accumulated during the training phase: shape [1, C, 1, 1]
resultSaveInvVariance - (output) running variance accumulated during the training phase: shape [1, C, 1, 1]
Throws:
DMLRuntimeException - if an error occurs
-
batchNormalizationForwardInference
public static void batchNormalizationForwardInference(GPUContext gCtx, String instName, MatrixObject image, MatrixObject scale, MatrixObject bias, MatrixObject runningMean, MatrixObject runningVar, MatrixObject ret, double epsilon) throws DMLRuntimeException
Performs the forward batch normalization computation for inference.
Parameters:
gCtx - a valid GPUContext
instName - name of the instruction
image - input image
scale - scale (as per cuDNN), i.e., gamma in the original paper: shape [1, C, 1, 1]
bias - bias (as per cuDNN), i.e., beta in the original paper: shape [1, C, 1, 1]
runningMean - running mean accumulated during the training phase: shape [1, C, 1, 1]
runningVar - running variance accumulated during the training phase: shape [1, C, 1, 1]
ret - normalized input
epsilon - epsilon value used in the batch normalization formula
Throws:
DMLRuntimeException - if an error occurs
-
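For reference, the per-element computation that the inference-time parameters above feed is the standard batch normalization formula, y = scale * (x - runningMean) / sqrt(runningVar + epsilon) + bias. A minimal scalar sketch (the helper class is hypothetical, not SystemDS code):

```java
// Standard batch normalization inference formula for one element of channel c:
//   y = scale * (x - runningMean) / sqrt(runningVar + epsilon) + bias
// where scale/bias/runningMean/runningVar are the per-channel [1, C, 1, 1] values.
public class BatchNormInference {
    public static double normalize(double x, double mean, double var,
                                   double scale, double bias, double epsilon) {
        return scale * (x - mean) / Math.sqrt(var + epsilon) + bias;
    }

    public static void main(String[] args) {
        // x=3, mean=1, var=4, scale=1, bias=0, epsilon=0 -> (3 - 1) / 2 = 1.0
        System.out.println(normalize(3.0, 1.0, 4.0, 1.0, 0.0, 0.0)); // 1.0
    }
}
```

epsilon only guards against division by zero when the variance is tiny, which is why both the training and inference entry points take it as an explicit argument.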
batchNormalizationBackward
public static void batchNormalizationBackward(GPUContext gCtx, String instName, MatrixObject image, MatrixObject dout, MatrixObject scale, MatrixObject dX, MatrixObject dScale, MatrixObject dBias, double epsilon, MatrixObject resultSaveMean, MatrixObject resultSaveInvVariance) throws DMLRuntimeException
Computes the backpropagation errors for image, scale, and bias of the batch normalization layer.
Parameters:
gCtx - a valid GPUContext
instName - name of the instruction
image - input image
dout - input errors of shape C, H, W
scale - scale (as per cuDNN), i.e., gamma in the original paper: shape [1, C, 1, 1]
dX - (output) backpropagation errors for the previous layer
dScale - backpropagation error for scale
dBias - backpropagation error for bias
epsilon - epsilon value used in the batch normalization formula
resultSaveMean - (input) running mean accumulated during the training phase: shape [1, C, 1, 1]
resultSaveInvVariance - (input) running variance accumulated during the training phase: shape [1, C, 1, 1]
Throws:
DMLRuntimeException - if an error occurs
-
getDensePointerForCuDNN
public static jcuda.Pointer getDensePointerForCuDNN(GPUContext gCtx, MatrixObject image, String instName, int numRows, int numCols) throws DMLRuntimeException
Convenience method to get the jcuda dense matrix pointer. This method explicitly converts sparse to dense format, so use it judiciously.
Parameters:
gCtx - a valid GPUContext
image - input matrix object
instName - name of the instruction
numRows - expected number of rows
numCols - expected number of columns
Returns:
jcuda pointer
Throws:
DMLRuntimeException - if an error occurs during sparse-to-dense conversion