Class LibMatrixCUDA

    • Field Detail

      • sizeOfDataType

        public static int sizeOfDataType
      • customKernelSuffix

        public static String customKernelSuffix
    • Constructor Detail

      • LibMatrixCUDA

        public LibMatrixCUDA()
    • Method Detail

      • resetFloatingPointPrecision

        public static void resetFloatingPointPrecision()
        Sets the internal state based on the DMLScript.DATA_TYPE
      • getNnz

        public static long getNnz(GPUContext gCtx,
                                  String instName,
                                  MatrixObject mo,
                                  boolean recomputeDenseNNZ)
        Returns the number of non-zeros of the given matrix. Note: if the matrix is in dense format and recomputeDenseNNZ is true, the number of non-zeros is explicitly recomputed.
        Parameters:
        gCtx - a valid GPU context
        instName - instruction name
        mo - matrix object
        recomputeDenseNNZ - recompute NNZ if dense
        Returns:
        number of non-zeroes
      • double2float

        public static jcuda.Pointer double2float(GPUContext gCtx,
                                                 jcuda.Pointer A,
                                                 jcuda.Pointer ret,
                                                 int numElems)
      • float2double

        public static jcuda.Pointer float2double(GPUContext gCtx,
                                                 jcuda.Pointer A,
                                                 jcuda.Pointer ret,
                                                 int numElems)
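The two conversion methods above have no description in this listing. Judging only from their names and signatures, they appear to convert numElems values at A between double and single precision and write the result into ret. Treat that reading as an assumption. The host-side sketch below shows the same element-wise narrowing and widening, with plain Java arrays standing in for the jcuda.Pointer device buffers. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side sketch: plain arrays stand in for device pointers (assumption). */
final class PrecisionConversionSketch {
    // Narrows each of the first numElems doubles to float; precision may be lost.
    static float[] double2float(double[] a, float[] ret, int numElems) {
        for (int i = 0; i < numElems; i++) {
            ret[i] = (float) a[i];
        }
        return ret;
    }

    // Widens each of the first numElems floats to double; this direction is exact.
    static double[] float2double(float[] a, double[] ret, int numElems) {
        for (int i = 0; i < numElems; i++) {
            ret[i] = a[i];
        }
        return ret;
    }

    public static void main(String[] args) {
        double[] src = {0.1, 1.5, -2.25};
        float[] narrowed = double2float(src, new float[src.length], src.length);
        double[] back = float2double(narrowed, new double[src.length], src.length);
        System.out.println(Arrays.toString(back));
    }
}
```

The round trip is exact for values such as 1.5 or -2.25. It is not exact for 0.1, which has no exact float representation.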
      • one

        public static jcuda.Pointer one()
        Convenience method to get a pointer to the value 1.0 on the device, instead of allocating and deallocating it for every kernel invocation.
        Returns:
        jcuda pointer
      • zero

        public static jcuda.Pointer zero()
        Convenience method to get a pointer to the value 0.0 on the device, instead of allocating and deallocating it for every kernel invocation.
        Returns:
        jcuda pointer
      • getDensePointer

        public static jcuda.Pointer getDensePointer(GPUContext gCtx,
                                                    MatrixObject input,
                                                    String instName)
                                             throws DMLRuntimeException
        Convenience method to get jcudaDenseMatrixPtr. This method explicitly converts sparse to dense format, so use it judiciously.
        Parameters:
        gCtx - a valid GPUContext
        input - input matrix object
        instName - the invoking instruction's name, used for recording statistics.
        Returns:
        jcuda pointer
        Throws:
        DMLRuntimeException
      • reluBackward

        public static void reluBackward(GPUContext gCtx,
                                        String instName,
                                        MatrixObject input,
                                        MatrixObject dout,
                                        MatrixObject outputBlock)
        Computes the backpropagation errors for the previous layer of a ReLU operation.
        Parameters:
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        input - input image
        dout - errors propagated back from the next layer
        outputBlock - output
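The following is a minimal host-side sketch of the gradient described above. It assumes the usual ReLU rule dX = (X > 0) * dout, and it uses double[][] arrays in place of the MatrixObject arguments. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for the ReLU backward pass. */
final class ReluBackwardSketch {
    // Assumed rule: dX[i][j] = dout[i][j] where input[i][j] > 0, otherwise 0.
    static double[][] reluBackward(double[][] input, double[][] dout) {
        double[][] dx = new double[input.length][];
        for (int i = 0; i < input.length; i++) {
            dx[i] = new double[input[i].length];
            for (int j = 0; j < input[i].length; j++) {
                dx[i][j] = input[i][j] > 0 ? dout[i][j] : 0.0;
            }
        }
        return dx;
    }

    public static void main(String[] args) {
        double[][] x = {{-1, 2}, {0, 3}};
        double[][] dout = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(reluBackward(x, dout)));
    }
}
```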
      • channelSums

        public static void channelSums(GPUContext gCtx,
                                       String instName,
                                       MatrixObject input,
                                       MatrixObject outputBlock,
                                       long C,
                                       long HW)
        Performs the channel_sums operation: out = rowSums(matrix(colSums(A), rows=C, cols=HW))
        Parameters:
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        input - input image
        outputBlock - output
        C - number of channels
        HW - height*width
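The sketch below is a CPU reference for the formula above. It assumes that A has N rows and C*HW columns laid out channel by channel, which is the layout that the row-wise reshape matrix(..., rows=C, cols=HW) implies. It returns the C x 1 result as a flat array. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for channel_sums. */
final class ChannelSumsSketch {
    // a: N x (C*HW); assumed layout: channel ch occupies columns ch*HW .. ch*HW + HW - 1.
    // Returns the C x 1 result as a flat array of length C.
    static double[] channelSums(double[][] a, int c, int hw) {
        double[] out = new double[c];
        for (double[] row : a) {                 // colSums: accumulate over all rows
            for (int ch = 0; ch < c; ch++) {
                for (int k = 0; k < hw; k++) {   // rowSums of the C x HW reshape
                    out[ch] += row[ch * hw + k];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3, 4}, {5, 6, 7, 8}}; // N=2, C=2, HW=2
        System.out.println(Arrays.toString(channelSums(a, 2, 2)));
    }
}
```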
      • biasMultiply

        public static void biasMultiply(GPUContext gCtx,
                                        String instName,
                                        MatrixObject input,
                                        MatrixObject bias,
                                        MatrixObject outputBlock)
        Performs the operation corresponding to the DML script:
            ones = matrix(1, rows=1, cols=Hout*Wout)
            output = input * matrix(bias %*% ones, rows=1, cols=F*Hout*Wout)
        This operation is commonly used together with conv2d, hence the bias_multiply(input, bias) built-in function.
        Parameters:
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        input - input image
        bias - bias
        outputBlock - output
      • biasAdd

        public static void biasAdd(GPUContext gCtx,
                                   String instName,
                                   MatrixObject input,
                                   MatrixObject bias,
                                   MatrixObject outputBlock)
        Performs the operation corresponding to the DML script:
            ones = matrix(1, rows=1, cols=Hout*Wout)
            output = input + matrix(bias %*% ones, rows=1, cols=F*Hout*Wout)
        This operation is commonly used together with conv2d, hence the bias_add(input, bias) built-in function.
        Parameters:
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        input - input image
        bias - bias
        outputBlock - output
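Both biasMultiply and biasAdd broadcast the F x 1 bias so that bias[f] is applied to every one of the Hout*Wout columns of channel f. The host-side sketch below illustrates that broadcast. It assumes that each row of input holds F*Hout*Wout values grouped by channel. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical host-side reference for bias_add / bias_multiply broadcasting. */
final class BiasBroadcastSketch {
    // Assumed layout: input is N x (F*HW), with channel f in columns f*HW .. f*HW + HW - 1.
    // Computes op(input[n][j], bias[j / HW]) for every cell.
    static double[][] apply(double[][] input, double[] bias, DoubleBinaryOperator op) {
        int cols = input[0].length;
        int hw = cols / bias.length; // assumes cols == F * Hout * Wout
        double[][] out = new double[input.length][cols];
        for (int n = 0; n < input.length; n++) {
            for (int j = 0; j < cols; j++) {
                out[n][j] = op.applyAsDouble(input[n][j], bias[j / hw]);
            }
        }
        return out;
    }

    static double[][] biasAdd(double[][] input, double[] bias) {
        return apply(input, bias, Double::sum);
    }

    static double[][] biasMultiply(double[][] input, double[] bias) {
        return apply(input, bias, (x, b) -> x * b);
    }

    public static void main(String[] args) {
        double[][] input = {{1, 2, 3, 4}}; // N=1, F=2, Hout*Wout=2
        double[] bias = {10, 100};
        System.out.println(Arrays.deepToString(biasAdd(input, bias)));
        System.out.println(Arrays.deepToString(biasMultiply(input, bias)));
    }
}
```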
      • matmultTSMM

        public static void matmultTSMM(ExecutionContext ec,
                                       GPUContext gCtx,
                                       String instName,
                                       MatrixObject left,
                                       String outputName,
                                       boolean isLeftTransposed)
        Performs tsmm, A %*% A' or A' %*% A, on the GPU by exploiting cublasDsyrk(...)

        Memory usage: if the input is dense, it occupies rows*cols, no intermediate memory is needed, and the output occupies rows*rows (A %*% A') or cols*cols (A' %*% A). If the input is sparse, this method calls matmult.

        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        left - input matrix A of the tsmm expression A %*% A' or A' %*% A; it is named 'left' because only the transpose state of the left operand needs to be checked
        outputName - output matrix name
        isLeftTransposed - if true, computes A' %*% A; otherwise computes A %*% A'
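A syrk routine is useful here because the result of A' %*% A (or A %*% A') is symmetric, so only one triangle has to be computed. The plain-Java sketch below illustrates that idea by computing the upper triangle and mirroring it. It shows the math only and does not reproduce the cuBLAS call. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side illustration of the symmetric product computed by tsmm. */
final class TsmmSketch {
    // isLeftTransposed ? t(A) %*% A (cols x cols) : A %*% t(A) (rows x rows).
    // As with a syrk routine, only the upper triangle is computed and then mirrored.
    static double[][] tsmm(double[][] a, boolean isLeftTransposed) {
        int rows = a.length, cols = a[0].length;
        int k = isLeftTransposed ? cols : rows;
        double[][] c = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i; j < k; j++) {
                double s = 0;
                if (isLeftTransposed) {
                    for (int r = 0; r < rows; r++) s += a[r][i] * a[r][j]; // column i . column j
                } else {
                    for (int q = 0; q < cols; q++) s += a[i][q] * a[j][q]; // row i . row j
                }
                c[i][j] = s;
                c[j][i] = s; // symmetry fills the lower triangle
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(Arrays.deepToString(tsmm(a, true)));
        System.out.println(Arrays.deepToString(tsmm(a, false)));
    }
}
```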
      • unaryAggregate

        public static void unaryAggregate(ExecutionContext ec,
                                          GPUContext gCtx,
                                          String instName,
                                          MatrixObject in1,
                                          String output,
                                          AggregateUnaryOperator op)
        Entry point to perform unary aggregate operations on the GPU. The execution context object is used to allocate memory on the GPU.
        Parameters:
        ec - Instance of ExecutionContext, from which the output variable will be allocated
        gCtx - a valid GPUContext
        instName - name of the invoking instruction, used for recording statistics.
        in1 - input matrix
        output - output matrix/scalar name
        op - Instance of AggregateUnaryOperator which encapsulates the direction of reduction/aggregation and the reduction operation.
      • matrixScalarRelational

        public static void matrixScalarRelational(ExecutionContext ec,
                                                  GPUContext gCtx,
                                                  String instName,
                                                  MatrixObject in,
                                                  String outputName,
                                                  ScalarOperator op)
        Entry point to perform the elementwise matrix-scalar relational operation specified by op
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in - input matrix
        outputName - output matrix name
        op - scalar operator
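Relational operators in DML yield 0/1 matrices. The minimal sketch below covers the elementwise matrix-scalar case. A java.util.function.DoublePredicate stands in for the ScalarOperator, which is an assumption made to keep the example self-contained. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoublePredicate;

/** Hypothetical host-side reference for an elementwise matrix-scalar comparison. */
final class MatrixScalarRelationalSketch {
    // out[i][j] = 1.0 if test holds for in[i][j], else 0.0 (DoublePredicate replaces ScalarOperator).
    static double[][] relational(double[][] in, DoublePredicate test) {
        double[][] out = new double[in.length][];
        for (int i = 0; i < in.length; i++) {
            out[i] = new double[in[i].length];
            for (int j = 0; j < in[i].length; j++) {
                out[i][j] = test.test(in[i][j]) ? 1.0 : 0.0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}};
        double scalar = 3;
        System.out.println(Arrays.deepToString(relational(m, x -> x >= scalar))); // m >= 3
    }
}
```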
      • matrixScalarArithmetic

        public static void matrixScalarArithmetic(ExecutionContext ec,
                                                  GPUContext gCtx,
                                                  String instName,
                                                  MatrixObject in,
                                                  String outputName,
                                                  boolean isInputTransposed,
                                                  ScalarOperator op)
        Entry point to perform the elementwise matrix-scalar arithmetic operation specified by op
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in - input matrix
        outputName - output matrix name
        isInputTransposed - true if input transposed
        op - scalar operator
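The sketch below is an arithmetic counterpart to the methods above. It includes one plausible reading of isInputTransposed, in which the input is read as t(in) before the operation is applied. That reading is an assumption. A DoubleBinaryOperator stands in for ScalarOperator, with the scalar as the right operand. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical host-side reference for an elementwise matrix-scalar arithmetic op. */
final class MatrixScalarArithmeticSketch {
    // Computes op(x, scalar) for every cell x.
    // Assumption: isInputTransposed means the input is read as t(in).
    static double[][] apply(double[][] in, boolean isInputTransposed, double scalar, DoubleBinaryOperator op) {
        int rows = isInputTransposed ? in[0].length : in.length;
        int cols = isInputTransposed ? in.length : in[0].length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double x = isInputTransposed ? in[j][i] : in[i][j];
                out[i][j] = op.applyAsDouble(x, scalar);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}};
        System.out.println(Arrays.deepToString(apply(m, false, 2, (x, s) -> x * s)));
        System.out.println(Arrays.deepToString(apply(m, true, 10, (x, s) -> x - s)));
    }
}
```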
      • matrixMatrixRelational

        public static void matrixMatrixRelational(ExecutionContext ec,
                                                  GPUContext gCtx,
                                                  String instName,
                                                  MatrixObject in1,
                                                  MatrixObject in2,
                                                  String outputName,
                                                  BinaryOperator op)
        Performs the elementwise relational operation specified by op on the two input matrices in1 and in2
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix 1
        in2 - input matrix 2
        outputName - output matrix name
        op - binary operator
      • matrixMatrixArithmetic

        public static void matrixMatrixArithmetic(ExecutionContext ec,
                                                  GPUContext gCtx,
                                                  String instName,
                                                  MatrixObject in1,
                                                  MatrixObject in2,
                                                  String outputName,
                                                  boolean isLeftTransposed,
                                                  boolean isRightTransposed,
                                                  BinaryOperator op)
        Performs the elementwise arithmetic operation specified by op on the two input matrices in1 and in2
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix 1
        in2 - input matrix 2
        outputName - output matrix name
        isLeftTransposed - true if left-transposed
        isRightTransposed - true if right-transposed
        op - binary operator
      • matrixScalarOp

        public static void matrixScalarOp(ExecutionContext ec,
                                          GPUContext gCtx,
                                          String instName,
                                          MatrixObject in,
                                          String outputName,
                                          boolean isInputTransposed,
                                          ScalarOperator op)
        Utility to launch the matrix-scalar operation kernel
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in - input matrix
        outputName - output variable name
        isInputTransposed - true if input is transposed
        op - operator
      • deviceCopy

        public static void deviceCopy(String instName,
                                      jcuda.Pointer src,
                                      jcuda.Pointer dest,
                                      int rlen,
                                      int clen)
        Performs a deep copy of the input device double pointer corresponding to a matrix
        Parameters:
        instName - the invoking instruction's name, used for recording statistics.
        src - source matrix
        dest - destination matrix
        rlen - number of rows
        clen - number of columns
      • denseTranspose

        public static void denseTranspose(ExecutionContext ec,
                                          GPUContext gCtx,
                                          String instName,
                                          jcuda.Pointer A,
                                          jcuda.Pointer C,
                                          long numRowsA,
                                          long numColsA)
                                   throws DMLRuntimeException
        Computes C = t(A)
        Parameters:
        ec - execution context
        gCtx - gpu context
        instName - name of the instruction
        A - pointer to the input matrix
        C - pointer to the output matrix
        numRowsA - number of rows of the input matrix
        numColsA - number of columns of the input matrix
        Throws:
        DMLRuntimeException - if error
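The following is a reference sketch of C = t(A) in host memory. It assumes that both matrices are stored densely as flat arrays in row-major order. The device layout that the real method uses is not stated here. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for C = t(A) on flat buffers (row-major layout assumed). */
final class DenseTransposeSketch {
    // a holds numRowsA x numColsA values; c receives numColsA x numRowsA values.
    static void denseTranspose(double[] a, double[] c, int numRowsA, int numColsA) {
        for (int i = 0; i < numRowsA; i++) {
            for (int j = 0; j < numColsA; j++) {
                c[j * numRowsA + i] = a[i * numColsA + j]; // C[j][i] = A[i][j]
            }
        }
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5, 6}; // 2 x 3
        double[] c = new double[a.length];
        denseTranspose(a, c, 2, 3);
        System.out.println(Arrays.toString(c));
    }
}
```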
      • transpose

        public static void transpose(ExecutionContext ec,
                                     GPUContext gCtx,
                                     String instName,
                                     MatrixObject in,
                                     String outputName)
        Transposes the input matrix using cublasDgeam
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in - input matrix
        outputName - output matrix name
      • toInt

        public static int toInt(long num)
      • sliceOperations

        public static void sliceOperations(ExecutionContext ec,
                                           GPUContext gCtx,
                                           String instName,
                                           MatrixObject in1,
                                           IndexRange ixrange,
                                           String outputName)
        Performs a right-indexing (rightIndex) operation for the given lower and upper bounds in the row and column dimensions.
        Parameters:
        ec - current execution context
        gCtx - current gpu context
        instName - name of the instruction for maintaining statistics
        in1 - input matrix object
        ixrange - index range (0-based)
        outputName - output matrix name
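The host-side sketch below performs right indexing. It assumes that ixrange holds row and column bounds that are both 0-based and inclusive. The listing states that the bounds are 0-based, but inclusiveness is an assumption. The helper and its parameters are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for right indexing. */
final class SliceSketch {
    // Returns in[rowStart..rowEnd, colStart..colEnd]; bounds 0-based and inclusive (assumption).
    static double[][] slice(double[][] in, int rowStart, int rowEnd, int colStart, int colEnd) {
        double[][] out = new double[rowEnd - rowStart + 1][];
        for (int i = rowStart; i <= rowEnd; i++) {
            // copyOfRange's upper bound is exclusive, hence colEnd + 1
            out[i - rowStart] = Arrays.copyOfRange(in[i], colStart, colEnd + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        // Corresponds to the DML expression m[2:3, 1:2], since DML indices are 1-based.
        System.out.println(Arrays.deepToString(slice(m, 1, 2, 0, 1)));
    }
}
```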
      • exp

        public static void exp(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs an "exp" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
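exp and the element-wise functions that follow share one pattern: each applies a scalar function to every cell. These include sqrt, abs, log, floor, ceil, the trigonometric and hyperbolic functions, and sign. The minimal sketch below shows that pattern. It uses java.lang.Math functions, for example Math::exp or Math::signum for sign, as stand-ins for the CUDA kernels. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical host-side reference for the element-wise unary functions. */
final class UnaryMapSketch {
    // out[i][j] = f(in[i][j]); f plays the role of the per-element kernel body.
    static double[][] map(double[][] in, DoubleUnaryOperator f) {
        double[][] out = new double[in.length][];
        for (int i = 0; i < in.length; i++) {
            out[i] = new double[in[i].length];
            for (int j = 0; j < in[i].length; j++) {
                out[i][j] = f.applyAsDouble(in[i][j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{-2.5, 0}, {1, 4}};
        System.out.println(Arrays.deepToString(map(m, Math::abs)));
        System.out.println(Arrays.deepToString(map(m, Math::signum))); // analogous to sign
        System.out.println(Arrays.deepToString(map(m, Math::exp)));
    }
}
```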
      • sqrt

        public static void sqrt(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "sqrt" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • round

        public static void round(ExecutionContext ec,
                                 GPUContext gCtx,
                                 String instName,
                                 MatrixObject in1,
                                 String outputName)
        Performs a "round" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • abs

        public static void abs(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs an "abs" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • log

        public static void log(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs a "log" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • floor

        public static void floor(ExecutionContext ec,
                                 GPUContext gCtx,
                                 String instName,
                                 MatrixObject in1,
                                 String outputName)
        Performs a "floor" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • ceil

        public static void ceil(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "ceil" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • sin

        public static void sin(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs a "sin" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • cos

        public static void cos(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs a "cos" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • tan

        public static void tan(ExecutionContext ec,
                               GPUContext gCtx,
                               String instName,
                               MatrixObject in1,
                               String outputName)
        Performs a "tan" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • sinh

        public static void sinh(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "sinh" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • cosh

        public static void cosh(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "cosh" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • tanh

        public static void tanh(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "tanh" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • asin

        public static void asin(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs an "asin" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • acos

        public static void acos(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs an "acos" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • atan

        public static void atan(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs an "atan" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • sign

        public static void sign(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                String outputName)
        Performs a "sign" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
      • sigmoid

        public static void sigmoid(ExecutionContext ec,
                                   GPUContext gCtx,
                                   String instName,
                                   MatrixObject in1,
                                   String outputName)
        Performs a "sigmoid" operation on a matrix on the GPU
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix
        outputName - output matrix name
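The sigmoid operation is the logistic function 1 / (1 + exp(-x)), applied element-wise. The following is a host-side reference sketch of that function. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for the element-wise logistic sigmoid. */
final class SigmoidSketch {
    // sigmoid(x) = 1 / (1 + exp(-x)), applied to every cell.
    static double[][] sigmoid(double[][] in) {
        double[][] out = new double[in.length][];
        for (int i = 0; i < in.length; i++) {
            out[i] = new double[in[i].length];
            for (int j = 0; j < in[i].length; j++) {
                out[i][j] = 1.0 / (1.0 + Math.exp(-in[i][j]));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(sigmoid(new double[][]{{-2, 0, 2}})));
    }
}
```

Because sigmoid(-x) = 1 - sigmoid(x), the outputs for -2 and 2 sum to 1.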
      • cumulativeScan

        public static void cumulativeScan(ExecutionContext ec,
                                          GPUContext gCtx,
                                          String instName,
                                          String kernelFunction,
                                          MatrixObject in,
                                          String outputName)
        Performs a cumulative scan over the input matrix using the given CUDA kernel
        Parameters:
        ec - valid execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        kernelFunction - the name of the CUDA kernel to call
        in - input matrix
        outputName - output matrix name
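DML's cumulative aggregates (cumsum, cumprod, cummin, cummax) scan each column from top to bottom. The host-side sketch below performs such a column-wise inclusive scan. A DoubleBinaryOperator stands in for the choice of kernelFunction, which is an assumption of this example. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Hypothetical host-side reference for a column-wise inclusive scan. */
final class CumulativeScanSketch {
    // out[0][j] = in[0][j]; out[i][j] = op(out[i-1][j], in[i][j]) for i > 0.
    // The operator replaces the kernelFunction choice (assumption).
    static double[][] scan(double[][] in, DoubleBinaryOperator op) {
        double[][] out = new double[in.length][];
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i].clone();
            if (i > 0) {
                for (int j = 0; j < in[i].length; j++) {
                    out[i][j] = op.applyAsDouble(out[i - 1][j], in[i][j]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2}, {3, 4}, {5, 6}};
        System.out.println(Arrays.deepToString(scan(m, Double::sum)));     // like cumsum
        System.out.println(Arrays.deepToString(scan(m, (a, b) -> a * b))); // like cumprod
    }
}
```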
      • cumulativeSumProduct

        public static void cumulativeSumProduct(ExecutionContext ec,
                                                GPUContext gCtx,
                                                String instName,
                                                String kernelFunction,
                                                MatrixObject in,
                                                String outputName)
        Cumulative sum-product kernel cascade invocation
        Parameters:
        ec - valid execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        kernelFunction - the name of the CUDA kernel to call
        in - input matrix
        outputName - output matrix name
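This sketch assumes that the method backs DML's cumsumprod. That function takes a two-column matrix X and computes Y[i] = X[i,1] + X[i,2] * Y[i-1], starting from 0. Under that assumption, a sequential host-side reference looks like the code below. The GPU version presumably splits this recurrence across the kernel cascade mentioned above. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sequential reference for cumsumprod on a two-column input (assumed semantics). */
final class CumSumProdSketch {
    // x is N x 2; y[i] = x[i][0] + x[i][1] * y[i-1], with the value before the first row taken as 0.
    static double[] cumSumProd(double[][] x) {
        double[] y = new double[x.length];
        double prev = 0.0;
        for (int i = 0; i < x.length; i++) {
            prev = x[i][0] + x[i][1] * prev;
            y[i] = prev;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}, {5, 0.5}};
        System.out.println(Arrays.toString(cumSumProd(x)));
    }
}
```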
      • axpy

        public static void axpy(ExecutionContext ec,
                                GPUContext gCtx,
                                String instName,
                                MatrixObject in1,
                                MatrixObject in2,
                                String outputName,
                                double constant)
        Performs a daxpy operation: out = in1 + constant * in2
        Parameters:
        ec - execution context
        gCtx - a valid GPUContext
        instName - the invoking instruction's name, used for recording statistics.
        in1 - input matrix 1
        in2 - input matrix 2
        outputName - output matrix name
        constant - scalar multiplier applied to in2
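In BLAS terms, daxpy computes y = alpha * x + y. This sketch maps that formula onto these parameters as out = in1 + constant * in2, computed element-wise, and that mapping is an assumption. The sketch also assumes that in1 and in2 have the same shape. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical host-side reference for an axpy-style update. */
final class AxpySketch {
    // Assumed mapping: out = in1 + constant * in2, element-wise; in1 and in2 have the same shape.
    static double[][] axpy(double[][] in1, double[][] in2, double constant) {
        double[][] out = new double[in1.length][];
        for (int i = 0; i < in1.length; i++) {
            out[i] = new double[in1[i].length];
            for (int j = 0; j < in1[i].length; j++) {
                out[i][j] = in1[i][j] + constant * in2[i][j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}}, b = {{10, 20}};
        System.out.println(Arrays.deepToString(axpy(a, b, 0.5)));
        System.out.println(Arrays.deepToString(axpy(a, b, -1))); // a - b
    }
}
```

A negative constant turns the same routine into a scaled subtraction.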
      • getDenseMatrixOutputForGPUInstruction

        public static MatrixObject getDenseMatrixOutputForGPUInstruction(ExecutionContext ec,
                                                                         String instName,
                                                                         String name,
                                                                         long numRows,
                                                                         long numCols)
        Helper method to get the output block (allocated on the GPU). Also records performance information in Statistics.
        Parameters:
        ec - active ExecutionContext
        instName - the invoking instruction's name, used for recording statistics.
        name - name of the output matrix (that the ExecutionContext is aware of)
        numRows - number of rows of output matrix object
        numCols - number of columns of output matrix object
        Returns:
        the matrix object
      • getDenseMatrixOutputForGPUInstruction

        public static MatrixObject getDenseMatrixOutputForGPUInstruction(ExecutionContext ec,
                                                                         String instName,
                                                                         String name,
                                                                         long numRows,
                                                                         long numCols,
                                                                         boolean initialize)
      • computeNNZ

        public static int computeNNZ(GPUContext gCtx,
                                     jcuda.Pointer densePtr,
                                     int length)
        Utility to compute the number of non-zeroes on the GPU
        Parameters:
        gCtx - the associated GPUContext
        densePtr - device pointer to the dense matrix
        length - length of the dense pointer
        Returns:
        the number of non-zeroes
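On the host, counting non-zeros takes a single pass over the dense buffer. The sketch below uses a double[] in place of densePtr. The class name is hypothetical.

```java
/** Hypothetical host-side reference for counting non-zeros in a dense buffer. */
final class NnzSketch {
    // Counts entries in densePtr[0 .. length-1] that are not equal to 0.0 (-0.0 counts as zero).
    static int computeNNZ(double[] densePtr, int length) {
        int nnz = 0;
        for (int i = 0; i < length; i++) {
            if (densePtr[i] != 0.0) {
                nnz++;
            }
        }
        return nnz;
    }

    public static void main(String[] args) {
        double[] m = {0, 1.5, 0, -2, 0, 3};
        System.out.println(computeNNZ(m, m.length));
    }
}
```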