Algorithms

SystemDS supports different machine learning algorithms out of the box.

As an example, the lm algorithm can be used as follows:

# Import numpy and SystemDS
import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import lm

# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)

# compute the weights
with SystemDSContext() as sds:
  weights = lm(sds.from_numpy(features), sds.from_numpy(y)).compute()
  print(weights)

The output should be similar to

[[-0.11538199]
 [-0.20386541]
 [-0.39956035]
 [ 1.04078623]
 [ 0.4327084 ]
 [ 0.18954599]
 [ 0.49858968]
 [-0.26812763]
 [ 0.09961844]
 [-0.57000751]
 [-0.43386048]
 [ 0.55358873]
 [-0.54638565]
 [ 0.2205885 ]
 [ 0.37957689]]
systemds.operator.algorithm.abstain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, threshold: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • threshold

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

systemds.operator.algorithm.als(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • rank – Rank of the factorization

  • reg – Regularization:

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing m x r matrix where r is the factorization rank & m x r matrix where r is the factorization rank

systemds.operator.algorithm.alsCG(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • rank – Rank of the factorization

  • reg – Regularization:

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating U and V once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing

systemds.operator.algorithm.alsDS(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • rank – Rank of the factorization

  • lambda – Regularization parameter, no regularization if 0.0

  • maxi – Maximum number of iterations

  • check – Check for convergence after every iteration, i.e., updating L and R once

  • thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing

systemds.operator.algorithm.alsTopkPredict(userIDs: systemds.operator.nodes.matrix.Matrix, I: systemds.operator.nodes.matrix.Matrix, L: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

K – The number of top-K items

Returns

‘OperationNode’ containing

systemds.operator.algorithm.arima(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • p – non-seasonal AR order

  • d – non-seasonal differencing order

  • q – non-seasonal MA order

  • P – seasonal AR order

  • D – seasonal differencing order

  • Q – seasonal MA order

  • s – period in terms of number of time-steps

  • include_mean – center to mean 0, and include in result

  • solver – solver, is either “cg” or “jacobi”

Returns

‘OperationNode’ containing

systemds.operator.algorithm.bivar(X: systemds.operator.nodes.matrix.Matrix, S1: systemds.operator.nodes.matrix.Matrix, S2: systemds.operator.nodes.matrix.Matrix, T1: systemds.operator.nodes.matrix.Matrix, T2: systemds.operator.nodes.matrix.Matrix, verbose: bool)
Parameters

verbose – Print bivar stats

Returns

‘OperationNode’ containing as output with bivar stats & as output with bivar stats & as output with bivar stats & as output with bivar stats

systemds.operator.algorithm.components(G: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the matrix of feature vectors

  • Y – Location to read the matrix with category labels

  • icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept,

  • tol – tolerance (“epsilon”)

  • reg – regularization parameter (lambda = 1/C); intercept is not regularized

  • maxi – max. number of outer (Newton) iterations

  • maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

systemds.operator.algorithm.confusionMatrix(P: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix)
Parameters

encoded – actual labels

Returns

‘OperationNode’ containing

systemds.operator.algorithm.cox(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, F: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – Location to read the input matrix X containing the survival data

  • containing – information

  • TE – Column indices of X as a column vector which contain timestamp

  • F – Column indices of X as a column vector which are to be used for fitting the model

  • R – If factors (categorical variables) are available in the input matrix X, the start and end indices of the factors in X; alternatively, the indices of the baseline level of each factor, which needs to be removed from X; in this case the start and end indices corresponding to the baseline level need to be the same; if R is not provided, by default all variables are considered to be continuous

  • alpha – Parameter to compute a 100*(1-alpha)% confidence interval for the betas

  • tol – Tolerance (“epsilon”)

  • moi – Max. number of outer (Newton) iterations

  • mii – Max. number of inner (conjugate gradient) iterations, 0 = no max

Returns

‘OperationNode’ containing matrix rt that contains the order-preserving recoded timestamps from x & which is matrix x with sorted timestamps & matrix mf that contains the column indices of x with the baseline factors removed (if available)

systemds.operator.algorithm.cspline(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – 1-column matrix of x values (knots); the values are assumed to be monotonically increasing, with no duplicate points in X

  • inp_x – the given input x, for which the cspline will find predicted y

  • mode – Specifies the method for cspline (DS - Direct Solve, CG - Conjugate Gradient)

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns

‘OperationNode’ containing

systemds.operator.algorithm.csplineCG(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • X – 1-column matrix of x values (knots); the values are assumed to be monotonically increasing, with no duplicate points in X

  • inp_x – the given input x, for which the cspline will find predicted y.

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns

‘OperationNode’ containing

systemds.operator.algorithm.csplineDS(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float)
Parameters
  • X – 1-column matrix of x values (knots); the values are assumed to be monotonically increasing, with no duplicate points in X

  • inp_x – the given input x, for which the cspline will find predicted y.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.cvlm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, k: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – Number of subsets needed; it should always be more than 1 and less than nrow(X)

  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization. set to nonzero for

Returns

‘OperationNode’ containing
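
The constraint on k above (more than 1 and less than nrow(X)) can be illustrated with a small, library-independent sketch of k-fold partitioning. This is not the SystemDS implementation: the helper name k_fold_indices and the choice of contiguous, near-equal folds are assumptions made purely for illustration.

```python
def k_fold_indices(n_rows, k):
    """Split the row indices 0..n_rows-1 into k contiguous, near-equal folds."""
    if not 1 < k < n_rows:
        raise ValueError("k must be more than 1 and less than the number of rows")
    base, extra = divmod(n_rows, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds receive one additional row.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


folds = k_fold_indices(10, 3)

# Each fold is held out once; the remaining folds form the training rows.
splits = []
for held_out, test_rows in enumerate(folds):
    train_rows = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    splits.append((train_rows, test_rows))
```

With 10 rows and k = 3 this yields folds of sizes 4, 3 and 3, and every row is held out exactly once.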

systemds.operator.algorithm.dbscan(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • eps – Maximum distance between two points for one to be considered reachable for the other.

  • minPts – Number of points in a neighborhood for a point to be considered as a core point

Returns

‘OperationNode’ containing
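
To make the roles of eps and minPts concrete, the following plain-Python sketch marks core points in a tiny data set. It only illustrates the general DBSCAN core-point rule and is not the SystemDS built-in; the function name core_points is hypothetical, and counting a point as its own neighbour is an assumption.

```python
import math


def core_points(points, eps, min_pts):
    """Return the indices of points that have at least min_pts points
    (the point itself included) within distance eps."""
    core = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            core.append(i)
    return core


points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (5.0, 5.0)]
core = core_points(points, eps=0.6, min_pts=3)
```

Only the first point reaches three neighbours within eps = 0.6 here, so it is the only core point; the isolated point at (5.0, 5.0) would be treated as noise.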

systemds.operator.algorithm.dbscanApply(X: systemds.operator.nodes.matrix.Matrix, clusterModel: systemds.operator.nodes.matrix.Matrix, eps: float)
Parameters

eps – Maximum distance between two points for one to be considered reachable for the other.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.decisionTree(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
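  • R – Row vector indicating, for each feature in X, whether it is scale or categorical: 1 indicates a scale feature vector, other positive integers indicate the number of categories. If R is not provided, by default all variables are assumed to be scale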

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • verbose – boolean specifying if the algorithm should print information while executing

Returns

‘OperationNode’ containing information: & if the feature is categorical) & looks at if j is an internal node, otherwise 0 & as r input vector & of the subset of values & 6,7,… if j is categorical & a leaf node: number of misclassified samples reaching at node j & at m[6,j] if the feature chosen for j is scale, & feature chosen for j is categorical rows 6,7,… depict the value subset chosen for j & a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0

systemds.operator.algorithm.decisionTreePredict(M: systemds.operator.nodes.matrix.Matrix, X: systemds.operator.nodes.matrix.Matrix, strategy: str)
Parameters
  • to – in the learned tree and each row contains the following information:

  • categorical – if the feature is categorical)

  • that – looks at if j is an internal node, otherwise 0

  • the – as R input vector

  • otherwise – of the subset of values

  • stored – 6,7,… if j is categorical

  • If – a leaf node: number of misclassified samples reaching at node j

  • to – at M[6,j] if the feature chosen for j is scale,

  • otherwise – feature chosen for j is categorical rows 6,7,… depict the value subset chosen for j

  • If – a leaf node 1 if j is impure and the number of samples at j > threshold, otherwise 0

  • strategy – strategy, can be one of [“GEMM”, “TT”, “PTT”], referring to “Generic matrix multiplication”,

Returns

‘OperationNode’ containing

systemds.operator.algorithm.deepWalk(Graph: systemds.operator.nodes.matrix.Matrix, w: int, d: int, gamma: int, t: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • w – window size

  • d – embedding size

  • gamma – walks per vertex

  • t – walk length

  • alpha – learning rate

  • beta – factor for decreasing learning rate

Returns

‘OperationNode’ containing

systemds.operator.algorithm.discoverFD(X: systemds.operator.nodes.matrix.Matrix, Mask: systemds.operator.nodes.matrix.Matrix, threshold: float)
Parameters

Mask – Row vector selecting the features of interest, e.g. Mask = [1, 0, 1] will exclude the second column from processing

Returns

‘OperationNode’ containing

systemds.operator.algorithm.executePipeline(X: systemds.operator.nodes.matrix.Matrix)
Parameters
  • flagsCount

  • test

Returns

‘OperationNode’ containing validation check & convert the matrix row-vector into list & flag & append flag & append flag & append flag & of hyper-parameters and loop till that & flag & and remove categorical & and remove numerics & + 1 for nan replacement & matrix & matrix & ohe call, to call inside eval as a function & encoding of categorical features & features & ohe call, to call inside eval as a function & to call inside eval as a function & doing relative over-sampling & count & replace the null with default values & replace the null with default values & flip the noisy labels & best option

systemds.operator.algorithm.ffTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, out_activation: str, loss_fcn: str, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • batch_size – Batch size

  • epochs – Number of epochs

  • learning_rate – Learning rate

  • out_activation – User specified output activation function. Possible values:

  • loss_fcn – User specified loss function. Possible values:

  • shuffle – Flag which indicates if dataset should be shuffled or not

  • validation_split – Fraction of training set used as validation set

  • seed – Seed for model initialization

  • verbose – Flag which indicates if function should print to stdout

Returns

‘OperationNode’ containing by the model & by the model

systemds.operator.algorithm.garch(X: systemds.operator.nodes.matrix.Matrix, kmax: int, momentum: float, start_stepsize: float, end_stepsize: float, start_vicinity: float, end_vicinity: float, sim_seed: int, verbose: bool)
Parameters
  • kmax – Number of iterations

  • momentum – Momentum for momentum-gradient descent (set to 0 to deactivate)

  • start_stepsize – Initial gradient-descent stepsize

  • end_stepsize – gradient-descent stepsize at end (linear descent)

  • start_vicinity – proportion of randomness of restart-location for gradient descent at beginning

  • end_vicinity – same at end (linear decay)

  • sim_seed – seed for simulation of process on fitted coefficients

  • verbose – verbosity, comments during fitting

Returns

‘OperationNode’ containing term of fitted process & arch-coefficient of fitted process & garch-coefficient of fitted process & drawbacks: slow convergence of optimization (sort of simulated annealing/gradient descent)

systemds.operator.algorithm.gaussianClassifier(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • varSmoothing – Smoothing factor for variances

  • verbose – Print accuracy of the training set

Returns

‘OperationNode’ containing

systemds.operator.algorithm.getAccuracy(y: systemds.operator.nodes.matrix.Matrix, yhat: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

isWeighted – Flag for weighted or non-weighted accuracy calculation

Returns

‘OperationNode’ containing of the predicted labels

systemds.operator.algorithm.glm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • dfam – Distribution family code: 1 = Power, 2 = Binomial

  • vpow – Power for Variance defined as (mean)^power (ignored if dfam != 1):

  • link – Link function code: 0 = canonical (depends on distribution),

  • lpow – Power for Link function defined as (mean)^power (ignored if link != 1):

  • yneg – Response value for Bernoulli “No” label, usually 0.0 or -1.0

  • icpt – Intercept presence, X columns shifting and rescaling:

  • reg – Regularization parameter (lambda) for L2 regularization

  • tol – Tolerance (epsilon)

  • disp – (Over-)dispersion value, or 0.0 to estimate it from data

  • moi – Maximum number of outer (Newton / Fisher Scoring) iterations

  • mii – Maximum number of inner (Conjugate Gradient) iterations, 0 = no maximum

Returns

‘OperationNode’ containing line, as follows: & integer indicating success/failure as follows: & value (regression coefficient), excluding the intercept & for the smallest beta value & value (regression coefficient), excluding the intercept & for the largest beta value & or nan if there is no intercept (if icpt=0) & to scale deviance, provided as “disp” input parameter & from the dataset & the saturated model, assuming dispersion == 1.0 & the saturated model, scaled by the dispersion value & when requested, contains the following per-iteration variables in csv format, & triple (name, iteration, value) with iteration = 0 for initial values: & inner (conj.gradient) iterations in this outer iteration & function we minimize (i.e. negative partial log-likelihood) & the objective during this iteration, actual value & the objective predicted by a quadratic approximation & value of x %*% beta, used to check for overflows & value of x %*% beta, used to check for overflows & region size, the “delta” & supported glm distribution families & lpow distribution.link nical?

systemds.operator.algorithm.gmm(X: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • n_components – Number of components in the Gaussian mixture model

  • model – “VVV”: unequal variance (full), each component has its own general covariance matrix

  • init_param – initialize weights with “kmeans” or “random”

  • iterations – Number of iterations

  • reg_covar – regularization parameter for covariance matrix

  • tol – tolerance value for convergence

Returns

‘OperationNode’ containing of estimated parameters & information criterion for best iteration & kth class

systemds.operator.algorithm.gmmPredict(X: systemds.operator.nodes.matrix.Matrix, weight: systemds.operator.nodes.matrix.Matrix, mu: systemds.operator.nodes.matrix.Matrix, precisions_cholesky: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

model – fitted model

Returns

‘OperationNode’ containing cluster labels & of belongingness & for new instances given the variance and mean of fitted data

systemds.operator.algorithm.gnmf(X: systemds.operator.nodes.matrix.Matrix, rnk: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • rnk – Number of components into which matrix X is to be factored

  • eps – Tolerance

  • maxi – Maximum number of conjugate gradient iterations

Returns

‘OperationNode’ containing

systemds.operator.algorithm.hospitalResidencyMatch(R: systemds.operator.nodes.matrix.Matrix, H: systemds.operator.nodes.matrix.Matrix, capacity: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
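  • R – Residents matrix. It must be an ORDERED matrix.

  • H – Hospitals matrix. It must be an UNORDERED matrix.

  • capacity – Capacity of the hospitals. It must be a [n*1] matrix with non zero values.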

  • with – and vice-versa (higher is better).

  • verbose – If the operation is verbose

Returns

‘OperationNode’ containing an ordered matrix, this means that resident 1 (row 1) likes hospital 2 the most, followed by hospital 1 and hospital 3. & unordered, this would mean that resident 1 (row 1) likes hospital 3 the most (since the value at [1,3] is the row max), & 1 (2.0 preference value) and hospital 2 (1.0 preference value). & an unordered matrix this means that hospital 1 (row 1) likes resident 1 the most (since the value at [1,1] is the row max). & matched with hospital 3 (since [1,3] is non-zero) at a preference level of 2.0. & matched with hospital 1 (since [2,1] is non-zero) at a preference level of 1.0. & matched with hospital 2 (since [3,2] is non-zero) at a preference level of 2.0.

systemds.operator.algorithm.hyperband(X_train: systemds.operator.nodes.matrix.Matrix, y_train: systemds.operator.nodes.matrix.Matrix, X_val: systemds.operator.nodes.matrix.Matrix, y_val: systemds.operator.nodes.matrix.Matrix, params: Iterable, paramRanges: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • paramRanges – One row per hyper parameter; the first column specifies the min, the second column the max value.

  • verbose – If TRUE print messages are activated

Returns

‘OperationNode’ containing

systemds.operator.algorithm.img_brightness(img_in: systemds.operator.nodes.matrix.Matrix, value: float, channel_max: int)
Parameters
  • value – The amount of brightness to be changed for the image

  • channel_max – Maximum value of the brightness of the image

Returns

‘OperationNode’ containing
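
As a rough illustration of how value and channel_max interact, the sketch below shifts every pixel by value and clamps the result to the range [0, channel_max]. It operates on a plain list of rows rather than a SystemDS matrix, and the clamping behaviour is an assumption about the built-in, not a statement of its implementation.

```python
def adjust_brightness(img, value, channel_max):
    """Add `value` to every pixel and clamp the result to [0, channel_max]."""
    return [[max(0, min(p + value, channel_max)) for p in row] for row in img]


img = [[0, 100], [200, 255]]
brighter = adjust_brightness(img, 60, 255)
darker = adjust_brightness(img, -120, 255)
```

Pixels that would leave the valid range saturate at 0 or channel_max instead of wrapping around.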

systemds.operator.algorithm.img_crop(img_in: systemds.operator.nodes.matrix.Matrix, w: int, h: int, x_offset: int, y_offset: int)
Parameters
  • w – The width of the subregion required

  • h – The height of the subregion required

  • x_offset – The horizontal coordinate in the image to begin the crop operation

  • y_offset – The vertical coordinate in the image to begin the crop operation

Returns

‘OperationNode’ containing

systemds.operator.algorithm.img_cutout(img_in: systemds.operator.nodes.matrix.Matrix, x: int, y: int, width: int, height: int, fill_value: float)
Parameters
  • x – Column index of the top left corner of the rectangle (starting at 1)

  • y – Row index of the top left corner of the rectangle (starting at 1)

  • width – Width of the rectangle (must be positive)

  • height – Height of the rectangle (must be positive)

  • fill_value – The value to set for the rectangle

Returns

‘OperationNode’ containing
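
The 1-based x and y indices above are easy to get wrong, so here is a small plain-Python sketch of the cutout idea on a list of rows. It is not the SystemDS code; clipping the rectangle at the image border is an assumption.

```python
def cutout(img, x, y, width, height, fill_value):
    """Fill a rectangle whose top-left corner is at column x, row y (both 1-based)."""
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")
    out = [list(row) for row in img]  # copy, so the input image stays untouched
    rows, cols = len(out), len(out[0])
    # Convert the 1-based corner to 0-based ranges and clip them to the image.
    for r in range(max(y - 1, 0), min(y - 1 + height, rows)):
        for c in range(max(x - 1, 0), min(x - 1 + width, cols)):
            out[r][c] = fill_value
    return out


img = [[1, 1, 1, 1] for _ in range(3)]
patched = cutout(img, x=2, y=1, width=2, height=2, fill_value=0)
```

Here the rectangle covers columns 2 to 3 and rows 1 to 2 in 1-based terms.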

systemds.operator.algorithm.img_invert(img_in: systemds.operator.nodes.matrix.Matrix, max_value: float)
Parameters

max_value – The maximum value pixels can have

Returns

‘OperationNode’ containing
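
Inversion with respect to max_value can be sketched in plain Python as follows. The example assumes that each pixel p is mapped to max_value - p, which is the usual definition of an inverted image; it does not call SystemDS.

```python
def invert(img, max_value):
    """Map every pixel p to max_value - p."""
    return [[max_value - p for p in row] for row in img]


img = [[0, 64], [191, 255]]
inverted = invert(img, 255)
restored = invert(inverted, 255)
```

Applying the operation twice with the same max_value returns the original image.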

systemds.operator.algorithm.img_mirror(img_in: systemds.operator.nodes.matrix.Matrix, horizontal_axis: bool)
Parameters

horizontal_axis – If TRUE, the image is mirrored about the horizontal axis; otherwise about the vertical axis

Returns

‘OperationNode’ containing
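
The effect of the horizontal_axis flag in the signature can be sketched on a list of rows. The interpretation used here (TRUE reverses the row order, FALSE reverses each row) is an assumption based on the parameter name and should be checked against the built-in.

```python
def mirror(img, horizontal_axis):
    """Mirror an image given as a list of rows."""
    if horizontal_axis:
        # Mirror about the horizontal axis: the top row becomes the bottom row.
        return [list(row) for row in reversed(img)]
    # Mirror about the vertical axis: the left column becomes the right column.
    return [list(reversed(row)) for row in img]


img = [[1, 2, 3], [4, 5, 6]]
flipped_rows = mirror(img, True)
flipped_cols = mirror(img, False)
```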

systemds.operator.algorithm.img_posterize(img_in: systemds.operator.nodes.matrix.Matrix, bits: int)
Parameters
  • bits – The number of bits to keep for the values. 1 means black and white, 8 means every integer between 0 and 255.

Returns

‘OperationNode’ containing
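
A minimal sketch of posterization, assuming 8-bit pixel values in the range 0 to 255: keeping only the most significant bits amounts to rounding each value down to a multiple of 2^(8 - bits). The function below is illustrative plain Python, not the SystemDS implementation.

```python
def posterize(img, bits):
    """Keep only the `bits` most significant bits of 8-bit pixel values."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    step = 2 ** (8 - bits)
    # Integer division followed by multiplication zeroes the low-order bits.
    return [[(int(p) // step) * step for p in row] for row in img]


img = [[0, 100, 127], [128, 200, 255]]
one_bit = posterize(img, 1)
all_bits = posterize(img, 8)
```

With bits = 1 only two levels remain (0 and 128), while bits = 8 leaves the image unchanged.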

systemds.operator.algorithm.img_rotate(img_in: systemds.operator.nodes.matrix.Matrix, radians: float, fill_value: float)
Parameters
  • radians – The value by which to rotate in radian.

  • fill_value – The background color revealed by the rotation

Returns

‘OperationNode’ containing

systemds.operator.algorithm.img_sample_pairing(img_in1: systemds.operator.nodes.matrix.Matrix, img_in2: systemds.operator.nodes.matrix.Matrix, weight: float)
Parameters
  • weight – The weight given to the second image. 0 means only img_in1, 1 means only img_in2 will be visible

Returns

‘OperationNode’ containing
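
The weight semantics can be sketched as a per-pixel linear blend. The formula (1 - weight) * img_in1 + weight * img_in2 is an assumption consistent with the behaviour at weight 0 and weight 1; the code uses plain lists and does not call SystemDS.

```python
def sample_pairing(img1, img2, weight):
    """Blend two equally sized images; `weight` is the share of the second image."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        [(1 - weight) * a + weight * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(img1, img2)
    ]


img1 = [[0.0, 100.0]]
img2 = [[200.0, 50.0]]
only_first = sample_pairing(img1, img2, 0.0)
only_second = sample_pairing(img1, img2, 1.0)
mixed = sample_pairing(img1, img2, 0.25)
```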

systemds.operator.algorithm.img_shear(img_in: systemds.operator.nodes.matrix.Matrix, shear_x: float, shear_y: float, fill_value: float)
Parameters
  • shear_x – Shearing factor for horizontal shearing

  • shear_y – Shearing factor for vertical shearing

  • fill_value – The background color revealed by the shearing

Returns

‘OperationNode’ containing

systemds.operator.algorithm.img_transform(img_in: systemds.operator.nodes.matrix.Matrix, out_w: int, out_h: int, a: float, b: float, c: float, d: float, e: float, f: float, fill_value: float)
Parameters
  • out_w – Width of the output image

  • out_h – Height of the output image

  • fill_value – The background of the image

Returns

‘OperationNode’ containing image as 2d matrix with top left corner at [1, 1]

systemds.operator.algorithm.img_translate(img_in: systemds.operator.nodes.matrix.Matrix, offset_x: float, offset_y: float, out_w: int, out_h: int, fill_value: float)
Parameters
  • offset_x – The distance to move the image in x direction

  • offset_y – The distance to move the image in y direction

  • out_w – Width of the output image

  • out_h – Height of the output image

  • fill_value – The background of the image

Returns

‘OperationNode’ containing

systemds.operator.algorithm.imputeByFD(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, threshold: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • threshold – threshold value in interval [0, 1] for robust FDs

  • verbose – flag for printing verbose debug output

Returns

‘OperationNode’ containing

systemds.operator.algorithm.imputeByFDApply(X: systemds.operator.nodes.matrix.Matrix, Y_imp: systemds.operator.nodes.matrix.Matrix)
Parameters
  • source – source attribute to use for imputation and error correction

  • target – attribute to be fixed

  • threshold – threshold value in interval [0, 1] for robust FDs

Returns

‘OperationNode’ containing

systemds.operator.algorithm.km(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, GI: systemds.operator.nodes.matrix.Matrix, SI: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • number – (categorical features) for grouping and/or stratifying

  • alpha – Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median

  • err_type – Parameter to specify the error type according to “greenwood” (the default) or “peto”

  • conf_type – Parameter to modify the confidence interval; “plain” keeps the lower and upper bound of the confidence interval unmodified, “log” (the default) corresponds to logistic transformation and “log-log” corresponds to the complementary log-log transformation

  • test_type – If survival data for multiple groups is available, specifies which test to perform for comparing survival data across multiple groups: “none” (the default)

Returns

‘OperationNode’ containing 7 consecutive columns in km corresponds to a unique & and strata in the data with the following schema & number of factors used for stratifying, i.e., ncol(si)) & of groups and strata is equal to 1, m will have 4 columns with & 4 matrix t and an g x 5 matrix t_groups_oe with

systemds.operator.algorithm.kmeans(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – Number of centroids

  • runs – Number of runs (with different initial centroids)

  • max_iter – Maximum number of iterations per run

  • eps – Tolerance (epsilon) for WCSS change ratio

  • is_verbose – do not print per-iteration stats

  • avg_sample_size_per_centroid – Average number of records per centroid in data samples

  • seed – The seed used for initial sampling. If set to -1, random seeds are selected.

Returns

‘OperationNode’ containing
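
The eps parameter refers to the change in WCSS (within-cluster sum of squares) between iterations. The sketch below computes WCSS for a set of centroids and applies one plausible stopping rule. The exact rule used by the built-in is not stated here, so has_converged is a hypothetical helper that assumes a relative-improvement test.

```python
def wcss(points, centroids):
    """Within-cluster sum of squares: every point contributes its squared
    Euclidean distance to the nearest centroid."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
        for p in points
    )


def has_converged(wcss_old, wcss_new, eps):
    # Assumed rule: stop once the relative WCSS improvement drops to eps or below.
    return (wcss_old - wcss_new) <= eps * wcss_old


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one_centroid = wcss(points, [(5.0, 1.0)])
two_centroids = wcss(points, [(0.0, 1.0), (10.0, 1.0)])
```

Moving from one centroid to two well-placed centroids reduces WCSS sharply, so a run in that state would continue; once successive WCSS values are nearly equal, the rule reports convergence.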

systemds.operator.algorithm.l2svm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • intercept – No intercept by default (if set to TRUE, a constant bias column is added to X)

  • epsilon – Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value

  • lambda – Regularization parameter (lambda) for L2 regularization

  • maxIterations – Maximum number of conjugate gradient iterations

  • maxii

  • verbose – Set to true if one wants print statements updating on loss.

  • columnId – The column ID used if one wants to add an ID to the print statement; specifically useful when L2SVM is used in MSVM.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.l2svmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

verbose – Set to true if one wants print statements.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lasso(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • tol – target convergence tolerance

  • M – history length

  • tau – regularization component

  • maxi – maximum number of iterations until convergence

  • verbose – if the builtin should be verbose

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lenetTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, X_val: systemds.operator.nodes.matrix.Matrix, Y_val: systemds.operator.nodes.matrix.Matrix, C: int, Hin: int, Win: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • C – Number of input channels (dimensionality of input depth)

  • Hin – Input height

  • Win – Input width

  • batch_size – Batch size

  • epochs – Number of epochs

  • lr – Learning rate

  • mu – Momentum value

  • decay – Learning rate decay

  • lambda – Regularization strength

  • seed – Seed for model initialization

  • verbose – Flag indicates if function should print to stdout

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization. set to nonzero

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lmCG(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization. set to nonzero

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns

‘OperationNode’ containing

systemds.operator.algorithm.lmDS(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • icpt – Intercept presence, shifting and rescaling the columns of X

  • reg – Regularization constant (lambda) for L2-regularization. set to nonzero

  • tol – Tolerance (epsilon); conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

  • maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

  • verbose – If TRUE print messages are activated

Returns

‘OperationNode’ containing

systemds.operator.algorithm.logSumExp(M: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • margin – if the logsumexp of rows is required, set margin = “row”; if the logsumexp of columns is required, set margin = “col”; if set to “none”, a single scalar is returned, computing the logsumexp of the whole matrix

Returns

‘OperationNode’ containing
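
The three margin settings can be illustrated with a numerically stable log-sum-exp written in plain Python. This is a sketch of the mathematical operation on a list of rows, not the SystemDS implementation; the function names are hypothetical.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty sequence."""
    peak = max(values)
    # Shifting by the maximum keeps every exponent <= 0, which avoids overflow.
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_sum_exp_matrix(matrix, margin="none"):
    """Apply log-sum-exp per row, per column, or over the whole matrix."""
    if margin == "row":
        return [log_sum_exp(row) for row in matrix]
    if margin == "col":
        return [log_sum_exp(col) for col in zip(*matrix)]
    if margin == "none":
        return log_sum_exp([v for row in matrix for v in row])
    raise ValueError("margin must be 'row', 'col' or 'none'")


M = [[0.0, 0.0], [1000.0, 1000.0]]
rows = log_sum_exp_matrix(M, "row")
cols = log_sum_exp_matrix(M, "col")
total = log_sum_exp_matrix(M, "none")
```

A naive exp(1000.0) would overflow; subtracting the maximum first keeps the computation finite for all three margins.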

systemds.operator.algorithm.matrixProfile(ts: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • window_size – Sliding window size

  • sample_percent – Degree of approximation between zero and one (1 computes the exact solution)

  • is_verbose – Print debug information

Returns

‘OperationNode’ containing

systemds.operator.algorithm.miceApply(X: systemds.operator.nodes.matrix.Matrix, meta: systemds.operator.nodes.matrix.Matrix, threshold: float, dM: systemds.operator.nodes.frame.Frame, betaList: Iterable)
Parameters
  • threshold – confidence value in [0, 1] for robust imputation; values will only be imputed if the predicted value has probability greater than threshold (categorical data only)

  • verbose – Boolean value.

Returns

‘OperationNode’ containing are represented with empty string, i.e., “,,” in csv file & n are storing continuous/numeric data and variables with & storing categorical data

systemds.operator.algorithm.msvm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • intercept – No intercept (if set to TRUE, a constant bias column is added to X)

  • num_classes – Number of classes

  • epsilon – Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value.

  • lambda – Regularization parameter (lambda) for L2 regularization

  • maxIterations – Maximum number of conjugate gradient iterations

  • verbose – Set to true to print while training.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.multiLogReg(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept,

  • tol – tolerance (“epsilon”)

  • reg – regularization parameter (lambda = 1/C); intercept is not regularized

  • maxi – max. number of outer (Newton) iterations

  • maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

systemds.operator.algorithm.multiLogRegPredict(X: systemds.operator.nodes.matrix.Matrix, B: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing value of accuracy

systemds.operator.algorithm.na_locf(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • option – String “locf” (last observation moved forward) to do forward fill

  • verbose – to print output on screen

Returns

‘OperationNode’ containing
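
The “locf” forward fill can be sketched in plain Python as follows. This is a single-column illustration under the assumption that missing values are encoded as NaN; forward_fill is a hypothetical helper, not the SystemDS implementation.

```python
import math


def forward_fill(column):
    """Replace each NaN with the most recent non-NaN value before it."""
    filled, last_seen = [], math.nan
    for value in column:
        if not math.isnan(value):
            last_seen = value  # remember the latest observed value
        filled.append(last_seen)
    return filled


series = [math.nan, 1.0, math.nan, math.nan, 4.0, math.nan]
result = forward_fill(series)
```

Leading missing values have no earlier observation to copy, so this sketch leaves them as NaN.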

systemds.operator.algorithm.naiveBayes(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • Laplace – Any Double value.

  • Verbose – Boolean value.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.outlier(X: systemds.operator.nodes.matrix.Matrix, opposite: bool)
Parameters

opposite – (1)TRUE for evaluating outlier from upper quartile range,

Returns

‘OperationNode’ containing

systemds.operator.algorithm.outlierByArima(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers as zeros

  • p – non-seasonal AR order

  • d – non-seasonal differencing order

  • q – non-seasonal MA order

  • P – seasonal AR order

  • D – seasonal differencing order

  • Q – seasonal MA order

  • s – period in terms of number of time-steps

  • solver – solver, is either “cg” or “jacobi”

Returns

‘OperationNode’ containing

systemds.operator.algorithm.outlierByIQR(X: systemds.operator.nodes.matrix.Matrix, k: float, max_iterations: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – a constant used to discern outliers k*IQR

  • isIterative – iterative repair or single repair

  • repairMethod – values: 0 = delete rows having outliers,

  • max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing meaning & matrix x with no outliers
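
The k*IQR rule behind the k parameter can be sketched in plain Python. The fences Q1 - k*IQR and Q3 + k*IQR follow the usual Tukey convention, and the quantile interpolation method is an assumption; SystemDS may compute quartiles differently. The helpers iqr_bounds and drop_outliers are hypothetical and only show repairMethod 0 (deleting outliers) on a single column.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the k*IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
cleaned = drop_outliers(data)  # 100 falls outside the fences and is removed
```

An iterative variant would repeat drop_outliers until nothing is removed, because the quartiles shift once extreme values are gone.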

systemds.operator.algorithm.outlierByIQRApply(X: systemds.operator.nodes.matrix.Matrix, Q1: systemds.operator.nodes.matrix.Matrix, Q3: systemds.operator.nodes.matrix.Matrix, IQR: systemds.operator.nodes.matrix.Matrix, k: float, repairMethod: int)
Parameters
  • k – a constant used to discern outliers k*IQR

  • repairMethod – values: 0 = delete rows having outliers,

Returns

‘OperationNode’ containing meaning & matrix x with no outliers

systemds.operator.algorithm.outlierBySd(X: systemds.operator.nodes.matrix.Matrix, max_iterations: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – threshold values 1, 2, 3 for 68%, 95%, 99.7% respectively (3-sigma rule)

  • repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers as zeros

  • max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

Returns

‘OperationNode’ containing
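
As an illustration of the k-sigma rule and the two repairMethod values, here is a minimal single-column sketch in plain Python. It assumes the sample standard deviation and a single (non-iterative) pass; repair_by_sd is a hypothetical helper, not the SystemDS implementation.

```python
import statistics


def repair_by_sd(values, k=3, repair_method=1):
    """Flag values outside mean +/- k*sd, then delete them (0) or zero them (1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    lower, upper = mean - k * sd, mean + k * sd
    is_outlier = [not (lower <= v <= upper) for v in values]
    if repair_method == 0:
        return [v for v, out in zip(values, is_outlier) if not out]
    return [0 if out else v for v, out in zip(values, is_outlier)]


data = [10, 12, 11, 9, 10, 11, 50]
zeroed = repair_by_sd(data, k=2, repair_method=1)   # outlier replaced by 0
deleted = repair_by_sd(data, k=2, repair_method=0)  # outlier removed
```

In a small sample a single extreme value inflates the standard deviation, so the example uses k=2; with k=3 the value 50 would not be flagged here.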

systemds.operator.algorithm.outlierBySdApply(X: systemds.operator.nodes.matrix.Matrix, colMean: systemds.operator.nodes.matrix.Matrix, colSD: systemds.operator.nodes.matrix.Matrix, k: float, repairMethod: int)
Parameters
  • k – a constant used to discern outliers k*IQR

  • isIterative – iterative repair or single repair

  • repairMethod – values: 0 = delete rows having outliers,

  • max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

  • verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing meaning & matrix x with no outliers

systemds.operator.algorithm.pca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • K – Number of reduced dimensions (i.e., columns)

  • Center – Indicates whether or not to center the feature matrix

  • Scale – Indicates whether or not to scale the feature matrix

Returns

‘OperationNode’ containing

systemds.operator.algorithm.pnmf(X: systemds.operator.nodes.matrix.Matrix, rnk: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • rnk – Number of components into which matrix X is to be factored.

  • maxi – Maximum number of conjugate gradient iterations.

  • verbose – If TRUE, ‘iter’ and ‘obj’ are printed.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.ppca(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – indicates dimension of the new vector space constructed from eigenvectors

  • maxi – maximum number of iterations until convergence

  • tolobj – objective function tolerance value to stop ppca algorithm

  • tolrecerr – reconstruction error tolerance value to stop the algorithm

  • verbose – verbose debug output

Returns

‘OperationNode’ containing

systemds.operator.algorithm.randomForest(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • R – If R is not provided, by default all variables are assumed to be scale

  • bins – Number of equiheight bins per scale feature to choose thresholds

  • depth – Maximum depth of the learned tree

  • num_leaf – Number of samples when splitting stops and a leaf node is added

  • num_samples – Number of samples at which point we switch to in-memory subtree building

  • num_trees – Number of trees to be learned in the random forest model

  • subsamp_rate – Parameter controlling the size of each tree in the forest; samples are selected from a Poisson distribution with parameter subsamp_rate (the default value is 1.0)

  • feature_subset – Parameter that controls the number of features used as candidates for splitting at each tree node as a power of the number of features in the dataset; by default the square root of the number of features (i.e., feature_subset = 0.5) is used at each tree node

  • impurity – Impurity measure: entropy or Gini (the default)

Returns

‘OperationNode’ containing tree and each row contains the following information: & that leaf node j is supposed to predict & subset of values & 7,8,… if j is categorical & stored at m[7,j] if the feature chosen for j is scale; & chosen for j is categorical rows 7,8,… depict the value subset chosen for j

systemds.operator.algorithm.scale(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • center – Indicates whether or not to center the feature matrix

  • scale – Indicates whether or not to scale the feature matrix

Returns

‘OperationNode’ containing
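
What center and scale do to a feature matrix can be sketched in plain Python. The sketch assumes column-wise centering by the mean and scaling by the sample standard deviation; the exact convention used by SystemDS is not stated here, and scale_columns is a hypothetical helper.

```python
import statistics


def scale_columns(X, center=True, scale=True):
    """Column-wise (x - mean) / sd; each step can be switched off."""
    columns = list(zip(*X))
    means = [statistics.mean(col) if center else 0.0 for col in columns]
    # Constant columns have sd == 0 and would need special handling.
    sds = [statistics.stdev(col) if scale else 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in X
    ]


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
Z = scale_columns(X)                       # both columns become -1, 0, 1
centered = scale_columns(X, scale=False)   # only the column means are removed
```

After scaling, both columns are on the same footing even though the raw values differ by a factor of ten.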

systemds.operator.algorithm.shortestPath(G: systemds.operator.nodes.matrix.Matrix, sourceNode: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • G – The matrix G can be 0/1 (just specifying whether the nodes are connected or not) or integer values (representing the weight of the edges or the distances between nodes, 0 if not connected).

  • maxi – Integer max number of iterations accepted (0 for FALSE, i.e., max number of iterations not defined)

  • sourceNode – Node index from which to calculate the shortest paths to all other nodes.

  • verbose – flag for verbose debug output

Returns

‘OperationNode’ containing minimum distance shortest-path from vertex i to vertex j. & of the minimum distance is infinity, the two nodes are
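
The conventions described above (0 means "not connected", an optional iteration cap, infinity for unreachable nodes) can be sketched with a simple iterative relaxation in plain Python. This is an illustration only: it assumes positive edge weights and 0-based node indices, and shortest_paths is a hypothetical helper rather than the SystemDS implementation.

```python
import math


def shortest_paths(G, source, maxi=0):
    """Relax all edges repeatedly until distances stop changing.

    G[u][v] is the edge weight from u to v, with 0 meaning "not connected".
    maxi caps the number of passes; 0 means no cap.
    """
    n = len(G)
    dist = [math.inf] * n
    dist[source] = 0.0
    iteration, changed = 0, True
    while changed and (maxi == 0 or iteration < maxi):
        changed = False
        iteration += 1
        for u in range(n):
            for v in range(n):
                weight = G[u][v]
                if weight != 0 and dist[u] + weight < dist[v]:
                    dist[v] = dist[u] + weight
                    changed = True
    return dist


G = [
    [0, 4, 1, 0],
    [4, 0, 2, 0],
    [1, 2, 0, 0],
    [0, 0, 0, 0],  # node 3 has no edges at all
]
distances = shortest_paths(G, source=0)
```

Node 1 is reached more cheaply through node 2 (1 + 2) than through the direct edge of weight 4, and the isolated node 3 keeps an infinite distance.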

systemds.operator.algorithm.sigmoid(X: systemds.operator.nodes.matrix.Matrix)
Returns

‘OperationNode’ containing meaning
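
For reference, the sigmoid function is 1 / (1 + exp(-x)), applied to every cell. A minimal plain-Python sketch follows; the branch on the sign of x is a standard trick to avoid overflow and is not necessarily what SystemDS does internally, and sigmoid_matrix is a hypothetical helper working on nested lists.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) stays in (0, 1)
    return z / (1.0 + z)


def sigmoid_matrix(X):
    """Apply the sigmoid to every cell of a nested-list matrix."""
    return [[sigmoid(value) for value in row] for row in X]


S = sigmoid_matrix([[0.0, 1000.0], [-1000.0, 2.0]])
```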

systemds.operator.algorithm.slicefinder(X: systemds.operator.nodes.matrix.Matrix, e: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • k – Number of subsets required

  • maxL – Maximum level L (conjunctions of L predicates), 0 unlimited

  • minSup – Minimum support (min number of rows per slice)

  • alpha – Weight in [0,1]: 0 only size, 1 only error

  • tpEval – Flag for task-parallel slice evaluation,

  • tpBlksz – Block size for task-parallel execution (num slices)

  • selFeat – Flag for removing one-hot-encoded features that don’t satisfy the initial minimum-support constraint and/or have zero error

  • verbose – Flag for verbose debug output

Returns

‘OperationNode’ containing

systemds.operator.algorithm.smote(X: systemds.operator.nodes.matrix.Matrix, mask: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • s – Amount of SMOTE (percentage of oversampling), integral multiple of 100

  • k – Number of nearest neighbours

  • verbose – if the algorithm should be verbose

Returns

‘OperationNode’ containing

systemds.operator.algorithm.split(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • f – Train set fraction [0,1]

  • cont – Continuous splits, otherwise sampled

  • seed – The seed to randomly select rows in sampled mode

Returns

‘OperationNode’ containing
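
The interplay of f, cont, and seed can be sketched in plain Python. The default values, the rounding of the train size, and the return order are assumptions made for illustration; split_rows is a hypothetical helper that works on nested lists.

```python
import random


def split_rows(X, Y, f=0.7, cont=True, seed=None):
    """Split rows of X and Y into train and test parts.

    cont=True takes the first f fraction of rows as the train set;
    cont=False samples the train rows at random, reproducibly for a given seed.
    """
    n = len(X)
    n_train = round(f * n)
    indices = list(range(n))
    if not cont:
        random.Random(seed).shuffle(indices)
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    Y_train = [Y[i] for i in train_idx]
    Y_test = [Y[i] for i in test_idx]
    return X_train, X_test, Y_train, Y_test


X = [[i] for i in range(10)]
Y = [[i % 2] for i in range(10)]
X_train, X_test, Y_train, Y_test = split_rows(X, Y, f=0.7, cont=True)
sampled = split_rows(X, Y, f=0.7, cont=False, seed=42)
```

A continuous split preserves row order, which matters for time series; a sampled split with a fixed seed gives the same partition on every run.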

systemds.operator.algorithm.splitBalanced(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • f – Train set fraction [0,1]

  • verbose – print available

Returns

‘OperationNode’ containing

systemds.operator.algorithm.stableMarriage(P: systemds.operator.nodes.matrix.Matrix, A: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • P – Proposer matrix; it must be a square matrix with no zeros.

  • A – Acceptor matrix; it must be a square matrix with no zeros.

  • ordered – If true, P and A are assumed to be ordered, … vice-versa (higher is better).

Returns

‘OperationNode’ containing to the match. & 1 (2.0 preference value) and acceptor 2 (1.0 preference value). & 3 (2.0 preference value) and proposer 2 & matched with proposer 3 (since [1,3] is non-zero) at a & 3.0. & matched with proposer 2 (since [2,2] is non-zero) at a & 3.0. & matched with proposer 1 (since [3,1] is non-zero) at a & 1.0.

systemds.operator.algorithm.statsNA(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • bins – Split number for bin stats: the number of bins the time series gets divided into; information about missing values is printed for each bin.

  • verbose – Print detailed information.

Returns

‘OperationNode’ containing

systemds.operator.algorithm.steplm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • icpt – Intercept presence, shifting and rescaling the columns of X:

  • reg – learning rate

  • tol – Tolerance threshold to train until achieved

  • maxi – Maximum iterations; 0 means until tolerance is reached

  • verbose – If the algorithm should be verbose

Returns

‘OperationNode’ containing

systemds.operator.algorithm.tSNE(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
  • reduced_dims – Output dimensionality

  • perplexity – Perplexity Parameter

  • lr – Learning rate

  • momentum – Momentum Parameter

  • max_iter – Number of iterations

  • seed – The seed used for initial values. If set to -1, random seeds are selected.

  • is_verbose – Print debug information

Returns

‘OperationNode’ containing

systemds.operator.algorithm.toOneHot(X: systemds.operator.nodes.matrix.Matrix, numClasses: int)
Parameters

numClasses – Number of columns, must be greater than or equal to the largest value in X

Returns

‘OperationNode’ containing
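
The constraint on the number of columns follows from how one-hot encoding works: a label v sets column v to 1, so there must be at least as many columns as the largest label. A minimal plain-Python sketch, assuming 1-based integer class labels (to_one_hot is a hypothetical helper, not the SystemDS implementation):

```python
def to_one_hot(labels, num_classes):
    """Encode 1-based integer labels as rows with a single 1."""
    if max(labels) > num_classes:
        raise ValueError("num_classes must be >= the largest label")
    return [
        [1 if label == column else 0 for column in range(1, num_classes + 1)]
        for label in labels
    ]


encoded = to_one_hot([1, 3, 2], num_classes=3)
```

Each output row contains exactly one 1, and passing a num_classes smaller than the largest label raises an error in this sketch.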

systemds.operator.algorithm.underSampling(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, ratio: float)
Parameters

ratio – The ratio to sample

Returns

‘OperationNode’ containing

systemds.operator.algorithm.univar(X: systemds.operator.nodes.matrix.Matrix, types: systemds.operator.nodes.matrix.Matrix)
Parameters

types – 1 for scale, 2 for nominal, 3 for ordinal

Returns

‘OperationNode’ containing

systemds.operator.algorithm.winsorize(X: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

verbose – To print output on screen

Returns

‘OperationNode’ containing