# Algorithms

SystemDS supports various machine learning algorithms out of the box.

As an example, the `lm` algorithm can be used as follows:

```python
# Import numpy and SystemDS
import numpy as np
from systemds.context import SystemDSContext
from systemds.operator.algorithm import lm

# Set a seed
np.random.seed(0)
# Generate matrix of feature vectors
features = np.random.rand(10, 15)
# Generate a 1-column matrix of response values
y = np.random.rand(10, 1)

# Compute the weights
with SystemDSContext() as sds:
    weights = lm(sds.from_numpy(features), sds.from_numpy(y)).compute()
    print(weights)
```

The output should be similar to:

```
[[-0.11538199]
 [-0.20386541]
 [-0.39956035]
 [ 1.04078623]
 [ 0.4327084 ]
 [ 0.18954599]
 [ 0.49858968]
 [-0.26812763]
 [ 0.09961844]
 [-0.57000751]
 [-0.43386048]
 [ 0.55358873]
 [-0.54638565]
 [ 0.2205885 ]
 [ 0.37957689]]
```

`systemds.operator.algorithm.abstain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, threshold: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• threshold

• verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.als(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• rank – Rank of the factorization

• reg – Regularization:

• lambda – Regularization parameter, no regularization if 0.0

• maxi – Maximum number of iterations

• check – Check for convergence after every iteration, i.e., updating U and V once

• thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing two factor matrices, each described as an m x r matrix where r is the factorization rank

`systemds.operator.algorithm.alsCG(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• rank – Rank of the factorization

• reg – Regularization:

• lambda – Regularization parameter, no regularization if 0.0

• maxi – Maximum number of iterations

• check – Check for convergence after every iteration, i.e., updating U and V once

• thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.alsDS(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• rank – Rank of the factorization

• lambda – Regularization parameter, no regularization if 0.0

• maxi – Maximum number of iterations

• check – Check for convergence after every iteration, i.e., updating L and R once

• thr – Assuming check is set to TRUE, the algorithm stops and convergence is declared if the decrease in loss in any two consecutive iterations falls below this threshold; if check is FALSE, thr is ignored

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.alsTopkPredict(userIDs: systemds.operator.nodes.matrix.Matrix, I: systemds.operator.nodes.matrix.Matrix, L: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters

K – The number of top-K items

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.arima(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• p – non-seasonal AR order

• d – non-seasonal differencing order

• q – non-seasonal MA order

• P – seasonal AR order

• D – seasonal differencing order

• Q – seasonal MA order

• s – period in terms of number of time-steps

• include_mean – center to mean 0, and include in result

• solver – Solver, either “cg” or “jacobi”

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.bivar(X: systemds.operator.nodes.matrix.Matrix, S1: systemds.operator.nodes.matrix.Matrix, S2: systemds.operator.nodes.matrix.Matrix, T1: systemds.operator.nodes.matrix.Matrix, T2: systemds.operator.nodes.matrix.Matrix, verbose: bool)`
Parameters

verbose – Print bivar stats

Returns

‘OperationNode’ containing four outputs with bivar stats

`systemds.operator.algorithm.components(G: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• X – Location to read the matrix of feature vectors

• Y – Location to read the matrix with category labels

• icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept,

• tol – tolerance (“epsilon”)

• reg – regularization parameter (lambda = 1/C); intercept is not regularized

• maxi – max. number of outer (Newton) iterations

• maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

• verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.confusionMatrix(P: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix)`
Parameters

Y – encoded actual labels

Returns

‘OperationNode’ containing
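
As an illustration of what a confusion matrix holds, here is a minimal NumPy sketch that counts (predicted, actual) label pairs. It is not the SystemDS implementation: the helper name `confusion_counts`, the assumption that P and Y are vectors of 1-based integer class labels, and the orientation (predicted classes on rows, actual classes on columns) are all choices made for this example, so compare against the builtin's output on a small input before relying on that layout.

```python
import numpy as np


def confusion_counts(P, Y, num_classes=None):
    """Hypothetical helper: count how often predicted class p meets actual class y.

    P, Y: vectors of 1-based integer labels (predicted and actual).
    Returns a k x k matrix C with C[p-1, y-1] = number of such pairs.
    """
    P = np.asarray(P, dtype=int).ravel()
    Y = np.asarray(Y, dtype=int).ravel()
    if P.shape != Y.shape:
        raise ValueError("P and Y must have the same number of labels")
    if P.min() < 1 or Y.min() < 1:
        raise ValueError("labels are assumed to be 1-based")
    k = num_classes if num_classes is not None else int(max(P.max(), Y.max()))
    C = np.zeros((k, k), dtype=int)
    # Unbuffered add, so repeated (p, y) pairs are all counted
    np.add.at(C, (P - 1, Y - 1), 1)
    return C


print(confusion_counts([1, 2, 2, 1], [1, 2, 1, 1]))
# [[2 0]
#  [1 1]]
```

Dividing each column by its sum would turn the counts into per-class rates, if normalized values are needed.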

`systemds.operator.algorithm.cox(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, F: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• X – Location to read the input matrix X containing the survival data

• TE – Column indices of X, as a column vector, which contain the timestamp

• F – Column indices of X, as a column vector, which are to be used for fitting the model

• R – If factors (categorical variables) are available in the input matrix X, R can specify the baseline level of each factor, which needs to be removed from X; in this case the start and end indices corresponding to the baseline level need to be the same. If R is not provided, by default all variables are considered to be continuous

• alpha – Parameter to compute a 100*(1-alpha)% confidence interval for the betas

• tol – Tolerance (“epsilon”)

• moi – Max. number of outer (Newton) iterations

• mii – Max. number of inner (conjugate gradient) iterations, 0 = no max

Returns

‘OperationNode’ containing matrix RT that contains the order-preserving recoded timestamps from X, a matrix which is X with sorted timestamps, and matrix MF that contains the column indices of X with the baseline factors removed (if available)

`systemds.operator.algorithm.cspline(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• X – It is assumed that the values in X are monotonically increasing and that there are no duplicate points in X

• inp_x – the given input x, for which the cspline will find predicted y

• mode – Specifies the method for cspline (DS - Direct Solve, CG - Conjugate Gradient)

• tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

• maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.csplineCG(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• X – It is assumed that the values in X are monotonically increasing and that there are no duplicate points in X

• inp_x – the given input x, for which the cspline will find predicted y.

• tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

• maxi – Maximum number of conjugate gradient iterations, 0 = no maximum

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.csplineDS(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, inp_x: float)`
Parameters
• X – It is assumed that the values in X are monotonically increasing and that there are no duplicate points in X

• inp_x – the given input x, for which the cspline will find predicted y.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.cvlm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, k: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• k – Number of subsets needed; it should always be greater than 1 and less than nrow(X)

• icpt – Intercept presence, shifting and rescaling the columns of X

• reg – Regularization constant (lambda) for L2-regularization. set to nonzero for

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.dbscan(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• eps – Maximum distance between two points for one to be considered reachable for the other.

• minPts – Number of points in a neighborhood for a point to be considered as a core point

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.dbscanApply(X: systemds.operator.nodes.matrix.Matrix, clusterModel: systemds.operator.nodes.matrix.Matrix, eps: float)`
Parameters

eps – Maximum distance between two points for one to be considered reachable for the other.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.decisionTree(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• R – a vector in which 1 denotes a scale feature and other positive integers indicate the number of categories of a categorical feature; if not provided, by default all variables are assumed to be scale

• bins – Number of equiheight bins per scale feature to choose thresholds

• depth – Maximum depth of the learned tree

• verbose – boolean specifying if the algorithm should print information while executing

Returns

‘OperationNode’ containing matrix M, where each column corresponds to a node j in the learned tree and the rows contain information including:

• the feature index (scale feature id if the feature is scale, or categorical feature id if the feature is categorical) that node j looks at if j is an internal node, otherwise 0

• the type of the feature that node j looks at, holding the same information as the R input vector

• if j is an internal node: 1 if the feature chosen for j is scale, otherwise the size of the subset of values stored in rows 6,7,… if j is categorical; if j is a leaf node: the number of misclassified samples reaching node j

• if j is an internal node: the threshold stored at M[6,j] if the feature chosen for j is scale, otherwise, if the feature chosen for j is categorical, rows 6,7,… depict the value subset chosen for j; if j is a leaf node: 1 if j is impure and the number of samples at j > threshold, otherwise 0

`systemds.operator.algorithm.decisionTreePredict(M: systemds.operator.nodes.matrix.Matrix, X: systemds.operator.nodes.matrix.Matrix, strategy: str)`
Parameters
• M – matrix M, where each column corresponds to a node j in the learned tree and each row contains information such as: the feature index (scale or categorical feature id) that node j looks at if j is an internal node, otherwise 0; the feature type, encoded as in the R input vector; for an internal node, 1 if the chosen feature is scale, otherwise the size of the subset of values stored in rows 6,7,… if it is categorical, and for a leaf node, the number of misclassified samples reaching node j; for an internal node, the threshold stored at M[6,j] if the chosen feature is scale, otherwise (categorical feature) rows 6,7,… depict the value subset chosen for j, and for a leaf node, 1 if j is impure and the number of samples at j > threshold, otherwise 0

• strategy – strategy, can be one of [“GEMM”, “TT”, “PTT”], referring to “Generic matrix multiplication”,

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.deepWalk(Graph: systemds.operator.nodes.matrix.Matrix, w: int, d: int, gamma: int, t: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• w – window size

• d – embedding size

• gamma – walks per vertex

• t – walk length

• alpha – learning rate

• beta – factor for decreasing learning rate

Returns

‘OperationNode’ containing

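`systemds.operator.algorithm.discoverFD(X: systemds.operator.nodes.matrix.Matrix, Mask: systemds.operator.nodes.matrix.Matrix, threshold: float)`
Parameters

Mask – Row vector marking the features of interest; for example, a mask with a 0 in the second position will exclude the second column from processing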

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.executePipeline(X: systemds.operator.nodes.matrix.Matrix)`
Parameters
• flagsCount

• test

Returns

‘OperationNode’ containing validation check & convert the matrix row-vector into list & flag & append flag & append flag & append flag & of hyper-parameters and loop till that & flag & and remove categorical & and remove numerics & + 1 for nan replacement & matrix & matrix & ohe call, to call inside eval as a function & encoding of categorical features & features & ohe call, to call inside eval as a function & to call inside eval as a function & doing relative over-sampling & count & replace the null with default values & replace the null with default values & flip the noisy labels & best option

`systemds.operator.algorithm.ffTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, out_activation: str, loss_fcn: str, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• batch_size – Batch size

• epochs – Number of epochs

• learning_rate – Learning rate

• out_activation – User-specified output activation function. Possible values:

• loss_fcn – User specified loss function. Possible values:

• shuffle – Flag which indicates if dataset should be shuffled or not

• validation_split – Fraction of training set used as validation set

• seed – Seed for model initialization

• verbose – Flag which indicates if function should print to stdout

Returns

‘OperationNode’ containing by the model & by the model

`systemds.operator.algorithm.garch(X: systemds.operator.nodes.matrix.Matrix, kmax: int, momentum: float, start_stepsize: float, end_stepsize: float, start_vicinity: float, end_vicinity: float, sim_seed: int, verbose: bool)`
Parameters
• kmax – Number of iterations

• momentum – Momentum for momentum-gradient descent (set to 0 to deactivate)

• start_stepsize – Initial gradient-descent stepsize

• end_stepsize – gradient-descent stepsize at end (linear descent)

• start_vicinity – proportion of randomness of restart-location for gradient descent at beginning

• end_vicinity – same at end (linear decay)

• sim_seed – seed for simulation of process on fitted coefficients

• verbose – verbosity, comments during fitting

Returns

‘OperationNode’ containing the constant term of the fitted process, the ARCH coefficient of the fitted process, and the GARCH coefficient of the fitted process (drawback: slow convergence of the optimization, which is a sort of simulated annealing/gradient descent)

`systemds.operator.algorithm.gaussianClassifier(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• varSmoothing – Smoothing factor for variances

• verbose – Print accuracy of the training set

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.getAccuracy(y: systemds.operator.nodes.matrix.Matrix, yhat: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters

isWeighted – Flag for weighted or non-weighted accuracy calculation

Returns

‘OperationNode’ containing the accuracy of the predicted labels

`systemds.operator.algorithm.glm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• dfam – Distribution family code: 1 = Power, 2 = Binomial

• vpow – Power for Variance defined as (mean)^power (ignored if dfam != 1):

• link – Link function code: 0 = canonical (depends on distribution),

• lpow – Power for Link function defined as (mean)^power (ignored if link != 1):

• yneg – Response value for Bernoulli “No” label, usually 0.0 or -1.0

• icpt – Intercept presence, X columns shifting and rescaling:

• reg – Regularization parameter (lambda) for L2 regularization

• tol – Tolerance (epsilon)

• disp – (Over-)dispersion value, or 0.0 to estimate it from data

• moi – Maximum number of outer (Newton / Fisher Scoring) iterations

• mii – Maximum number of inner (Conjugate Gradient) iterations, 0 = no maximum

Returns

‘OperationNode’ containing line, as follows: & integer indicating success/failure as follows: & value (regression coefficient), excluding the intercept & for the smallest beta value & value (regression coefficient), excluding the intercept & for the largest beta value & or nan if there is no intercept (if icpt=0) & to scale deviance, provided as “disp” input parameter & from the dataset & the saturated model, assuming dispersion == 1.0 & the saturated model, scaled by the dispersion value & when requested, contains the following per-iteration variables in csv format, & triple (name, iteration, value) with iteration = 0 for initial values: & inner (conj.gradient) iterations in this outer iteration & function we minimize (i.e. negative partial log-likelihood) & the objective during this iteration, actual value & the objective predicted by a quadratic approximation & value of x %*% beta, used to check for overflows & value of x %*% beta, used to check for overflows & region size, the “delta” & supported glm distribution families & lpow distribution.link nical?

`systemds.operator.algorithm.gmm(X: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• n_components – Number of components in the Gaussian mixture model

• model – “VVV”: unequal variance (full), each component has its own general covariance matrix

• init_param – initialize weights with “kmeans” or “random”

• iterations – Number of iterations

• reg_covar – regularization parameter for covariance matrix

• tol – tolerance value for convergence

Returns

‘OperationNode’ containing of estimated parameters & information criterion for best iteration & kth class

`systemds.operator.algorithm.gmmPredict(X: systemds.operator.nodes.matrix.Matrix, weight: systemds.operator.nodes.matrix.Matrix, mu: systemds.operator.nodes.matrix.Matrix, precisions_cholesky: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters

model – fitted model

Returns

‘OperationNode’ containing the cluster labels and the probabilities of belongingness for new instances, given the variance and mean of the fitted data

`systemds.operator.algorithm.gnmf(X: systemds.operator.nodes.matrix.Matrix, rnk: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• rnk – Number of components into which matrix X is to be factored

• eps – Tolerance

• maxi – Maximum number of conjugate gradient iterations

Returns

‘OperationNode’ containing
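
GNMF approximates a non-negative matrix X by the product of two non-negative factors, W (nrow(X) x rnk) and H (rnk x ncol(X)). The sketch below is a hypothetical `gnmf_sketch` helper that assumes the classic Lee-Seung multiplicative update rules, with eps added to the denominators to avoid division by zero; whether the builtin uses exactly these updates, this random initialization, and these defaults is an assumption, not something stated above.

```python
import numpy as np


def gnmf_sketch(X, rnk, eps=1e-8, maxi=10, seed=0):
    """Hypothetical helper: factor a non-negative X (n x m) into W (n x rnk) and H (rnk x m)."""
    X = np.asarray(X, dtype=float)
    if (X < 0).any():
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    # Random non-negative initialization
    W = rng.random((X.shape[0], rnk))
    H = rng.random((rnk, X.shape[1]))
    for _ in range(maxi):
        # Multiplicative updates: each entry is scaled by a non-negative ratio
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


X = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
W, H = gnmf_sketch(X, rnk=2, maxi=200)
print(np.linalg.norm(X - W @ H))  # Frobenius reconstruction error
```

Because the updates only ever multiply by non-negative ratios, W and H stay non-negative without any explicit projection step.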

`systemds.operator.algorithm.hospitalResidencyMatch(R: systemds.operator.nodes.matrix.Matrix, H: systemds.operator.nodes.matrix.Matrix, capacity: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• R – Residents matrix; it must be an ORDERED matrix.

• H – Hospitals matrix; it must be an UNORDERED matrix, where higher preference values are better.

• capacity – Capacity of the hospitals; it must be a [n*1] matrix with non-zero values.

• verbose – If the operation is verbose

Returns

‘OperationNode’ containing an ordered matrix, this means that resident 1 (row 1) likes hospital 2 the most, followed by hospital 1 and hospital 3. & unordered, this would mean that resident 1 (row 1) likes hospital 3 the most (since the value at [1,3] is the row max), & 1 (2.0 preference value) and hospital 2 (1.0 preference value). & an unordered matrix this means that hospital 1 (row 1) likes resident 1 the most (since the value at [1,1] is the row max). & matched with hospital 3 (since [1,3] is non-zero) at a preference level of 2.0. & matched with hospital 1 (since [2,1] is non-zero) at a preference level of 1.0. & matched with hospital 2 (since [3,2] is non-zero) at a preference level of 2.0.

`systemds.operator.algorithm.hyperband(X_train: systemds.operator.nodes.matrix.Matrix, y_train: systemds.operator.nodes.matrix.Matrix, X_val: systemds.operator.nodes.matrix.Matrix, y_val: systemds.operator.nodes.matrix.Matrix, params: Iterable, paramRanges: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• paramRanges – Range of values for the hyper parameters: one row per hyper parameter, first column specifies min, second column max value.

• verbose – If TRUE print messages are activated

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.img_brightness(img_in: systemds.operator.nodes.matrix.Matrix, value: float, channel_max: int)`
Parameters
• value – The amount of brightness to be changed for the image

• channel_max – Maximum value of the brightness of the image

Returns

‘OperationNode’ containing
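
A minimal NumPy sketch of a brightness adjustment, assuming the builtin adds `value` to every pixel and clips the result to the range [0, channel_max]; the helper name `img_brightness_sketch` and the clipping behavior are assumptions for illustration.

```python
import numpy as np


def img_brightness_sketch(img_in, value, channel_max):
    """Hypothetical helper: shift all pixels by `value`, then clip to [0, channel_max]."""
    img = np.asarray(img_in, dtype=float)
    return np.clip(img + value, 0, channel_max)


brighter = img_brightness_sketch([[10, 250]], 20, 255)
darker = img_brightness_sketch([[10, 250]], -20, 255)
```

Clipping keeps the output a valid image for the given channel range; without it, bright pixels would overflow past channel_max.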

`systemds.operator.algorithm.img_crop(img_in: systemds.operator.nodes.matrix.Matrix, w: int, h: int, x_offset: int, y_offset: int)`
Parameters
• w – The width of the subregion required

• h – The height of the subregion required

• x_offset – The horizontal coordinate in the image to begin the crop operation

• y_offset – The vertical coordinate in the image to begin the crop operation

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.img_cutout(img_in: systemds.operator.nodes.matrix.Matrix, x: int, y: int, width: int, height: int, fill_value: float)`
Parameters
• x – Column index of the top left corner of the rectangle (starting at 1)

• y – Row index of the top left corner of the rectangle (starting at 1)

• width – Width of the rectangle (must be positive)

• height – Height of the rectangle (must be positive)

• fill_value – The value to set for the rectangle

Returns

‘OperationNode’ containing
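
The sketch below shows how the 1-based `x` (column) and `y` (row) coordinates from the parameter list map onto 0-based NumPy slicing. The helper `img_cutout_sketch` is hypothetical, and clipping a rectangle that extends past the image border is an assumption of this example.

```python
import numpy as np


def img_cutout_sketch(img_in, x, y, width, height, fill_value):
    """Hypothetical helper: fill a width x height rectangle whose top-left
    corner is at column x, row y (both 1-based)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if x < 1 or y < 1:
        raise ValueError("x and y start at 1")
    out = np.array(img_in, dtype=float)  # copy, so the input stays unchanged
    r0, c0 = y - 1, x - 1                # convert to 0-based indices
    # Slices past the border are truncated, so the rectangle is clipped
    out[r0:r0 + height, c0:c0 + width] = fill_value
    return out


img = np.zeros((4, 5))
out = img_cutout_sketch(img, x=2, y=3, width=2, height=1, fill_value=9)
```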

`systemds.operator.algorithm.img_invert(img_in: systemds.operator.nodes.matrix.Matrix, max_value: float)`
Parameters

max_value – The maximum value pixels can have

Returns

‘OperationNode’ containing
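
Assuming inversion maps each pixel value v to max_value - v, a NumPy equivalent is a one-liner; the helper name `img_invert_sketch` is hypothetical.

```python
import numpy as np


def img_invert_sketch(img_in, max_value):
    """Hypothetical helper: map every pixel value v to max_value - v."""
    return max_value - np.asarray(img_in, dtype=float)


inverted = img_invert_sketch([[0, 255], [100, 55]], 255)
```

Applying the inversion twice with the same max_value returns the original image.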

`systemds.operator.algorithm.img_mirror(img_in: systemds.operator.nodes.matrix.Matrix, horizontal_axis: bool)`
Parameters

horizontal_axis – Flip either in X or Y axis

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.img_posterize(img_in: systemds.operator.nodes.matrix.Matrix, bits: int)`
Parameters
• bits – The number of bits to keep for the values: 1 means black and white, 8 means every integer between 0 and 255.

Returns

‘OperationNode’ containing
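
A minimal sketch of posterization, assuming the builtin keeps the `bits` most significant bits of 8-bit values via integer division and re-scaling; the helper `img_posterize_sketch` and this exact quantization scheme are assumptions.

```python
import numpy as np


def img_posterize_sketch(img_in, bits):
    """Hypothetical helper: keep the `bits` most significant bits of 8-bit pixel values."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    step = 2 ** (8 - bits)  # width of each quantization bucket
    return (np.asarray(img_in, dtype=float) // step) * step


two_levels = img_posterize_sketch([[0, 127, 128, 255]], 1)
```

With this scheme, bits = 1 yields the two levels 0 and 128, and bits = 8 leaves integer-valued images unchanged; mapping the top level to 255 instead would be an equally valid design choice.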

`systemds.operator.algorithm.img_rotate(img_in: systemds.operator.nodes.matrix.Matrix, radians: float, fill_value: float)`
Parameters

• fill_value – The background color revealed by the rotation

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.img_sample_pairing(img_in1: systemds.operator.nodes.matrix.Matrix, img_in2: systemds.operator.nodes.matrix.Matrix, weight: float)`
Parameters
• weight – The weight given to the second image: 0 means only img_in1 will be visible, 1 means only img_in2 will be visible

Returns

‘OperationNode’ containing
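
Given the description of `weight` (0 shows only img_in1, 1 shows only img_in2), a natural reading is a linear blend of the two images. The NumPy sketch below assumes exactly that; the helper name `img_sample_pairing_sketch` and the linear blend itself are assumptions.

```python
import numpy as np


def img_sample_pairing_sketch(img_in1, img_in2, weight):
    """Hypothetical helper: (1 - weight) * img_in1 + weight * img_in2."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    a = np.asarray(img_in1, dtype=float)
    b = np.asarray(img_in2, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return (1.0 - weight) * a + weight * b


half = img_sample_pairing_sketch([[0, 100]], [[200, 100]], 0.5)
```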

`systemds.operator.algorithm.img_shear(img_in: systemds.operator.nodes.matrix.Matrix, shear_x: float, shear_y: float, fill_value: float)`
Parameters
• shear_x – Shearing factor for horizontal shearing

• shear_y – Shearing factor for vertical shearing

• fill_value – The background color revealed by the shearing

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.img_transform(img_in: systemds.operator.nodes.matrix.Matrix, out_w: int, out_h: int, a: float, b: float, c: float, d: float, e: float, f: float, fill_value: float)`
Parameters
• out_w – Width of the output image

• out_h – Height of the output image

• fill_value – The background of the image

Returns

‘OperationNode’ containing the output image as a 2D matrix with the top-left corner at [1, 1]

`systemds.operator.algorithm.img_translate(img_in: systemds.operator.nodes.matrix.Matrix, offset_x: float, offset_y: float, out_w: int, out_h: int, fill_value: float)`
Parameters
• offset_x – The distance to move the image in x direction

• offset_y – The distance to move the image in y direction

• out_w – Width of the output image

• out_h – Height of the output image

• fill_value – The background of the image

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.imputeByFD(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, threshold: float, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• threshold – threshold value in interval [0, 1] for robust FDs

• verbose – flag for printing verbose debug output

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.imputeByFDApply(X: systemds.operator.nodes.matrix.Matrix, Y_imp: systemds.operator.nodes.matrix.Matrix)`
Parameters
• source – source attribute to use for imputation and error correction

• target – attribute to be fixed

• threshold – threshold value in interval [0, 1] for robust FDs

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.km(X: systemds.operator.nodes.matrix.Matrix, TE: systemds.operator.nodes.matrix.Matrix, GI: systemds.operator.nodes.matrix.Matrix, SI: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• number – (categorical features) for grouping and/or stratifying

• alpha – Parameter to compute 100*(1-alpha)% confidence intervals for the survivor function and its median

• err_type – Parameter to specify the error type according to “greenwood” (the default) or “peto”

• conf_type – Parameter to modify the confidence interval; “plain” keeps the lower and upper bounds of the confidence interval unmodified, “log” (the default) corresponds to the log transformation, and “log-log” corresponds to the complementary log-log transformation

• test_type – If survival data for multiple groups is available, specifies which test to perform for comparing survival data across multiple groups: “none” (the default)

Returns

‘OperationNode’ containing 7 consecutive columns in km corresponds to a unique & and strata in the data with the following schema & number of factors used for stratifying, i.e., ncol(si)) & of groups and strata is equal to 1, m will have 4 columns with & 4 matrix t and an g x 5 matrix t_groups_oe with

`systemds.operator.algorithm.kmeans(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• k – Number of centroids

• runs – Number of runs (with different initial centroids)

• max_iter – Maximum number of iterations per run

• eps – Tolerance (epsilon) for WCSS change ratio

• is_verbose – Whether to print per-iteration stats (if FALSE, they are not printed)

• avg_sample_size_per_centroid – Average number of records per centroid in data samples

• seed – The seed used for initial sampling. If set to -1, random seeds are selected.

Returns

‘OperationNode’ containing
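
To show what k, max_iter, and eps control, here is a single-run NumPy sketch of Lloyd's algorithm. It is a simplification, not the builtin: the helper `kmeans_sketch` is hypothetical, it ignores `runs`, `avg_sample_size_per_centroid`, and the builtin's seeding strategy, initializes centroids from k random rows, and stops when the relative WCSS improvement falls below eps (our reading of "WCSS change ratio").

```python
import numpy as np


def kmeans_sketch(X, k, max_iter=100, eps=1e-6, seed=0):
    """Hypothetical single-run Lloyd's algorithm; returns centroids and 1-based labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct, randomly chosen rows of X
    C = X[rng.choice(X.shape[0], size=k, replace=False)].copy()
    prev_wcss = np.inf
    for _ in range(max_iter):
        # n x k matrix of squared Euclidean distances to each centroid
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wcss = d2[np.arange(X.shape[0]), labels].sum()
        # Move each non-empty cluster's centroid to the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
        # Stop once the relative WCSS improvement drops below eps
        if prev_wcss - wcss <= eps * wcss:
            break
        prev_wcss = wcss
    # Final assignment against the last centroids
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return C, d2.argmin(axis=1) + 1


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
C, labels = kmeans_sketch(X, k=2)
```

Returning 1-based cluster labels mirrors the 1-based indexing used elsewhere on this page; that choice is also an assumption of the sketch.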

`systemds.operator.algorithm.l2svm(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• intercept – No intercept (if set to TRUE, a constant bias column is added to X)

• epsilon – Procedure terminates early if the reduction in objective function value is less than epsilon (tolerance) times the initial objective function value

• lambda – Regularization parameter (lambda) for L2 regularization

• maxIterations – Maximum number of conjugate gradient iterations

• maxii

• verbose – Set to TRUE to print statements updating on loss.

• columnId – The column ID to add to the print statement; this is specifically useful when L2SVM is used in MSVM.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.l2svmPredict(X: systemds.operator.nodes.matrix.Matrix, W: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters

verbose – Set to TRUE to print statements.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.lasso(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• tol – target convergence tolerance

• M – history length

• tau – regularization component

• maxi – maximum number of iterations until convergence

• verbose – if the builtin should be verbose

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.lenetTrain(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, X_val: systemds.operator.nodes.matrix.Matrix, Y_val: systemds.operator.nodes.matrix.Matrix, C: int, Hin: int, Win: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• C – Number of input channels (dimensionality of input depth)

• Hin – Input height

• Win – Input width

• batch_size – Batch size

• epochs – Number of epochs

• lr – Learning rate

• mu – Momentum value

• decay – Learning rate decay

• lambda – Regularization strength

• seed – Seed for model initialization

• verbose – Flag indicating whether the function should print to stdout

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.lm(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• icpt – Intercept presence, shifting and rescaling the columns of X

• reg – Regularization constant (lambda) for L2-regularization. Set to nonzero

• tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

• maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

• verbose – If TRUE, print messages are activated

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.lmCG(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• icpt – Intercept presence, shifting and rescaling the columns of X

• reg – Regularization constant (lambda) for L2-regularization. Set to nonzero

• tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

• maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

• verbose – If TRUE, print messages are activated

Returns

‘OperationNode’ containing
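
The conjugate gradient variant solves the regularized normal equations (X^T X + reg * I) beta = X^T y iteratively, using only matrix-vector products. The NumPy sketch below is a hypothetical `lm_cg_sketch` that applies the stopping rule from the tol description (residual norm below tol times its initial norm). It treats maxi = 0 as "at most ncol(X) iterations", the number of steps CG needs in exact arithmetic; that cap, and omitting the icpt handling, are assumptions of this sketch.

```python
import numpy as np


def lm_cg_sketch(X, y, reg=1e-7, tol=1e-7, maxi=0):
    """Hypothetical helper: solve (X^T X + reg * I) beta = X^T y with conjugate gradient."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n_features = X.shape[1]
    beta = np.zeros((n_features, 1))
    r = -(X.T @ y)                 # residual (gradient) at beta = 0
    p = -r                         # first search direction
    norm_r2 = float((r * r).sum())
    target = (tol ** 2) * norm_r2  # compare squared norms
    max_iter = maxi if maxi > 0 else n_features
    it = 0
    while it < max_iter and norm_r2 > target:
        q = X.T @ (X @ p) + reg * p  # (X^T X + reg I) p without forming X^T X
        alpha = norm_r2 / float((p * q).sum())
        beta += alpha * p
        r += alpha * q
        old_norm_r2, norm_r2 = norm_r2, float((r * r).sum())
        p = -r + (norm_r2 / old_norm_r2) * p
        it += 1
    return beta
```

Never materializing X^T X is the main design point: each iteration costs two matrix-vector products, which is what makes CG attractive for wide or sparse X.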

`systemds.operator.algorithm.lmDS(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• icpt – Intercept presence, shifting and rescaling the columns of X

• reg – Regularization constant (lambda) for L2-regularization. Set to nonzero

• tol – Tolerance (epsilon); the conjugate gradient procedure terminates early if the L2 norm of the beta-residual is less than tolerance * its initial norm

• maxi – Maximum number of conjugate gradient iterations. 0 = no maximum

• verbose – If TRUE, print messages are activated

Returns

‘OperationNode’ containing
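
The direct-solve variant can be sketched as ridge regression via the normal equations. This hypothetical `lm_ds_sketch` supports only icpt = 0 and icpt = 1 (an appended column of ones whose coefficient is not regularized); the default reg = 1e-7 and the exact treatment of the intercept are assumptions here, and the shifting/rescaling mode is omitted.

```python
import numpy as np


def lm_ds_sketch(X, y, reg=1e-7, icpt=0):
    """Hypothetical helper: ridge-regularized least squares via the normal equations.

    icpt = 1 appends a column of ones; its coefficient (the last entry of
    beta) is not regularized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    if icpt == 1:
        X = np.hstack([X, np.ones((X.shape[0], 1))])
    elif icpt != 0:
        raise ValueError("this sketch only supports icpt = 0 or 1")
    lam = np.full(X.shape[1], reg, dtype=float)
    if icpt == 1:
        lam[-1] = 0.0                   # do not penalize the intercept
    A = X.T @ X + np.diag(lam)          # X^T X + diag(lambda)
    b = X.T @ y
    return np.linalg.solve(A, b)
```

In the 10 x 15 example at the top of this page, X^T X is singular because there are fewer rows than columns; any positive reg makes X^T X + reg * I positive definite, so the direct solve is well defined (though ill-conditioned for very small reg).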

`systemds.operator.algorithm.logSumExp(M: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• margin – If the logsumexp of rows is required, set margin = “row”; if the logsumexp of columns is required, set margin = “col”; if margin = “none”, a single scalar is returned, computing the logsumexp of the whole matrix

Returns

‘OperationNode’ containing
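
For reference, a numerically stable log-sum-exp subtracts the maximum before exponentiating, so large inputs do not overflow. This NumPy sketch follows the three `margin` options described above; the helper name `log_sum_exp_sketch` and the default margin = "none" are assumptions.

```python
import numpy as np


def log_sum_exp_sketch(M, margin="none"):
    """Hypothetical helper: stable log(sum(exp(M))) per row, per column, or overall."""
    M = np.asarray(M, dtype=float)
    if margin == "row":
        m = M.max(axis=1, keepdims=True)  # per-row max, shape (n, 1)
        return m + np.log(np.exp(M - m).sum(axis=1, keepdims=True))
    if margin == "col":
        m = M.max(axis=0, keepdims=True)  # per-column max, shape (1, m)
        return m + np.log(np.exp(M - m).sum(axis=0, keepdims=True))
    if margin == "none":
        m = M.max()
        return float(m + np.log(np.exp(M - m).sum()))
    raise ValueError("margin must be 'row', 'col' or 'none'")


rows = log_sum_exp_sketch([[0.0, 0.0], [1.0, 1.0]], margin="row")
big = log_sum_exp_sketch([[1000.0, 1000.0]])  # exp(1000) alone would overflow
```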

`systemds.operator.algorithm.matrixProfile(ts: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])`
Parameters
• window_size – Sliding window size

• sample_percent – Degree of approximation, between zero and one (1 computes the exact solution)

• is_verbose – Print debug information

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.miceApply(X: systemds.operator.nodes.matrix.Matrix, meta: systemds.operator.nodes.matrix.Matrix, threshold: float, dM: systemds.operator.nodes.frame.Frame, betaList: Iterable)`
Parameters
• threshold – Confidence value in [0, 1] for robust imputation; values will only be imputed if the predicted value has a probability greater than threshold (only applicable to categorical data)

• verbose – Boolean value.

Returns

‘OperationNode’ containing are represented with empty string i.e “,,” in csv file & n are storing continuos/numeric data and variables with & storing categorical data

`systemds.operator.algorithm.``msvm`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• intercept – If set to TRUE, a constant bias column is added to X; otherwise no intercept is used

• num_classes – Number of classes

• epsilon – Procedure terminates early if the reduction in the objective function value is less than epsilon (tolerance) times the initial objective function value.

• lambda – Regularization parameter (lambda) for L2 regularization

• maxIterations – Maximum number of conjugate gradient iterations

• verbose – Set to true to print while training.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``multiLogReg`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• icpt – Intercept presence, shifting and rescaling X columns: 0 = no intercept,

• tol – tolerance (“epsilon”)

• reg – regularization parameter (lambda = 1/C); intercept is not regularized

• maxi – max. number of outer (Newton) iterations

• maxii – max. number of inner (conjugate gradient) iterations, 0 = no max

• verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``multiLogRegPredict`(X: systemds.operator.nodes.matrix.Matrix, B: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing value of accuracy
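The following minimal sketch shows how a prediction and an accuracy value could be derived from a trained model. It takes a softmax over the scores X·B, uses the argmax as the predicted label, and reports accuracy as the fraction of labels that match Y. It assumes B has one column per class, no intercept row, and 1-based labels. The actual layout of the SystemDS model matrix B may differ, and the helper name is hypothetical.

```python
import math


def multi_log_reg_predict_sketch(X, B, Y):
    """Return (class probabilities, predicted 1-based labels, accuracy fraction)."""
    probs, preds = [], []
    for row in X:
        # One linear score per class: row . B[:, c]
        scores = [sum(x * B[j][c] for j, x in enumerate(row)) for c in range(len(B[0]))]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]  # shift for numerical stability
        total = sum(exps)
        p = [e / total for e in exps]
        probs.append(p)
        preds.append(p.index(max(p)) + 1)  # 1-based class label
    correct = sum(1 for pred, y in zip(preds, Y) if pred == y)
    return probs, preds, correct / len(Y)
```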

`systemds.operator.algorithm.``na_locf`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• option – String “locf” (last observation carried forward) to do a forward fill

• verbose – to print output on screen

Returns

‘OperationNode’ containing
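Last observation carried forward can be sketched in pure Python as below. The sketch assumes each column of X is an independent series and that missing values are NaN. Leading NaNs stay NaN because there is no earlier observation to carry forward. The helper name `na_locf_sketch` is hypothetical.

```python
import math


def na_locf_sketch(X):
    """Forward-fill NaNs column by column without modifying X."""
    out = [row[:] for row in X]
    for j in range(len(out[0])):
        last = math.nan
        for i in range(len(out)):
            if math.isnan(out[i][j]):
                out[i][j] = last  # carry the last observation forward
            else:
                last = out[i][j]
    return out


nan = float("nan")
print(na_locf_sketch([[1.0, nan], [nan, 2.0], [nan, nan]]))
```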

`systemds.operator.algorithm.``naiveBayes`(D: systemds.operator.nodes.matrix.Matrix, C: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• laplace – Laplace smoothing correction; any double value.

• verbose – Boolean value.

Returns

‘OperationNode’ containing
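The role of the laplace parameter is easiest to see in the standard multinomial naive Bayes estimates, sketched here in pure Python. The helper name `naive_bayes_sketch` is hypothetical. D holds non-negative feature counts and C holds 1-based class labels. Whether the SystemDS builtin returns exactly these priors and conditionals is an assumption.

```python
def naive_bayes_sketch(D, C, laplace=1.0):
    """Return (class priors, class-conditional feature probabilities)."""
    k, m = max(C), len(D[0])
    counts = [[0.0] * m for _ in range(k)]  # summed feature counts per class
    class_rows = [0] * k                    # number of rows per class
    for row, c in zip(D, C):
        class_rows[c - 1] += 1
        for j, v in enumerate(row):
            counts[c - 1][j] += v
    priors = [r / len(D) for r in class_rows]
    # Laplace smoothing: add `laplace` to every count so no probability is zero
    conditionals = [[(cnt + laplace) / (sum(row) + m * laplace) for cnt in row]
                    for row in counts]
    return priors, conditionals
```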

`systemds.operator.algorithm.``outlier`(X: systemds.operator.nodes.matrix.Matrix, opposite: bool)
Parameters

opposite – (1) TRUE for evaluating outliers from the upper quartile range,

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``outlierByArima`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros

• p – non-seasonal AR order

• d – non-seasonal differencing order

• q – non-seasonal MA order

• P – seasonal AR order

• D – seasonal differencing order

• Q – seasonal MA order

• s – period in terms of number of time-steps

• solver – Solver, either “cg” or “jacobi”

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``outlierByIQR`(X: systemds.operator.nodes.matrix.Matrix, k: float, max_iterations: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• k – a constant used to discern outliers (k*IQR)

• isIterative – iterative repair or single repair

• repairMethod – values: 0 = delete rows having outliers,

• max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

• verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing matrix X with no outliers
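A single (non-iterative) pass of IQR-based repair can be sketched as follows. Values outside [Q1 − k·IQR, Q3 + k·IQR] are treated as outliers. The quartiles here use linear interpolation. repair_method = 1 is assumed to replace outliers with zeros, as documented for outlierByArima and outlierBySd. SystemDS may compute quartiles differently, and all names in the sketch are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def outlier_by_iqr_sketch(X, k=1.5, repair_method=0):
    """Single-pass IQR repair: 0 = delete rows with outliers, 1 = zero them out."""
    bounds = []
    for col in zip(*X):
        s = sorted(col)
        q1, q3 = quantile(s, 0.25), quantile(s, 0.75)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))

    def is_out(j, v):
        lo, hi = bounds[j]
        return v < lo or v > hi

    if repair_method == 0:
        return [row for row in X if not any(is_out(j, v) for j, v in enumerate(row))]
    return [[0.0 if is_out(j, v) else v for j, v in enumerate(row)] for row in X]
```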

`systemds.operator.algorithm.``outlierByIQRApply`(X: systemds.operator.nodes.matrix.Matrix, Q1: systemds.operator.nodes.matrix.Matrix, Q3: systemds.operator.nodes.matrix.Matrix, IQR: systemds.operator.nodes.matrix.Matrix, k: float, repairMethod: int)
Parameters
• k – a constant used to discern outliers (k*IQR)

• repairMethod – values: 0 = delete rows having outliers,

Returns

‘OperationNode’ containing matrix X with no outliers

`systemds.operator.algorithm.``outlierBySd`(X: systemds.operator.nodes.matrix.Matrix, max_iterations: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• k – threshold values 1, 2, 3 for 68%, 95%, 99.7% respectively (3-sigma rule)

• repairMethod – values: 0 = delete rows having outliers, 1 = replace outliers with zeros

• max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

Returns

‘OperationNode’ containing
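The 3-sigma rule behind outlierBySd can be sketched in one pass: a value is flagged when it lies more than k standard deviations from its column mean. This sketch uses the sample standard deviation from Python's `statistics` module, which is an assumption about SystemDS. The helper name is hypothetical.

```python
import statistics


def outlier_by_sd_sketch(X, k=3.0, repair_method=0):
    """Single-pass repair: 0 = delete rows with outliers, 1 = replace outliers with zeros."""
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # sample standard deviation

    def is_out(j, v):
        return abs(v - means[j]) > k * sds[j]

    if repair_method == 0:
        return [row for row in X if not any(is_out(j, v) for j, v in enumerate(row))]
    return [[0.0 if is_out(j, v) else v for j, v in enumerate(row)] for row in X]
```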

`systemds.operator.algorithm.``outlierBySdApply`(X: systemds.operator.nodes.matrix.Matrix, colMean: systemds.operator.nodes.matrix.Matrix, colSD: systemds.operator.nodes.matrix.Matrix, k: float, repairMethod: int)
Parameters
• k – a constant used to discern outliers (k*SD)

• isIterative – iterative repair or single repair

• repairMethod – values: 0 = delete rows having outliers,

• max_iterations – values: 0 = arbitrary number of iterations until all outliers are removed,

• verbose – flag specifying if logging information should be printed

Returns

‘OperationNode’ containing matrix X with no outliers

`systemds.operator.algorithm.``pca`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• K – Number of reduced dimensions (i.e., columns)

• Center – Indicates whether or not to center the feature matrix

• Scale – Indicates whether or not to scale the feature matrix

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``pnmf`(X: systemds.operator.nodes.matrix.Matrix, rnk: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• rnk – Number of components into which matrix X is to be factored.

• maxi – Maximum number of conjugate gradient iterations.

• verbose – If TRUE, ‘iter’ and ‘obj’ are printed.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``ppca`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• k – indicates the dimension of the new vector space constructed from eigenvectors

• maxi – maximum number of iterations until convergence

• tolobj – objective function tolerance value to stop the PPCA algorithm

• tolrecerr – reconstruction error tolerance value to stop the algorithm

• verbose – verbose debug output

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``randomForest`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, R: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• R – If not provided, by default all variables are assumed to be scale

• bins – Number of equiheight bins per scale feature to choose thresholds

• depth – Maximum depth of the learned tree

• num_leaf – Number of samples when splitting stops and a leaf node is added

• num_samples – Number of samples at which point we switch to in-memory subtree building

• num_trees – Number of trees to be learned in the random forest model

• subsamp_rate – Parameter controlling the size of each tree in the forest; samples are selected from a Poisson distribution with parameter subsamp_rate (the default value is 1.0)

• feature_subset – Parameter that controls the number of features used as candidates for splitting at each tree node, as a power of the number of features in the dataset; by default, the square root of the number of features (i.e., feature_subset = 0.5) is used at each tree node

• impurity – Impurity measure: entropy or Gini (the default)

Returns

‘OperationNode’ containing tree and each row contains the following information: & that leaf node j is supposed to predict & subset of values & 7,8,… if j is categorical & stored at m[7,j] if the feature chosen for j is scale; & chosen for j is categorical rows 7,8,… depict the value subset chosen for j

`systemds.operator.algorithm.``scale`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• center – Indicates whether or not to center the feature matrix

• scale – Indicates whether or not to scale the feature matrix

Returns

‘OperationNode’ containing
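Centering and scaling can be sketched column-wise as below. The scale factor used here is the square root of the column sum of squares divided by n − 1, which equals the sample standard deviation when the data is centered. This follows the common R `scale()` convention and is an assumption about the SystemDS builtin. The helper name is hypothetical.

```python
import math


def scale_sketch(X, center=True, scale=True):
    """Return a centered and/or scaled float copy of X (list of rows)."""
    n, m = len(X), len(X[0])
    out = [[float(v) for v in row] for row in X]
    if center:
        for j in range(m):
            mean = sum(row[j] for row in out) / n
            for row in out:
                row[j] -= mean  # subtract the column mean
    if scale:
        for j in range(m):
            factor = math.sqrt(sum(row[j] ** 2 for row in out) / (n - 1))
            if factor == 0:
                factor = 1.0  # leave all-zero columns unchanged
            for row in out:
                row[j] /= factor
    return out


print(scale_sketch([[1, 10], [2, 20], [3, 30]]))
```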

`systemds.operator.algorithm.``shortestPath`(G: systemds.operator.nodes.matrix.Matrix, sourceNode: int, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• G – The input matrix G can be 0/1 (just specifying whether the nodes are connected or not) or integer values (representing the weight of the edges or the distances between nodes, 0 if not connected).

• maxi – Integer max number of iterations accepted (0 for FALSE, i.e. max number of iterations not defined)

• sourceNode – Index of the node from which the shortest paths to all other nodes are calculated.

• verbose – flag for verbose debug output

Returns

‘OperationNode’ containing the minimum shortest-path distance from vertex i to vertex j; if the minimum distance is infinity, the two nodes are not connected
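The sketch below illustrates the adjacency-matrix convention (0 = no edge, otherwise the edge weight) and the infinite distance for unreachable nodes. It swaps in Dijkstra's algorithm, which requires non-negative weights. It uses 0-based node indices and a hypothetical helper name. The algorithm and indexing used by the SystemDS builtin may differ.

```python
import heapq
import math


def shortest_path_sketch(G, source):
    """Distances from `source` (0-based) to every node; math.inf if unreachable."""
    n = len(G)
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v in range(n):
            w = G[u][v]
            if w != 0 and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


G = [[0, 1, 4, 0], [1, 0, 2, 0], [4, 2, 0, 0], [0, 0, 0, 0]]
print(shortest_path_sketch(G, 0))
```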

`systemds.operator.algorithm.``sigmoid`(X: systemds.operator.nodes.matrix.Matrix)
Returns

‘OperationNode’ containing
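The element-wise logistic function σ(x) = 1 / (1 + e^(−x)) can be sketched as follows. Branching on the sign of x keeps `math.exp` from overflowing for large |x|. The helper name is hypothetical.

```python
import math


def sigmoid_sketch(X):
    """Element-wise logistic sigmoid of a list-of-rows matrix."""
    def sig(x):
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        z = math.exp(x)  # x < 0, so this cannot overflow
        return z / (1.0 + z)
    return [[sig(x) for x in row] for row in X]


print(sigmoid_sketch([[0.0, 2.0], [-2.0, 1000.0]]))
```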

`systemds.operator.algorithm.``slicefinder`(X: systemds.operator.nodes.matrix.Matrix, e: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• k – Number of subsets required

• maxL – Maximum level L (conjunctions of L predicates), 0 unlimited

• minSup – Minimum support (min number of rows per slice)

• alpha – Weight in [0,1]: 0 only size, 1 only error

• tpEval – Flag for task-parallel slice evaluation

• tpBlksz – Block size for task-parallel execution (num slices)

• selFeat – Flag for removing one-hot-encoded features that don’t satisfy the minimum-support constraint and/or have zero error

• verbose – Flag for verbose debug output

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``smote`(X: systemds.operator.nodes.matrix.Matrix, mask: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• s – Amount of SMOTE (percentage of oversampling), integral multiple of 100

• k – Number of nearest neighbours

• verbose – if the algorithm should be verbose

Returns

‘OperationNode’ containing
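The core SMOTE step interpolates between a minority sample and one of its k nearest neighbours. This sketch generates s / 100 synthetic rows per input row. It ignores the mask and verbose parameters, uses Euclidean distance, and assumes X contains only minority-class rows. The helper name and the seed parameter are hypothetical.

```python
import math
import random


def smote_sketch(X, s=200, k=3, seed=0):
    """Return s/100 synthetic rows per row of the minority-class matrix X."""
    rng = random.Random(seed)
    per_sample = s // 100
    synthetic = []
    for i, x in enumerate(X):
        # k nearest neighbours of x among the other minority rows
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: math.dist(x, X[j]))
        neighbours = others[:k]
        for _ in range(per_sample):
            nn = X[rng.choice(neighbours)]
            gap = rng.random()
            # New point on the line segment between x and the chosen neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```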

`systemds.operator.algorithm.``split`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• f – Train set fraction [0,1]

• cont – Contiguous splits, otherwise sampled

• seed – The seed to randomly select rows in sampled mode

Returns

‘OperationNode’ containing
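The difference between contiguous (cont = TRUE) and sampled splits can be sketched as below. The rounding of f · n, the (X_train, X_test, Y_train, Y_test) return order, and the helper name are assumptions of this sketch.

```python
import random


def split_sketch(X, Y, f=0.7, cont=True, seed=42):
    """Return (X_train, X_test, Y_train, Y_test) with a train fraction of f."""
    n = len(X)
    n_train = int(round(f * n))
    if cont:
        # Contiguous split: the first rows train, the remaining rows test
        train_idx = list(range(n_train))
    else:
        # Sampled split: draw training rows at random, keep their original order
        train_idx = sorted(random.Random(seed).sample(range(n), n_train))
    chosen = set(train_idx)
    test_idx = [i for i in range(n) if i not in chosen]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [Y[i] for i in train_idx], [Y[i] for i in test_idx])
```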

`systemds.operator.algorithm.``splitBalanced`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• f – Train set fraction [0,1]

• verbose – print available

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``stableMarriage`(P: systemds.operator.nodes.matrix.Matrix, A: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• P – Proposer matrix; it must be a square matrix with no zeros.

• A – Acceptor matrix; it must be a square matrix with no zeros.

• ordered – If true, P and A are assumed to be ordered,

• index – vice-versa (higher is better).

Returns

‘OperationNode’ containing to the match. & 1 (2.0 preference value) and acceptor 2 (1.0 preference value). & 3 (2.0 preference value) and proposer 2 & matched with proposer 3 (since [1,3] is non-zero) at a & 3.0. & matched with proposer 2 (since [2,2] is non-zero) at a & 3.0. & matched with proposer 1 (since [3,1] is non-zero) at a & 1.0.
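The classic proposer-optimal Gale-Shapley procedure is sketched below. It uses preference matrices in which higher values are better, as the ordered parameter describes. This sketch returns, for each proposer, the 0-based index of its matched acceptor, not the matrix that SystemDS returns. The helper name is hypothetical.

```python
def stable_marriage_sketch(P, A):
    """Gale-Shapley matching.

    P[i][j]: preference of proposer i for acceptor j (higher is better).
    A[j][i]: preference of acceptor j for proposer i (higher is better).
    Returns match, where match[i] is the acceptor assigned to proposer i.
    """
    n = len(P)
    # Each proposer's acceptors, most preferred first
    prefs = [sorted(range(n), key=lambda j: -P[i][j]) for i in range(n)]
    next_choice = [0] * n
    holder = [None] * n  # holder[j] = proposer currently engaged to acceptor j
    free = list(range(n))
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]
        next_choice[i] += 1
        if holder[j] is None:
            holder[j] = i
        elif A[j][i] > A[j][holder[j]]:
            free.append(holder[j])  # acceptor trades up; old partner is free again
            holder[j] = i
        else:
            free.append(i)  # rejected; i proposes to its next choice later
    match = [None] * n
    for j, i in enumerate(holder):
        match[i] = j
    return match
```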

`systemds.operator.algorithm.``statsNA`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• bins – Split number for bin stats. Number of bins the time series gets

• missing – printed.

• verbose – Print detailed information.

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``steplm`(X: systemds.operator.nodes.matrix.Matrix, y: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• icpt – Intercept presence, shifting and rescaling the columns of X:

• reg – learning rate

• tol – Tolerance threshold to train until achieved

• maxi – Maximum iterations; 0 means until the tolerance is reached

• verbose – If the algorithm should be verbose

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``tSNE`(X: systemds.operator.nodes.matrix.Matrix, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters
• reduced_dims – Output dimensionality

• perplexity – Perplexity Parameter

• lr – Learning rate

• momentum – Momentum Parameter

• max_iter – Number of iterations

• seed – The seed used for initial values. If set to -1, random seeds are selected.

• is_verbose – Print debug information

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``toOneHot`(X: systemds.operator.nodes.matrix.Matrix, numClasses: int)
Parameters

numClasses – Number of columns; must be greater than or equal to the largest value in X

Returns

‘OperationNode’ containing
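This sketch assumes X holds 1-based class labels, which is consistent with numClasses having to be at least the largest value in X. Under that assumption, one-hot encoding can be sketched as follows. The helper name is hypothetical.

```python
def to_one_hot_sketch(X, num_classes):
    """One-hot encode a list of 1-based class labels into num_classes columns."""
    if max(X) > num_classes:
        raise ValueError("num_classes must be >= the largest value in X")
    return [[1 if c == label else 0 for c in range(1, num_classes + 1)] for label in X]


print(to_one_hot_sketch([1, 3, 2], 3))
```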

`systemds.operator.algorithm.``underSampling`(X: systemds.operator.nodes.matrix.Matrix, Y: systemds.operator.nodes.matrix.Matrix, ratio: float)
Parameters

ratio – The ratio to sample

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``univar`(X: systemds.operator.nodes.matrix.Matrix, types: systemds.operator.nodes.matrix.Matrix)
Parameters

types – Type of each feature: 1 for scale, 2 for nominal, 3 for ordinal

Returns

‘OperationNode’ containing

`systemds.operator.algorithm.``winsorize`(X: systemds.operator.nodes.matrix.Matrix, verbose: bool, **kwargs: Dict[str, Union[DAGNode, str, int, float, bool]])
Parameters

verbose – To print output on screen

Returns

‘OperationNode’ containing