Running SystemDS

This guide explains how to run SystemDS regardless of whether you installed it from a Release or built it from Source. All execution modes -local, Spark, and federated- are covered in this document.



1. Prerequisites

This guide assumes that SystemDS has already been installed successfully.

Please make sure you have followed one of the installation guides:

In particular, ensure that:


2. Run a Simple Script Locally

This mode does not require Spark. It only needs Java 17.

2.1 Create and Run a Hello World

echo 'print("Hello, World!")' > hello.dml

Run:

systemds hello.dml

Expected output:

Hello, World!

(Optional) MacOS Note: realpath: illegal option -- - Error

If you are running MacOS and encounter an error message similar to realpath: illegal option -- - when executing systemds hello.dml. You may try to replace the system-wide command realpath with the homebrew version grealpath that comes with the coreutils. Alternatively, you may change all occurrences within the script accordingly, i.e., by prepending a g to avoid any side effects.

(Optional) Ubuntu Note: Invalid or corrupt jarfile hello.dml

On some Ubuntu setups (especially clean environments such as Docker images), running systemds -f hello.dml may result in an error like Invalid or corrupt jarfile hello.dml. If this happens, the SystemDS launcher may not automatically locate the correct JAR. To fix this, export SYSTEMDS_JAR_FILE to point to the JAR shipped with the release. Please refer to the Ubuntu troubleshooting section in the installation guide for a detailed workaround: Release Installation – Ubuntu Note

(Optional) Windows Note: systemds Command Not Found

On Windows (e.g., PowerShell), running systemds -f hello.dml may fail with an error indicating that systemds is not recognized as a command. This is expected, since the systemds launcher in bin/ is implemented as a shell script, which cannot be executed natively on Windows. In this case, SystemDS should be invoked directly via the runnable JAR using java -jar. For a detailed Windows-specific walkthrough, please refer to the installation guide: Release Installation – Windows Notes

2.2 Create a Real Example

This example demonstrates local execution of a real script Univar-stats.dml. The relevant commands to run this example with SystemDS is described in the DML Language reference guide at DML Language Reference.

Prepare the data (macOS: use curlinstead of wget):

# download test data
wget -P data/ http://archive.ics.uci.edu/ml/machine-learning-databases/haberman/haberman.data

# generate a metadata file for the dataset
echo '{"rows": 306, "cols": 4, "format": "csv"}' > data/haberman.data.mtd

# generate type description for the data
echo '1,1,1,2' > data/types.csv
echo '{"rows": 1, "cols": 4, "format": "csv"}' > data/types.csv.mtd

Execute the DML Script:

systemds "$SYSTEMDS_ROOT/scripts/algorithms/Univar-Stats.dml" -nvargs \
  X=data/haberman.data \
  TYPES=data/types.csv \
  STATS=data/univarOut.mtx \
  CONSOLE_OUTPUT=TRUE

(Optional) MacOS Note: SparkException Error

If SystemDS tries to initialize Spark and you see SparkException: A master URL must be set in your configuration, you can force single-node execution without Spark/Hadoop initialization via:

systemds -exec singlenode scripts/algorithms/Univar-Stats.dml -nvargs \
  X=data/haberman.data \
  TYPES=data/types.csv \
  STATS=data/univarOut.mtx \
  CONSOLE_OUTPUT=TRUE

2.3 Run the Real Example

The script computes basic statistics (min, max, variance, skewness, etc) for each column of a dataset. Expected output (example):

-------------------------------------------------
Feature [1]: Scale
 (01) Minimum             | 30.0
 (02) Maximum             | 83.0
 (03) Range               | 53.0
 (04) Mean                | 52.45751633986928
 (05) Variance            | 116.71458266366658
 (06) Std deviation       | 10.803452349303281
 (07) Std err of mean     | 0.6175922641866753
 (08) Coeff of variation  | 0.20594669940735139
 (09) Skewness            | 0.1450718616532357
 (10) Kurtosis            | -0.6150152487211726
 (11) Std err of skewness | 0.13934809593495995
 (12) Std err of kurtosis | 0.277810485320835
 (13) Median              | 52.0
 (14) Interquartile mean  | 52.16013071895425
-------------------------------------------------
Feature [2]: Scale
 (01) Minimum             | 58.0
 (02) Maximum             | 69.0
 (03) Range               | 11.0
 (04) Mean                | 62.85294117647059
 (05) Variance            | 10.558630665380907
 (06) Std deviation       | 3.2494046632238507
 (07) Std err of mean     | 0.18575610076612029
 (08) Coeff of variation  | 0.051698529971741194
 (09) Skewness            | 0.07798443581479181
 (10) Kurtosis            | -1.1324380182967442
 (11) Std err of skewness | 0.13934809593495995
 (12) Std err of kurtosis | 0.277810485320835
 (13) Median              | 63.0
 (14) Interquartile mean  | 62.80392156862745
-------------------------------------------------
Feature [3]: Scale
 (01) Minimum             | 0.0
 (02) Maximum             | 52.0
 (03) Range               | 52.0
 (04) Mean                | 4.026143790849673
 (05) Variance            | 51.691117539912135
 (06) Std deviation       | 7.189653506248555
 (07) Std err of mean     | 0.41100513466216837
 (08) Coeff of variation  | 1.7857418611299172
 (09) Skewness            | 2.954633471088322
 (10) Kurtosis            | 11.425776549251449
 (11) Std err of skewness | 0.13934809593495995
 (12) Std err of kurtosis | 0.277810485320835
 (13) Median              | 1.0
 (14) Interquartile mean  | 1.2483660130718954
-------------------------------------------------
Feature [4]: Categorical (Nominal)
 (15) Num of categories   | 2
 (16) Mode                | 1
 (17) Num of modes        | 1
SystemDS Statistics:
Total execution time:   0,470 sec.

To check the location of output file created:

ls -l data/univarOut.mtx

3. Run a Script on Spark

SystemDS can be executed on Spark using the main executable JAR. The location of this JAR differs depending on whether you installed SystemDS from:

3.1 Running with a Release Installation

If you installed SystemDS from a release archive, locate the runnable JAR in the release root directory. It is typically named like systemds-<version>.jar.

Example:

ls -1 "$SYSTEMDS_ROOT"/*.jar

Run:

spark-submit "$SYSTEMDS_ROOT/systemds-<version>.jar" -f hello.dml

3.2 Running with a Source-build Installation

If you cloned the SystemDS repository and built it yourself, you must first run Maven to generate the executable JAR.

mvn -P distribution package

This creates several JAR files in target/:

Example output:

target/systemds-3.3.0-shaded.jar
target/systemds-3.3.0.jar
target/systemds-3.3.0-unshaded.jar
target/systemds-3.3.0-extra.jar
target/SystemDS.jar            <-- main runnable JAR
target/systemds-3.3.0-ropt.jar
target/systemds-3.3.0-javadoc.jar

Run:

spark-submit target/SystemDS.jar -f hello.dml

4. Run a Script in Federated Mode

Federated mode allows SystemDS to execute operations on data located on remote or distributed workers. Federated execution requires:

  1. One or more federated workers
  2. A driver program (DML or Python) that sends operations to those workers.

Note: The SystemDS documentation provides federated execution examples primarily via the Python API. This Quickstart demonstrates only how to start a federated worker, and refers users to the official Federated Environment guide for complete end-to-end examples.

4.1 Start a Federated Worker

Run in a separate terminal:

systemds WORKER 8001

This starts a worker on port 8001.

4.2 Next Steps and Full Examples

For complete, runnable examples of federated execution (including data files, metadata, and Python code), see the official Federated Environment guide

Using Intel MKL Native Instructions

To use the MKL acceleration download and install the latest supported MKL library (<=2019.5) from 1, set the environment variables with the MKL-provided script . /opt/intel/bin/compilervars.sh intel64 (note the dot and the default install location) and set the option sysds.native.blas in SystemDS-config.xml to mkl.