There are special branches for running Hadoop in Docker.
The docker-hadoop-runner* branches contain scripts that set up base images that can be used for running any Hadoop version.
The docker-hadoop* branches can be used for running a specific version.
* hadoop-3.3.6
* hadoop-2.10.2
There is a setup under hadoop-dist that contains Docker Compose definitions for running the current version of Hadoop in a multi-node Docker environment.
This is meant for testing code changes locally and debugging.
The base image used by the Docker setup is built as part of the Maven lifecycle. The distribution files generated while building the project with the -Pdist profile enabled are used for running Hadoop inside the containers.
To start the Docker environment, do the following:
* Build the project, using the -Pdist profile
> mvn clean install -Dmaven.javadoc.skip=true -DskipTests -DskipShade -Pdist,src
> cd hadoop-dist/target/hadoop-<current-version>/compose/hadoop
> docker-compose up -d --scale datanode=3
> docker exec -it hadoop_datanode_1 bash
bash-4.2$ hdfs dfs -mkdir /test
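The container name hadoop_datanode_1 used above follows the Compose v1 naming scheme, <project>_<service>_<index>, where the project name defaults to the compose directory name (hadoop here). The sketch below shows how the names of all scaled datanodes could be generated. The function container_names is a hypothetical helper, not part of Hadoop or Docker, and the optional separator argument is included on the assumption that newer Compose releases use "-" instead of "_".

```shell
# Hypothetical helper: print the container names Compose assigns to a
# scaled service. Arguments: project, service, count, [separator].
# Assumption: names look like <project><sep><service><sep><index>,
# with "_" as the separator (Compose v1 style) unless one is given.
container_names() {
  cn_project="$1"
  cn_service="$2"
  cn_count="$3"
  cn_sep="${4:-_}"
  cn_i=1
  while [ "$cn_i" -le "$cn_count" ]; do
    printf '%s%s%s%s%s\n' "$cn_project" "$cn_sep" "$cn_service" "$cn_sep" "$cn_i"
    cn_i=$((cn_i + 1))
  done
}

# List the three datanodes started with --scale datanode=3.
container_names hadoop datanode 3
```

The output can then be fed to a loop that runs docker exec against each container in turn, for example to run the same hdfs command on every datanode. If docker ps shows names with hyphens, pass "-" as the fourth argument.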
To add or remove properties from the core-site.xml, hdfs-site.xml, etc. files used in the Docker environment, edit the config file before starting the containers. The changes will be persisted in the Docker environment.
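As an illustration only, entries in the config file might look like the fragment below. The key format <TARGET-FILE>_<property.name>=<value> and the values shown are assumptions, based on the environment-file convention commonly used by the Hadoop runner images. They are not taken from the text above, so check the config file shipped in your compose directory before relying on them.

```properties
# Assumed format: <TARGET-FILE>_<property.name>=<value>
# Hypothetical values, shown only to illustrate the shape of an entry.
CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000
HDFS-SITE.XML_dfs.replication=1
```

Under that assumption, each line would end up as one property in the named XML file inside the containers, so removing a line removes the property.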