A container is an immutable object that can be seen as a lightweight virtual environment (compared with virtual machines) providing good isolation of processes, code or scripts. All the libraries required at runtime are embedded within the container, which ensures a certain degree of portability, reproducibility and distribution, whatever the operating system. In particular, this makes it possible to “freeze” libraries or code over time, enabling their use beyond their deprecation.
A container system allows you to run a container on a personal machine, on a local cluster in a laboratory or university, or in a mesocentre. It also allows you to run any code flexibly with the same user rights as your account.
Two container systems are available: Singularity and Apptainer. Singularity is currently no longer updated and will be deprecated in the near future in favour of Apptainer, which is an open-source project.
This documentation is limited to explaining the use of Apptainer (and containers) on computing machines, and does not cover image creation at all.
The container systems are available on the Dahu and Bigfoot clusters, and are compatible with the use of CiGri. This means you don’t need to install Apptainer with Nix or Guix. Apptainer commands are available on the compute nodes (but not on head nodes).
The use of container systems is not recommended on the Luke cluster.
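Since the apptainer command is only available on the compute nodes, a quick way to try it is from an interactive OAR job; a minimal sketch, where the resource request and project name are placeholders:
# request an interactive shell on a compute node (adapt resources and project)
oarsub -I -l /nodes=1/core=4,walltime=00:30:00 --project my_project
# once on the compute node, check that Apptainer is available
apptainer --version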
You need to have an existing container image (a .sif file) and upload it to the machines. It is not possible to create an image on the clusters. We will not go into detail here about all the options offered by Apptainer; please refer to their documentation.
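For example, an image built on your personal machine can be uploaded to your home directory on the clusters with scp or rsync (the hostname and paths below are only placeholders):
# copy the local .sif image to the cluster (replace login and hostname)
scp apptainer_image.sif login@cluster-frontend:~/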
The default execution of a container is obtained with the run command:
apptainer run apptainer_image.sif
This run command executes the commands written in the %runscript section when the image was created. You should therefore take a look at the container definition file.
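If you do not have the definition file at hand, it is normally embedded in the image and can be displayed with the inspect command:
apptainer inspect --deffile apptainer_image.sif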
You should use the exec command to execute the commands you want inside the container:
apptainer exec apptainer_image.sif command...
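For instance, assuming the image ships a Python interpreter and that my_script.py is a hypothetical script in your home directory:
apptainer exec apptainer_image.sif python3 $HOME/my_script.py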
Apptainer mounts some folders in the container by default, such as $HOME, the current directory, /tmp, /proc, /dev, … To isolate the container from the host machine, you can use the --no-mount flag (for example --no-mount home to avoid mounting $HOME) or --containall to completely isolate the container from the host.
The recommended use of Apptainer is with the --containall flag, followed by several --bind flags to mount only the folders needed to launch the container:
apptainer exec \
--containall \
--bind path_folder1_host:path_folder1_container \
--bind path_folder2_host:path_folder2_container \
apptainer_image.sif command...
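As an illustration, here is a sketch assuming hypothetical data and results folders in your home directory that the containerised code expects to find under /data and /results:
apptainer exec \
--containall \
--bind $HOME/my_data:/data \
--bind $HOME/my_results:/results \
apptainer_image.sif my_command --input /data --output /results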
Docker is a popular solution for containerisation. The code or script you want to use may have been packaged in a Docker image. Apptainer aims for maximum compatibility with Docker, which means you can use Apptainer commands to launch a Docker image.
apptainer exec docker://docker_image command...
The recommendation for Docker images is to create an Apptainer image from the Docker image on your personal machine, then upload it to the clusters and use the Apptainer commands. This is done using the command apptainer pull docker://docker_image. Take a look at the Apptainer documentation on the topic.
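A minimal sketch of this workflow, run on your personal machine (image names are placeholders):
# build a .sif image from a (hypothetical) Docker image
apptainer pull my_image.sif docker://my_namespace/my_image:latest
# the resulting my_image.sif is then uploaded to the clusters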
Some containerised code/scripts are massively parallel and rely on the use of OpenMPI, which is then embedded in the container. There is therefore “communication” between OpenMPI on the host machine (Dahu for example) and OpenMPI embedded in the Apptainer image. This method is called the hybrid model.
First of all, you need to install OpenMPI in your Nix profile or your Guix session. Next, you need to use Apptainer instances so that the OpenMPI processes share consistent namespaces. To do this, we recommend using a submission script. In practice, you need to add the following lines to the .sh (or .oar) file.
# if OpenMPI has been installed in a Nix profile, run
source /applis/site/nix.sh
# set the parameters for OpenMPI
export OMPI_MCA_btl_openib_allow_ib=true
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=psm2
# launch an apptainer instance on each compute node
mpirun -npernode 1 \
--machinefile $OAR_NODE_FILE \
--mca plm_rsh_agent "oarsh" \
--prefix $HOME/.nix-profile \
apptainer instance start \
apptainer_image.sif instance_name
# run the code/script with the MPI command
mpirun -np `cat $OAR_FILE_NODES|wc -l` \
--machinefile $OAR_NODE_FILE \
--mca plm_rsh_agent "oarsh" \
--prefix $HOME/.nix-profile \
apptainer exec instance://instance_name \
/bin/bash -c "command..."
# stop the apptainer instance on each compute node
mpirun -npernode 1 \
--machinefile $OAR_NODE_FILE \
--mca plm_rsh_agent "oarsh" \
--prefix $HOME/.nix-profile \
apptainer instance stop instance_name
If you are using a Guix session, replace the line source /applis/site/nix.sh with source /applis/site/guix-start.sh, and the lines --prefix $HOME/.nix-profile with --prefix $HOME/.guix-profile.
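For reference, a sketch of the Guix variant of those lines (the rest of the script is unchanged):
# if OpenMPI has been installed in a Guix profile, run
source /applis/site/guix-start.sh
# and in every mpirun call, use:
#   --prefix $HOME/.guix-profile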
If you want to mount specific folders with the --bind flag, this must be done when the instance is created.
mpirun -npernode 1 \
--machinefile $OAR_NODE_FILE \
--mca plm_rsh_agent "oarsh" \
--prefix $HOME/.nix-profile \
apptainer instance start \
--bind path_folder_host:path_folder_container \
apptainer_image.sif instance_name
Even though the Apptainer image runs, this does not mean that you are taking full advantage of the HPC hardware.
If OpenMPI is not correctly configured in the Apptainer container, this can result in a drop in performance. Ideally, the two OpenMPI installations (on the host machine and in the container) should be identical, as “communication” occurs between them. If you want to containerise your code/script yourself, you need to configure OpenMPI as follows:
Bootstrap: docker
From: debian:stable-slim
%environment
# /libraries is the install prefix ($LIB_DIR) chosen in %post; variables from %post are not visible here
export PATH="/libraries/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="/libraries/openmpi/lib:$LD_LIBRARY_PATH"
%post
##### Defines local environment variables
export LIB_DIR=/libraries
mkdir -p $LIB_DIR
##### Installs the necessary packages
apt-get update -y
apt-get install -y gfortran gcc g++ libgomp1
apt-get install -y make wget oar-node
apt-get install -y libinput-dev
## Installs the packages for Infiniband
apt-get install -y rdma-core libibverbs-dev libibverbs1
apt-get install -y ibacm infiniband-diags librdmacm1 librdmacm-dev
apt-get install -y libibnetdisc-dev
## Installs the Omnipath packages
apt-get install -y libnuma-dev libfabric-dev libpsm2-2-compat libpsm2-dev numactl
apt-get install -y libtool opensm libopensm-dev
##### Compile OpenMPI
cd ${APPTAINER_CONTAINER}/tmp
wget https://download.open-mpi.org/release/open-mpi/vX.X/openmpi-X.X.X.tar.gz
tar -xvf openmpi-X.X.X.tar.gz
cd openmpi-X.X.X
mkdir -p $LIB_DIR/openmpi
## Configure with the Omnipath options (--with-libfabric --with-psm2),
## the Infiniband options (--enable-openib-*)
## and the compilation parameters used on the GRICAD clusters
./configure --prefix=$LIB_DIR/openmpi \
--with-libfabric --with-psm2 \
--enable-openib-control-hdr-padding \
--enable-openib-dynamic-sl \
--enable-openib-udcm \
--enable-openib-rdmacm \
--enable-openib-rdmacm-ibaddr \
--with-pmix=internal \
--disable-mca-dso --disable-static --disable-dependency-tracking --enable-mpi-cxx
make -j4
make install
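As a reminder, such a definition file is built into an image on your personal machine, not on the clusters; a minimal sketch, with placeholder file names:
apptainer build my_openmpi_image.sif my_openmpi_image.def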
Although there is some compatibility between OpenMPI versions, using different versions of OpenMPI results in lower performance. Care must therefore be taken to use the same version of OpenMPI in the container and in the Nix profile (the version of OpenMPI available with Nix) or the Guix session (for example, using the command guix install openmpi@4.1.6 to install version 4.1.6 of OpenMPI).
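To check that the versions match, you can compare the OpenMPI version available in your profile with the one embedded in the image, for example:
# version of OpenMPI installed with Nix or Guix (run on a compute node)
mpirun --version
# version of OpenMPI embedded in the container
apptainer exec apptainer_image.sif mpirun --version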
If you have any questions or need any help on the subject, please send an email to sos-calcul-gricad@univ-grenoble-alpes.fr.