Use Apptainer images

Why use containers?

A container is an immutable object that can be seen as a lightweight virtual environment (compared to virtual machines), allowing good isolation of processes, code or scripts. All the libraries required at runtime are embedded within the container, which ensures a certain degree of portability, reproducibility and ease of distribution, whatever the operating system. In particular, this makes it possible to “freeze” libraries or code over time, enabling their use beyond their deprecation.

A container system allows you to run a container on a personal machine, on a local cluster in a laboratory or university, or in a mesocentre. It also allows you to run any code flexibly with the same user rights as your account.

Two container systems are available: Singularity and Apptainer. Singularity is no longer updated and will be deprecated in the near future in favour of Apptainer, which is an open-source project.

This documentation is limited to explaining the use of Apptainer (and containers) on computing machines, and does not cover image creation at all.

User guide

The container systems are available on the Dahu and Bigfoot clusters, and are compatible with the use of CiGri. This means you don’t need to install Apptainer with Nix or Guix. Apptainer commands are available on the compute nodes (but not on head nodes).

The use of container systems is not recommended on the Luke cluster.

You need to have an existing container image (.sif file) and upload it to the machines: it is not possible to create an image on the clusters. We won’t go into detail here about all the options offered by Apptainer; please refer to the Apptainer documentation.
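
As an illustration, the image can be copied to the clusters with a standard transfer tool and tested from an interactive job on a compute node; the hostname, project name, resource request and paths below are placeholders to adapt to your own situation:

# copy the image from your personal machine (hostname and destination are examples)
rsync -avP apptainer_image.sif login@cluster-frontend:~/images/

# request an interactive job on a compute node and check that Apptainer is available
oarsub -I --project my_project -l /nodes=1/core=1,walltime=00:30:00
apptainer --version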

Launching an Apptainer image on a single core

The default execution of a container is obtained with the run command:

apptainer run apptainer_image.sif

This run command executes the commands that were written in the %runscript section when the image was created. You should therefore take a look at the container definition file.
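
For reference, a minimal (purely illustrative) definition file with a %runscript section could look like the following sketch; apptainer run on an image built from it would simply print the message:

Bootstrap: docker
From: debian:stable-slim

%runscript
  echo "Hello from the container"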

You should use the exec command to execute the commands you want inside the container:

apptainer exec apptainer_image.sif command...

Apptainer mounts some folders in the container by default, such as $HOME, the current directory, /tmp, /proc, /dev, … To isolate the container from the host machine, you can use the --no-home flag (to avoid mounting $HOME) or --containall to completely isolate the container from the host.

The recommended way to use Apptainer is with the --containall flag, followed by one or more --bind flags to mount only the folders needed to run the container:

apptainer exec \
--containall \
--bind path_folder1_host:path_folder1_container \
--bind path_folder2_host:path_folder2_container \
apptainer_image.sif command...
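
For example, a (hypothetical) analysis script reading input data from one host folder and writing results to another could be launched as follows; the paths, script name and options are only illustrative:

apptainer exec \
--containall \
--bind $HOME/myproject/data:/data \
--bind $HOME/myproject/results:/results \
apptainer_image.sif python3 /opt/scripts/analysis.py --input /data --output /results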

Launching a Docker image with Apptainer

Docker is a popular solution for containerisation. The code or script you want to use may have been packaged in a Docker image. Apptainer aims for maximum compatibility with Docker, which means you can use Apptainer commands to launch a Docker image.

apptainer exec docker://docker_image command...

The recommendation for Docker images is to create an Apptainer image from the Docker image on your personal machine, then upload it to the clusters and use the Apptainer commands. This is done using the command apptainer pull docker://docker_image. Take a look at the Apptainer documentation on the topic.
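
As an illustration, pulling a public image from Docker Hub on your personal machine produces a local .sif file that can then be transferred to the clusters; the image name below is just an example:

# on your personal machine
apptainer pull docker://python:3.11-slim
# the resulting file is named after the image and tag, here python_3.11-slim.sif
apptainer exec python_3.11-slim.sif python3 --version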

Launching an Apptainer image on several cores with OpenMPI

Some containerised code/scripts are massively parallel and rely on the use of OpenMPI, which is then embedded in the container. There is therefore “communication” between OpenMPI on the host machine (Dahu for example) and OpenMPI embedded in the Apptainer image. This method is called the hybrid model.

First of all, you need to install OpenMPI in your Nix profile or your Guix session. Next, you need to use Apptainer instances so that the OpenMPI processes share consistent namespaces across the compute nodes. To do this, we recommend using a submission script; in practice, you need to add the following lines to the .sh (or .oar) file.

# if OpenMPI has been installed on a NIX profile, run
source /applis/site/nix.sh

# set the parameters for OpenMPI
export OMPI_MCA_btl_openib_allow_ib=true
export OMPI_MCA_pml=cm
export OMPI_MCA_mtl=psm2

# launch an apptainer instance on each compute node
mpirun -npernode 1 \
        --machinefile $OAR_NODE_FILE \
        -mca plm_rsh_agent "oarsh" \
        --prefix $HOME/.nix-profile \
        apptainer instance start \
        apptainer_image.sif instance_name

# run the code/script with the MPI command
mpirun -np $(cat $OAR_NODE_FILE | wc -l) \
        --machinefile $OAR_NODE_FILE \
        -mca plm_rsh_agent "oarsh" \
        --prefix $HOME/.nix-profile \
        apptainer exec instance://instance_name \
        /bin/bash -c "command..."

# stop the instance on each compute node
mpirun -npernode 1 \
        --machinefile $OAR_NODE_FILE \
        -mca plm_rsh_agent "oarsh" \
        --prefix $HOME/.nix-profile \
        apptainer instance stop instance_name

If you are using a Guix session, replace the line source /applis/site/nix.sh with source /applis/site/guix-start.sh, and the lines --prefix $HOME/.nix-profile with --prefix $HOME/.guix-profile.
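
Once complete, such a script is submitted as a batch job with OAR. A minimal (illustrative) submission could look like this; the resource request and project name are placeholders, and resources can also be declared with #OAR directives inside the script:

chmod +x my_apptainer_job.oar
oarsub -S ./my_apptainer_job.oar --project my_project -l /nodes=2/core=32,walltime=02:00:00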

If you want to mount specific folders with the --bind flag, this must be done when the instance is created.

mpirun -npernode 1 \
        --machinefile $OAR_NODE_FILE \
        -mca plm_rsh_agent "oarsh" \
        --prefix $HOME/.nix-profile \
        apptainer instance start \
        --bind path_folder_host:path_folder_container \
        apptainer_image.sif instance_name
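
As a quick (optional) check, the same mpirun pattern can be reused with the apptainer instance list command to verify that an instance is indeed running on each compute node:

mpirun -npernode 1 \
        --machinefile $OAR_NODE_FILE \
        -mca plm_rsh_agent "oarsh" \
        --prefix $HOME/.nix-profile \
        apptainer instance list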

Advice and best practice

The fact that an Apptainer image runs does not, by itself, mean that you are taking full advantage of the HPC installation.

If OpenMPI is not correctly configured in the Apptainer container, this can result in a drop in performance. Ideally, the two OpenMPI installations (on the host machine and in the container) should be identical, since they “communicate” with each other. If you want to containerise your code/script yourself, you need to configure OpenMPI as follows:

Bootstrap: docker  
From: debian:stable-slim

%environment
  # variables exported in %post are not propagated to the runtime environment,
  # so LIB_DIR is defined again here
  export LIB_DIR=/libraries
  export PATH="$LIB_DIR/openmpi/bin:$PATH"
  export LD_LIBRARY_PATH="$LIB_DIR/openmpi/lib:$LD_LIBRARY_PATH"
  
%post
##### Defines local environment variables
  export LIB_DIR=/libraries
  mkdir -p $LIB_DIR
##### Installs the necessary packages
  apt-get update -y
  apt-get install -y gfortran gcc g++ libgomp1
  apt-get install -y make wget oar-node
  apt-get install -y libinput-dev 
  ## Installs the packages for Infiniband
  apt-get install -y rdma-core libibverbs-dev libibverbs1
  apt-get install -y ibacm infiniband-diags librdmacm1 librdmacm-dev
  apt-get install -y libibnetdisc-dev 
  ## Installs the Omnipath packages
  apt-get install -y libnuma-dev libfabric-dev libpsm2-2-compat libpsm2-dev numactl
  apt-get install -y libtool opensm libopensm-dev

##### Compile OpenMPI
  cd ${APPTAINER_CONTAINER}/tmp
  wget https://download.open-mpi.org/release/open-mpi/vX.X/openmpi-X.X.X.tar.gz
  tar -xvf openmpi-X.X.X.tar.gz
  cd openmpi-X.X.X
  mkdir -p $LIB_DIR/openmpi
  ## Omnipath configuration parameters: --with-libfabric --with-psm2
  ## Infiniband configuration parameters: the --enable-openib-* options
  ## Remaining options: compilation parameters used on GRICAD clusters
  ./configure --prefix=$LIB_DIR/openmpi \
    --with-libfabric --with-psm2 \
    --enable-openib-control-hdr-padding \
    --enable-openib-dynamic-sl \
    --enable-openib-udcm \
    --enable-openib-rdmacm \
    --enable-openib-rdmacm-ibaddr \
    --with-pmix=internal \
    --disable-mca-dso --disable-static --disable-dependency-tracking --enable-mpi-cxx
  make -j4
  make install

Although there is some compatibility between OpenMPI versions, using different versions of OpenMPI results in lower performance. Care must therefore be taken to use the same version of OpenMPI in the container and in the Nix profile (the version of OpenMPI available with Nix) or the Guix session (for example, using the command guix install openmpi@4.1.6 to install version 4.1.6 of OpenMPI).
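
A simple way to check that the versions match is to compare the OpenMPI version reported on the host (with your Nix profile or Guix session loaded) with the one embedded in the image:

# on a compute node, with the Nix profile or Guix session sourced
mpirun --version

# inside the container
apptainer exec apptainer_image.sif mpirun --version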

To go further

If you have any questions or need any help on the subject, please send an email to sos-calcul-gricad@univ-grenoble-alpes.fr.