For simulations that require GPUs, multiple nodes are available on the Bigfoot cluster, which is dedicated to GPU computing:

- Nodes with 4 NVIDIA Tesla V100 GPUs, 32 GB of RAM per GPU
- Nodes with 2 NVIDIA A100 GPUs, 40 GB of RAM per GPU
- Nodes with 4 AMD MI210/XGMI GPUs, 64 GB of RAM per GPU
- Nodes with NVIDIA T4 GPUs, called Virgo, only available at night!
The “Bigfoot” cluster is dedicated to computations that require co-processors, currently GPGPUs. Access follows the usual path, from the Gricad SSH bastions, to the bigfoot front-end:
login@trinity:~$ ssh bigfoot
Linux bigfoot 4.19.0-18-amd64 #1 SMP Debian 4.19.208-1 (2021-09-29) x86_64

Welcome to Bigfoot cluster!

[... ASCII art ...]

                GPU, GPU, GPU, ... ;-)

Type 'chandler' to get cluster status
Type 'recap.py' to get cluster properties

Sample OAR submissions:

# Get a A100 GPU and all associated cpu and memory resources:
oarsub -l /nodes=1/gpu=1 --project test -p "gpumodel='A100'" "nvidia-smi -L"

# Get a MIG partition of an A100 on a devel node, to make some tests
oarsub -l /nodes=1/gpu=1/migdevice=1 --project test -t devel "nvidia-smi -L"

Last login: Mon Jan 10 17:37:43 2022 from 22.214.171.124
login@bigfoot:~$
The chandler command gives an overview of the available resources and their instantaneous state.

Several GPU models are available. The recap.py command gives up-to-date information about the hardware configuration of the different nodes, in particular the model and number of GPUs available inside each node:
login@bigfoot:~$ recap.py
================================================================================
| node     |         cpumodel| gpumodel | gpus | cpus | cores| mem | mem/gpu |MIG|
================================================================================
| bigfoot1 |  intel Gold 6130|     V100 |    4 |    2 |   32 | 192 |      96 | NO|
|            [ + 1 more node(s) ]                                               |
| bigfoot3 |  intel Gold 6130|     V100 |    4 |    2 |   32 | 192 |      96 | NO|
| bigfoot4 | intel Gold 5218R|     V100 |    4 |    2 |   40 | 192 |      96 | NO|
|            [ + 1 more node(s) ]                                               |
| bigfoot6 | intel Gold 5218R|     V100 |    4 |    2 |   40 | 192 |      96 | NO|
| bigfoot7 |    amd EPYC 7452|     A100 |    2 |    2 |   64 | 192 |      96 |YES|
| bigfoot8 | intel Gold 5218R|     V100 |    4 |    2 |   40 | 192 |      48 | NO|
| bigfoot9 |    amd EPYC 7452|     A100 |    2 |    2 |   64 | 192 |      96 | NO|
|            [ + 2 more node(s) ]                                               |
| bigfoot12|    amd EPYC 7452|     A100 |    2 |    2 |   64 | 192 |      96 | NO|
| virgo1   |       intel vcpu|       T4 |    1 |    1 |    2 |   4 |       4 | NO|
|            [ + 33 more node(s) ]                                              |
| virgo35  |       intel vcpu|       T4 |    1 |    1 |    2 |   4 |       4 | NO|
================================================================================
# of GPUS: 10 A100, 28 V100, 35 T4
login@bigfoot:~$
Apart from the Virgo T4 nodes, the nodes are interconnected via the same low-latency Omnipath network as the Dahu cluster.
The usual storage spaces are available from the front-end and all nodes:
The classical NIX and GUIX environments are available and shared with the Dahu and Luke clusters, as well as the specific application directory.
To install the libraries commonly used in GPU computing, you can use the predefined conda environments.
To use NVIDIA GPUs, you also need to source the appropriate CUDA toolkit. You can use the following script, passing it the desired toolkit version. Example for toolkit version 11.7:
user@bigfoot:~$ source /applis/environments/cuda_env.sh 11.7
You can also list all CUDA toolkits available on the cluster using the cuda_env.sh script:
user@bigfoot:~$ source /applis/environments/cuda_env.sh -l
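As a sketch of how the predefined conda environments mentioned above combine with the CUDA toolkit in a session: the environment name `my_env` below is a placeholder, pick an actual name from `conda env list` on the cluster.

```shell
# Load a CUDA toolkit (version from the example above), then a conda env.
source /applis/environments/cuda_env.sh 11.7
conda env list        # list the predefined environments
conda activate my_env # placeholder name: pick one from the list above
```

The `conda env list` / `conda activate` commands are standard conda; only the environment name and toolkit version need adapting.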
To launch a job, we use the OAR resource manager (whose essential commands are described on this page), and more particularly the oarsub command.
The particularity of the Bigfoot cluster is that the resource unit to request is usually a GPU. The other resources of the compute nodes (cpu-cores and memory) have been distributed and associated with the GPUs according to the hardware configuration of the nodes, which is heterogeneous.
It is recommended (but not mandatory) to specify the GPU model you want to obtain on your compute nodes, using the OAR gpumodel property.
The following example gives the minimum options to submit a job requiring a single gpu on a node that has Nvidia A100 GPUs:
oarsub -l /nodes=1/gpu=1 -p "gpumodel='A100'" ...
OAR will also allocate, on a pro-rata basis, a certain number of general-purpose computing cores (cpu-cores) and a certain amount of memory.
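The pro-rata rule can be illustrated with a quick back-of-the-envelope computation; the figures below (192 GB of memory, 64 cores, 2 GPUs) match the A100 nodes shown by recap.py and are used here purely as an example:

```shell
# Requesting 1 of the 2 GPUs of a node grants roughly half of its
# cores and memory (example figures from an A100 node).
node_mem_gb=192; node_cores=64; node_gpus=2
gpus_requested=1
echo "mem:   $(( node_mem_gb * gpus_requested / node_gpus )) GB"   # -> 96 GB
echo "cores: $(( node_cores  * gpus_requested / node_gpus ))"      # -> 32
```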
This other example job will get 2 Nvidia A100 GPUs on the same node, and the associated cpu and memory resources:
oarsub -l /nodes=1/gpu=2 -p "gpumodel='A100'" ...
OAR allocates GPU units according to the gpudevice property. To find out which units have been allocated, use the oarprint command once on the node (interactively):
user@bigfoot2:~$ oarprint gpudevice
1
0
This same command also allows you to know the cpu-cores associated with gpudevice:
user@bigfoot8:~$ oarprint -P gpudevice,cpuset core
0 3
1 12
1 18
0 9
0 1
1 17
1 14
0 5
1 10
1 16
0 8
0 4
0 6
1 15
0 7
1 11
0 2
1 13
0 0
1 19
Here we obtained 20 compute cores and 2 GPUs, each GPU being associated with 10 cores, whose ranks within the compute node are listed.
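To read the mapping more easily, the pair list can be grouped per GPU with a small pipeline. The sketch below feeds a few sample pairs through a heredoc so it can be tried anywhere; on a node you would pipe `oarprint -P gpudevice,cpuset core` directly into the same `sort | awk` stage:

```shell
# Group "gpudevice core" pairs by GPU (sample data via a heredoc).
sort -n <<'EOF' | awk '{cores[$1] = cores[$1] " " $2} END {for (g = 0; g < 2; g++) print "gpu " g ":" cores[g]}'
0 3
1 12
0 9
1 18
EOF
```

This prints one line per GPU, e.g. `gpu 0: 3 9` then `gpu 1: 12 18`.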
To know the amount of main memory allocated, we can query the cgroup of the job in the following way:
user@bigfoot8:~$ cat /dev/oar_cgroups_links/memory/`cat /proc/self/cpuset`/memory.limit_in_bytes
100693262336
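The limit is reported in bytes; a quick integer conversion makes the figure readable, taking the value shown above as input:

```shell
# Convert the cgroup memory limit from bytes to GiB (integer division).
bytes=100693262336
echo "$(( bytes / 1024 / 1024 / 1024 )) GiB"   # -> 93 GiB
```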
This amount of memory varies depending on the number of GPUs obtained on the node and the characteristics of the node. The mem_per_gpu property (in GB) exposes the configuration of the nodes and lets you constrain job submission on the amount of main memory needed for the job. For example:
oarsub -l /nodes=1/gpu=2 -p "gpumodel='A100' and mem_per_gpu > 64" ...
For Nvidia GPUs, the nvidia-smi command gives information about the accessible GPUs. It also shows that OAR only grants access to the GPUs that have been allocated to the job:
user@bigfoot:~$ oarsub -l /nodes=1/gpu=2 -p "gpumodel='V100'" --project test -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=293
Interactive mode: waiting...
Starting...
Connect to OAR job 293 via the node bigfoot8
user@bigfoot8:~$ nvidia-smi -L
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-263f55be-a11c-81be-8af6-e948471cb954)
GPU 1: Tesla V100-SXM2-32GB (UUID: GPU-9e4e2b6c-19ea-73ef-7026-00619f988787)
user@bigfoot8:~$ logout
Connection to bigfoot8 closed.
Disconnected from OAR job 293.
user@bigfoot:~$ oarsub -l /nodes=1/gpu=1 -p "gpumodel='A100'" --project test -I
[ADMISSION RULE] Set default walltime to 7200.
[ADMISSION RULE] Modify resource description with type constraints
OAR_JOB_ID=294
Interactive mode: waiting...
Starting...
Connect to OAR job 294 via the node bigfoot12
user@bigfoot12:~$ nvidia-smi -L
GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-45d882aa-be45-3db7-bd3a-06da9fcaf3b1)
user@bigfoot12:~$ logout
The maximum walltime is 48 hours. This is necessary to allow a reasonable turnover and a fair sharing of resources, and to facilitate maintenance operations and machine updates. If your jobs need a longer execution time, you must set up or use checkpoint features within your applications: they must be able to save their state to files, in order to restart later from that state by loading those files.
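The checkpoint/restart idea reduces to a minimal pattern: the program periodically writes its state to a file and, on startup, resumes from that file if one exists. The file name and step logic below are purely illustrative:

```shell
# Minimal checkpoint/restart pattern (illustrative only).
state=state.txt
step=$( [ -f "$state" ] && cat "$state" || echo 0 )   # resume if possible
while [ "$step" -lt 5 ]; do
  step=$(( step + 1 ))
  # ... do one unit of work here ...
  echo "$step" > "$state"   # checkpoint after each step
done
echo "finished at step $(cat "$state")"
rm -f "$state"   # clean up once the run is complete
```

If the job is killed at its walltime, the next submission simply picks up from the last checkpointed step instead of restarting from zero.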
The submission of a CPU job using a script is explained on this page. Submitting a GPU job is done in the same way, except for the content of the dump.sh script, which is as follows:
#!/bin/bash
#OAR -n Hello_World
#OAR -l /nodes=1/gpu=1,walltime=00:01:30
#OAR -p gpumodel='A100'
#OAR --stdout hello_world.out
#OAR --stderr hello_world.err
#OAR --project test

cd /bettik/bouttier/
/bin/hostname >> dumb.txt
In this example, we request a job on an A100 GPU:

#OAR -p gpumodel='A100'

Note that, unlike in interactive submission, the -p expression in a submission script is not wrapped in quotes.
AMD GPUs are operated via the “amdgpu” and “rocm” layers.
To reach these specific nodes, you need to add the amd job type, for example:
user@bigfoot:~$ oarsub -t amd -l /nodes=1/gpu=2 --project test -I
or as directives in a script:
#!/bin/bash
#OAR -n Hello_World
#OAR -l /nodes=1/gpu=1,walltime=00:01:30
#OAR -p gpumodel='MI210'
#OAR -t amd
#OAR --stdout hello_world.out
#OAR --stderr hello_world.err
#OAR --project test
For the environment, we recommend the following NIX shell, which provides the rocm, opencl and openmpi utilities, including drivers for exploiting XGMI links. You can copy/paste this entire section into an interactive job on your node:
source /applis/site/nix.sh
NIX_PATH="nixpkgs=channel:nixos-23.05" nix-shell -p nur.repos.gricad.openmpi4 -p nur.repos.gricad.ucx -p rocm-smi -p clinfo -p rocm-opencl-runtime -p rocm-opencl-icd
Or run your program passively by prefixing it with the nix-shell, as follows in a script:

source /applis/site/nix.sh
export NIX_PATH="nixpkgs=channel:nixos-23.05"
nix-shell --command <./your_program> -p nur.repos.gricad.openmpi4 -p nur.repos.gricad.ucx -p rocm-smi -p clinfo -p rocm-opencl-runtime -p rocm-opencl-icd
Here’s a more complete example, allowing you to configure OpenCL locally, interactively:
$ source /applis/site/nix.sh
$ NIX_PATH="nixpkgs=channel:nixos-23.05" nix-shell -p nur.repos.gricad.openmpi4 -p nur.repos.gricad.ucx -p rocm-smi -p clinfo -p rocm-opencl-runtime -p rocm-opencl-icd
[nix-shell:~]$ mkdir -p ~/.local/etc/OpenCL/vendors
[nix-shell:~]$ echo `nix eval --raw nixpkgs.rocm-opencl-runtime.outPath`/lib/libamdocl64.so > ~/.local/etc/OpenCL/vendors/amdocl64.icd
[nix-shell:~]$ export OCL_ICD_VENDORS=~/.local/etc/OpenCL/vendors/amdocl64.icd
[nix-shell:~]$ clinfo
Number of platforms    1
  Platform Name        AMD Accelerated Parallel Processing
[...]
Test of the IDEFIX application, developed with Kokkos (https://kokkos.org/). The Kokkos C++ EcoSystem is a solution for writing modern C++ applications, a programming model for performance and portability.

Compilation of the IDEFIX code in a nix shell:
user@bigfoot:~$ cd idefix-bench/benchmark
user@bigfoot:~/idefix-bench/benchmark$ NIX_PATH="nixpkgs=channel:nixpkgs-unstable"
user@bigfoot:~/idefix-bench/benchmark$ . ./sourceMeFirst.sh
user@bigfoot:~/idefix-bench/benchmark$ nix-shell -p nur.repos.gricad.openmpi4 -p nur.repos.gricad.ucx -p rocm-smi -p hip -p cmake
The cmake options to compile for AMD HIP are “-DKokkos_ENABLE_HIP=ON -DKokkos_ARCH_VEGA90A=ON” (for AMD MI200/MI250X). The HIP C++ compiler command is “hipcc”.
user@bigfoot:~/idefix-bench/benchmark$ cmake $IDEFIX_DIR -DKokkos_ENABLE_HIP=ON -DKokkos_ARCH_VEGA90A=ON -DCMAKE_CXX_COMPILER=hipcc
user@bigfoot:~/idefix-bench/benchmark$ make
user@bigfoot:~/idefix-bench/benchmark$ oarsub -I -lnodes=1 --project admin -t amd
user@bigfoot14:~/idefix-bench/benchmark$ . /applis/site/nix.sh
user@bigfoot14:~/idefix-bench/benchmark$ mpirun -np 2 --mca btl '^openib' -x UCX_TLS=sm,self,rocm_copy,rocm_ipc --mca pml ucx -x UCX_RNDV_THRESH=128 --mca osc ucx ./idefix
The -t devel submission mode is available to facilitate job tuning. It is limited to jobs of 2 hours maximum and allocates resources on nodes dedicated to this mode, making these sandbox resources much more readily available than production resources.
But be careful: this sandbox runs on smaller GPUs, which are in fact partitions of Nvidia A100 GPUs. Each GPU in devel mode is actually a sub-GPU, which we will call a mig by abuse of language (MIG = Multi-Instance GPU). The selection of resources is therefore a bit different: you have to specify the number of migdevice resources you want. In general, we only ask for one, because it doesn't make sense to work on several:
oarsub -l /nodes=1/gpu=1/migdevice=1,walltime=00:10:00 -t devel ...
The Bigfoot cluster also hosts “virtual” nodes with small Nvidia T4 GPUs. These GPUs are used during the day by students for their training, and are made available by the Fall project.
At night, from midnight, the virtual machines are switched on, each allocating a physical GPU. The virgo nodes then become active in the Bigfoot cluster, offering Nvidia T4 resources to the jobs waiting for this gpumodel. The virtual machines are shut down at 6 am, so jobs must be short enough (walltime ending before 6 am) to be allowed on these resources.
oarsub -l /nodes=1/gpu=1,walltime=04:00:00 -p "gpumodel='T4'" ...