List of hardware / configuration changes
This is a change log of the HPC facilities. The latest changes are at the top of the list.
2024-04-22
- Bigfoot: continued adapting the OAR job manager to systemd. Fixed bigfoot-gh1 being set to Suspected for users with a “-” in their login, and confined GPU devices to the systemd slice
2024-04-15
2024-04-12
- Bigfoot:
- Doc update for AMD GPUs (ROCm Nix packages upgrade)
2024-04-10
- Bigfoot:
- Upgraded the bigfoot-gh1 kernel to 6.5.0-1014 and the Nvidia drivers to 550. This fixed the cpuset bug, so bigfoot-gh1 is back online in OAR
- Upgraded the AMD GPU firmware on bigfoot13 in an attempt to fix some issues with inter-GPU communication
- Nix: added support for aarch64 (ARM64) architecture into /applis/site/nix.sh
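As an illustration of the Nix entry above, here is a minimal sketch that loads the site Nix script and reports the node architecture. It assumes the script is meant to be sourced from a shell; the variable names NIX_PROFILE and ARCH are introduced only for this example.

```shell
# Path taken from the entry above; the guard keeps the snippet harmless
# on machines where the file does not exist.
NIX_PROFILE=/applis/site/nix.sh

# uname -m typically reports "aarch64" on ARM64 nodes and "x86_64" on the others.
ARCH=$(uname -m)

if [ -f "$NIX_PROFILE" ]; then
    # Assumption: the script is sourced, not executed.
    . "$NIX_PROFILE"
    echo "Nix environment loaded for $ARCH"
else
    echo "$NIX_PROFILE not found (not on a cluster node?)"
fi
```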
2024-03-29
2024-03-25
- Dahu and Bigfoot nodes:
- Added packages (deps of apptainer): squashfuse, fuse2fs, gocryptfs
- singularity is now a symbolic link to apptainer
2024-03-21
- Added a new queue, long, for jobs with a walltime between 48h and 160h on the Dahu cluster, with a high priority on 2 nodes (dahu106 and dahu107, tagged with the long=YES OAR property)
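A submission to the long queue could look like the sketch below. It only builds and prints an oarsub command line so that it can be reviewed before being run on the Dahu frontend. The helper long_queue_cmd, the project name my-project and the script ./my_job.sh are hypothetical. It also assumes that the queue is selected explicitly with -q and that the 48h and 160h bounds are inclusive.

```shell
# Build (but do not run) an oarsub command line for the "long" queue.
# Arguments: project name, walltime in whole hours, job script.
long_queue_cmd() {
    project=$1
    hours=$2
    script=$3
    # The long queue is described as covering walltimes between 48h and 160h
    # (bounds treated as inclusive here, which is an assumption).
    if [ "$hours" -lt 48 ] || [ "$hours" -gt 160 ]; then
        echo "walltime ${hours}h is outside the 48h-160h range" >&2
        return 1
    fi
    echo "oarsub -q long --project $project -l /nodes=1,walltime=${hours}:00:00 $script"
}

# Example with hypothetical values: a 100-hour job on one node.
long_queue_cmd my-project 100 ./my_job.sh
```

A walltime outside the range makes the helper return a non-zero status instead of printing a command.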
2024-03-20
- Installed a new bigfoot-gh1 node: it’s an experimental node containing an Nvidia Grace-Hopper GH200 motherboard (72-core ARM64 Grace CPU + Hopper GPU). As the node is still unstable, it’s often in “drain” mode. When it is not drained, you can submit jobs with the -t gh type from the bigfoot frontend.
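A job for the Grace-Hopper node might be submitted along the lines of the sketch below. The project name, the script name, the resource request and the walltime are placeholders chosen for this example. Only the -t gh job type comes from the entry above. The command is printed, not executed, so it can be checked first.

```shell
# Hypothetical placeholders: substitute your own project and job script.
PROJECT="my-project"
SCRIPT="./my_gh_job.sh"

# Compose the submission command; -t gh is the job type named in the entry above.
# The resource request (/nodes=1) and the 1-hour walltime are assumptions.
CMD="oarsub -t gh --project $PROJECT -l /nodes=1,walltime=01:00:00 $SCRIPT"

# Print it for review; run it from the bigfoot frontend when the node is not drained.
echo "$CMD"
```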
2024-03-05
- MAJOR MAINTENANCE
- Mantis:
- OS upgrade of the nodes
- iRODS servers upgrade 4.2.12 -> 4.3.1
- Client upgrade postponed, as there is a configuration change to deploy into users’ home directories
- Dahu, Luke and Bigfoot nodes:
- Security updates
- Firmware upgrades
- Upgrade BeeGFS clients to 7.4.2
- Nvidia drivers re-deployment on Bigfoot nodes
- Bettik:
- upgrade servers to BeeGFS 7.4.2
- data migration and decommissioning of bettik-data1
- Silenus:
- upgrade servers to BeeGFS 7.4.2
- RAM upgrade of the metadata server 32 GB -> 254 GB
- Nix: upgrade daemon and clients to 2.18
- SSH gateways OS upgrade
- Vacuum of OAR databases (Bigfoot/Luke/Dahu)