Towards more reproducibility

How can you make your developments and calculations more reproducible?

Reproducibility of research concerns all scientific disciplines and is fully in line with the Open Science plan and the law on scientific integrity. The benefits are numerous and reach a whole range of people: revisiting your work five years later, passing it on to colleagues in your team or to an incoming PhD student, and ensuring wider dissemination are all much easier if certain good practices have been followed during the production, use and dissemination phases.

Best practices for software development

Use a naming convention for your variables and files to make your software easier to read and use.
Choose a license for your developments.
- This presentation explains the various possible licenses and discusses the legal aspects. The license determines how users will be able to reuse your developments.
Use an integrated development environment (IDE). There are several IDEs to choose from, so select the one best suited to your needs: VSCodium, Xcode, Eclipse, …
Use a software forge to manage the history of collaborative development.
- Preferably use an institutional forge such as gricad-gitlab.
Include unit tests in the code.
- For more information, see this video explaining the benefits and implementation of unit testing: Tests unitaires, une philosophie et une aide face à son logiciel (in French).
Include automatically executed test cases (thanks in particular to continuous integration) to check that new developments do not modify the software’s behavior.
Write clear, detailed documentation, generated as automatically as possible.
- Several tools are available for automatic code documentation: Sphinx, Doxygen, etc.
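
The automatically executed test cases mentioned above can be as simple as comparing a program's output with a stored reference value. The following is a minimal sketch rather than a recommended framework: `my_program` and `check_case` are hypothetical names, and the placeholder computation stands in for your own code.

```shell
#!/bin/sh
# Minimal regression check: compare a program's output with a reference value.
# `my_program` is a hypothetical stand-in for your own executable.
set -eu

my_program() {
    # Placeholder computation: sum of the integers passed as arguments.
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# check_case NAME EXPECTED ARGS...: run the program on ARGS and compare
# its output with EXPECTED; return non-zero on a mismatch.
check_case() {
    name=$1
    expected=$2
    shift 2
    actual=$(my_program "$@")
    if [ "$actual" = "$expected" ]; then
        echo "PASS $name"
    else
        echo "FAIL $name: expected '$expected', got '$actual'"
        return 1
    fi
}

check_case "sum of three integers" 6 1 2 3
check_case "single value" 4 4
```

A continuous integration job can run such a script on every push: a non-zero exit status signals that a new development has changed the software's behavior.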

For further information, please consult this document published by IT specialists from DEVLOG:

Je code : Les bonnes pratiques de développement logiciel (in French).

Best practices for performing calculations

To make your work reproducible, you need to pay particular attention to the software environment in which you carried out your developments and executed your code. The description of this environment and how to recreate it elsewhere and/or later must be distributed in the same way as your software.

On GRICAD clusters, several tools are available for deploying the software environment you need: see the High-Performance Computing section. Some are intrinsically more reproducible than others. For example, Guix and Nix build an environment isolated from the computing machine, whereas conda / mamba / micromamba rely in part on software components already installed on the machine.

Another way to reproduce your software environment is to use containers. This makes it possible to provide a user with software and a complete, isolated software environment (libraries and dependencies) to run it. Popular containerization platforms include Docker and Apptainer, but there are many others.

In terms of reproducibility, Guix and Nix offer the strongest guarantees, followed by containers; tools based on conda / mamba rank last.

Apart from containers, which have a different operating mode, each tool has its own way of “capturing” the runtime environment:

guix package --export-manifest > manifest.scm
nix-env --switch-profile $NIX_USER_PROFILE_DIR/test_profile
conda create -n <env_name> <pkg-name1> <pkg-name2> <pkg-name3>

and restore it at a later date:

guix shell -m manifest.scm
nix-env --switch-profile $NIX_USER_PROFILE_DIR/test_profile
conda activate <env_name>
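
For conda, an alternative to retyping the package list is to export an existing environment to a file and recreate it from that file. The sketch below assumes conda is installed; the function names, the environment name `myenv` and the file name `environment.yml` are illustrative choices, not part of the procedure above.

```shell
# Sketch, assuming conda is installed and an environment named "myenv"
# (hypothetical) already exists.

# capture_conda_env ENV_NAME FILE: write the package list of ENV_NAME to FILE.
capture_conda_env() {
    conda env export --name "$1" > "$2"
}

# restore_conda_env FILE: recreate the environment described in FILE.
restore_conda_env() {
    conda env create --file "$1"
}

# Usage:
#   capture_conda_env myenv environment.yml
#   restore_conda_env environment.yml
```

The exported file pins package versions, but the recreated environment still depends on components of the host machine, which is why conda-based tools remain less reproducible than Guix or Nix.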

To ensure reproducibility, the project’s git repository should contain this information. In this way, a user who has downloaded the software sources can identify and redeploy the software environment used for execution.
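
For Guix, one possible way to keep this information in the repository is to store the manifest together with the channel description, which records the Guix revision in use. This is a sketch under assumptions: Guix must be installed, and the directory name `env` and the function name `save_guix_env` are hypothetical.

```shell
# save_guix_env DIR: write the package manifest and the channel description
# (the Guix revision in use) to DIR.
save_guix_env() {
    mkdir -p "$1"
    guix package --export-manifest > "$1/manifest.scm"
    guix describe --format=channels > "$1/channels.scm"
}

# Usage, from the root of the git repository:
#   save_guix_env env
#   git add env/manifest.scm env/channels.scm
#   git commit -m "Record software environment"
#
# A user can later recreate the same environment with:
#   guix time-machine -C env/channels.scm -- shell -m env/manifest.scm
```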

Last but not least, a document describing the various execution stages, and possibly a test case, is also necessary: this will enable the user to clearly identify the parts to be executed and in what order.
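
One lightweight complement to such a document is a driver script that encodes the order of the stages. The sketch below is only an illustration: the three step functions are hypothetical placeholders for your own commands.

```shell
#!/bin/sh
# Hypothetical driver script: runs every stage in a fixed, documented order.

step_prepare() { echo "1. prepare input data"; }
step_compute() { echo "2. run the computation"; }
step_analyse() { echo "3. post-process the results"; }

# run_all: execute the steps in order and stop at the first one that fails.
run_all() {
    for step in step_prepare step_compute step_analyse; do
        "$step" || { echo "step failed: $step" >&2; return 1; }
    done
}

run_all
```

Keeping this script next to the test case lets a user rerun the whole workflow with a single command and see immediately which stage failed.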

For a more exhaustive description, you can find information on this site.

Best practices for software distribution

If your developments are associated with a publication, the reader will want to have access to the version that corresponds exactly to the one described in the article. You will therefore need to indicate precisely the corresponding “commit”. The Software Heritage platform (https://www.softwareheritage.org/?lang=fr) will be a great help in this respect.
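
Assuming the sources are managed with git, the commit to cite can be obtained as follows. The function names are hypothetical, and checking for uncommitted changes is an added precaution rather than a step from the text above.

```shell
# current_commit [DIR]: print the full hash of the commit checked out in DIR
# (defaults to the current directory).
current_commit() {
    git -C "${1:-.}" rev-parse HEAD
}

# is_clean [DIR]: succeed only if the working tree has no uncommitted changes
# and no untracked files, i.e. the hash really describes the files that were run.
is_clean() {
    changes=$(git -C "${1:-.}" status --porcelain) || return 1
    [ -z "$changes" ]
}

# Usage:
#   is_clean && current_commit
```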

Preliminary stages

To distribute your software, there are several preliminary steps to take:

Check that the git repository containing the source files is indexed in the Software Heritage platform (https://www.softwareheritage.org/).
- This provides a persistent identifier (SWHID) that can be used, for example, to pin the version of the sources used in a publication. Each subsequent contribution receives its own, different identifier.
Write a HAL entry for your software
- Writing the HAL entry for a software application is made much easier by indicating the SWHID: all metadata is automatically retrieved (git repository, etc.).
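
For a git commit, the SWHID mentioned above has the form `swh:1:rev:` followed by the full commit hash. As an illustration (the function name `swhid_for_commit` is hypothetical), it can be assembled as follows; the identifier only resolves on the platform once Software Heritage has archived the repository.

```shell
# swhid_for_commit HASH: print the SWHID of a git commit, given its full
# 40-character SHA-1 hash in lowercase hexadecimal.
swhid_for_commit() {
    case $1 in
        ""|*[!0-9a-f]*)
            echo "not a lowercase hexadecimal hash: $1" >&2
            return 1
            ;;
    esac
    if [ "${#1}" -ne 40 ]; then
        echo "expected a 40-character SHA-1 hash, got ${#1} characters" >&2
        return 1
    fi
    printf 'swh:1:rev:%s\n' "$1"
}

# Usage, from inside the git repository:
#   swhid_for_commit "$(git rev-parse HEAD)"
```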

Distribution

The HAL reference normally contains all the information you need to download and reuse your software!

For further information, please consult this document published by the IT specialists of DEVLOG:

Je code : les bonnes pratiques en matière de diffusion (in French).

Conclusions

In short, you need to supply your software with the chosen license, the list of authors and contributors, its SWHID or HAL identifier, a description of the software environment and its documentation (code + execution).

If you have any questions, do not hesitate to contact us by email: sos-gricad@univ-grenoble-alpes.fr

For more information on GRICAD, please visit our website.

If you’re interested in reproducibility, you’ll find more information on the National Reproducible Research Network website.