Ensuring Reproducibility in High-Performance Computing

Scientists from Inria, the Max Delbrück Center for Molecular Medicine (MDC), and the Utrecht Bioinformatics Center have joined forces to help researchers use reproducible methods in experiments that involve high-performance computing (HPC).

Experiments increasingly rely on HPC software, and complex, customized sets of software are frequently involved in the analysis and interpretation of data. Much like the reproducibility issues associated with antibody validation, differences in software environments can cause problems when those experiments need to be reproduced.

The international consortium is banking on Guix, free software for fully reproducing computational environments. Guix is an outgrowth of the GNU project, launched almost 40 years ago at MIT. According to the team, one advantage of Guix is that it characterizes software environments in unambiguous terms, much like a mathematical function: it completely describes a piece of software together with everything it depends on, and can therefore reproduce the environment bit-for-bit. In this way, Guix facilitates both reproducibility and customizability, they say.
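The comparison to a mathematical function can be illustrated with a small sketch. This is an analogy only, not how Guix is implemented (Guix is written in Guile Scheme and records far more detail about each build). The function name `environment_id` and the package names and versions are hypothetical. The sketch assumes an environment is fully specified by each package's version and the packages it depends on, and shows that the same description always yields the same identifier, while any change in a dependency yields a different one.

```python
import hashlib
import json


def environment_id(packages):
    """Return a deterministic identifier for a fully described environment.

    `packages` maps a package name to a dict with a "version" string and
    an "inputs" list naming the packages it depends on (hypothetical format).
    """
    def describe(name, seen=()):
        if name in seen:
            raise ValueError(f"dependency cycle at {name!r}")
        spec = packages[name]
        return {
            "name": name,
            "version": spec["version"],
            # Recursively describe every dependency, in a fixed order.
            "inputs": [describe(dep, seen + (name,))
                       for dep in sorted(spec["inputs"])],
        }

    # Canonical form: sorted package order and sorted JSON keys, so the
    # identifier depends only on the description, not on how it was written.
    closure = [describe(name) for name in sorted(packages)]
    canonical = json.dumps(closure, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical environment: an analysis tool that depends on a math library.
base = {
    "analysis-tool": {"version": "1.0", "inputs": ["math-lib"]},
    "math-lib": {"version": "2.0", "inputs": []},
}
# Same description, written in a different order.
reordered = dict(reversed(list(base.items())))
# Same tool, but the math library underneath has changed.
changed = {**base, "math-lib": {"version": "2.1", "inputs": []}}

print(environment_id(base) == environment_id(reordered))  # same description
print(environment_id(base) == environment_id(changed))    # different dependency
```

In this toy model, two groups that start from the same description obtain the same identifier, and a silent upgrade of an underlying library is visible as a different one. That is the property the team attributes to Guix's unambiguous descriptions.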

Guix was not originally designed for the HPC environments that today's experiments require, so scientists at the MDC, Inria, and the partner institutes are building features that allow Guix to be used on a computing cluster to implement reproducible workflows.

"Before Guix, the installation of scientific software was necessarily ad-hoc," Ricardo Wurmus of MDC says. "Groups would build their own software, statically link it into existing systems, and hope that it would never have to change, because managing software environments was virtually impossible. Now not only can we manage a single environment per group in a reliable fashion, but we use Guix at all levels: of the group, user, workflow and so on."

The project is scheduled to last two years. "The wider objective," according to Ludovic Courtès of Inria, "is to convince others who rely on high-performance computing that Guix represents a major advance toward a fundamental goal in science."

 

Caption: A view from inside the MDC's data center, with racks of high-performance computers. Image courtesy of MDC.
