Leibniz MMS Days 2020 - Abstract

Eggert, Anja

Reproducible bioinformatics workflows: A case study with software containers and interactive notebooks

[joint work with Pål O. Westermark (FBN Dummerstorf)]

Specifying bioinformatics workflows reproducibly is challenging because of their complexity. We developed a new statistical method in the field of circadian rhythmicity that makes it possible to rigorously determine whether measured quantities such as gene expression are not rhythmic (first presented at MMS Days 2019). The statistical method itself is implemented in the R package "HarmonicRegression", available from the CRAN repository. However, the bioinformatics workflow is much larger than the statistical test alone. For instance, to ensure the validity of the statistical method, we simulated data sets of 20,000 gene expression profiles over a wide range of parameter combinations (e.g. sampling interval, fraction of rhythmic genes, number of outliers). We now demonstrate the use of Jupyter notebooks to document and distribute our statistical method and its application to both simulated and experimental data sets. The notebook runs inside a Docker software container, which ensures complete long-term reproducibility of the workflow. The Docker container and the Jupyter notebook will be available on GitHub, accompanying our paper, a preprint of which is available on bioRxiv.
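
The kind of simulation grid described above can be illustrated with a minimal Python sketch. It is not the authors' code (their method is implemented in R), and every concrete choice in it is an assumption made for illustration: the function name simulate_dataset, the cosine-plus-Gaussian-noise model, the 24 h period, the amplitude and noise ranges, the crude outlier model, and the parameter values. Only 200 genes per data set are simulated here to keep the example fast, rather than the 20,000 mentioned in the abstract.

```python
import itertools
import math
import random


def simulate_dataset(n_genes, sampling_interval_h, rhythmic_fraction,
                     outlier_fraction, duration_h=48, period_h=24.0, seed=0):
    """Simulate expression time series (hypothetical model, for illustration).

    The first round(n_genes * rhythmic_fraction) genes follow a cosine with
    the given period; the remaining genes are flat. Gaussian noise is added
    to every sample, and each sample becomes an outlier with probability
    outlier_fraction.
    """
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    times = list(range(0, duration_h, sampling_interval_h))
    n_rhythmic = round(n_genes * rhythmic_fraction)
    dataset = []
    for gene in range(n_genes):
        is_rhythmic = gene < n_rhythmic
        amplitude = rng.uniform(0.2, 1.0) if is_rhythmic else 0.0
        phase = rng.uniform(0.0, 2.0 * math.pi)
        series = []
        for t in times:
            value = 1.0 + amplitude * math.cos(2.0 * math.pi * t / period_h - phase)
            value += rng.gauss(0.0, 0.1)  # measurement noise (assumed level)
            if rng.random() < outlier_fraction:
                value += rng.choice([-1.0, 1.0]) * 2.0  # crude outlier shift
            series.append(value)
        dataset.append((is_rhythmic, series))
    return times, dataset


# Hypothetical parameter grid; the values are placeholders, not the authors'.
sampling_intervals_h = [2, 4]
rhythmic_fractions = [0.1, 0.5]
outlier_fractions = [0.0, 0.05]

results = {}
for interval, fraction, outliers in itertools.product(
        sampling_intervals_h, rhythmic_fractions, outlier_fractions):
    results[(interval, fraction, outliers)] = simulate_dataset(
        n_genes=200,
        sampling_interval_h=interval,
        rhythmic_fraction=fraction,
        outlier_fraction=outliers,
    )
```

Each entry of results holds the sampling times and the labelled time series for one parameter combination, so a rhythmicity test can be scored against the known labels. Fixing the random seed is what makes such a simulation repeatable when it is rerun inside a notebook.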