Tips for managing code as a researcher in life sciences
The molecular biology lab is becoming increasingly data-driven, and as researchers and managers we need to make sure we're recording our work properly. Good documentation is critical to the future usability of the code, especially when a project is handed over from one project lead to another. It is also good practice to enable other researchers to run and use the code after publication.
So here are my recommendations as a researcher and manager of a small bioinformatics team:
- Use GitHub, GitLab, Codeberg or another central repository host, and push code changes to it daily.
- Team members should invite their manager and colleagues as collaborators on the Git repositories.
- R scripts should be written as R Markdown files, as this brings a few benefits: better documentation, outputs arranged in sequence, and a high level of transparency. R Markdown scripts can be rendered to HTML files for sharing and archiving. For Python-based workflows, Jupyter notebooks achieve more or less the same thing. Quarto notebooks are also a good option.
- Each chunk of code should have a short description of what is being done and the approach being used.
- After each chunk, note the key results.
- At the end of the script, call sessionInfo(), which reports the versions of R and the packages in use.
- At the end of the script, run save.image("myproject.RData") to save the workspace, including all the key data objects, to a file.
- Team members should document their code sufficiently, including a detailed project README file. The README should describe how to run all the analyses shown in the paper.
- The manager should regularly check the repository to ensure it is sufficiently documented. This helps future reuse.
- At the manager's discretion, materials should be uploaded to the electronic notebook. For example, you might do this monthly with a calendar reminder, or at the completion of certain milestones. Materials would include the Git repository, the R Markdown reports in HTML format and the saved RData files. This needs regular checking by the manager.
- By the time the project is being written up as a manuscript, the code should be reproducible. Check this by installing and running it on a different computer, or by having another team member run it.
- You can publicly share the GitHub/GitLab repository when submitting the manuscript for publication. A better alternative is to deposit the code in Zenodo, where it will get a DOI and better guarantees of preservation.
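For Python-based workflows, the two end-of-script steps above (recording versions and saving the key objects) could look roughly like the sketch below. It uses only the standard library. The function names `session_info`, `save_objects` and `load_objects`, the package list and the file name `myproject.pkl` are all made up for illustration, and pickle is only one of several ways to store results, so treat this as a starting point rather than a recommended implementation.

```python
import pickle
import platform
import sys
from importlib import metadata


def session_info(packages):
    """Collect the Python version, platform and versions of the named packages.

    A rough stand-in for R's sessionInfo(); `packages` is a list you choose yourself.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


def save_objects(path, **objects):
    """Pickle the named objects into a single file.

    Unlike save.image(), only the objects passed in explicitly are saved.
    """
    with open(path, "wb") as handle:
        pickle.dump(objects, handle)


def load_objects(path):
    """Read back the dictionary of objects written by save_objects()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Hypothetical package list and made-up example values; replace with your own.
    info = session_info(["numpy", "pandas"])
    print(info)
    save_objects("myproject.pkl", session=info, gene_counts={"geneA": 120, "geneB": 87})
```

Storing the version information next to the results keeps the two together when the file is archived. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.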
I think that about covers it. Related reading: The five pillars of computational reproducibility.
No AI was used to create this post.