Benchmarking R-based bioinformatics speed across different computer types
As bioinformaticians and data analysts, we should be using the right tool for the job. There are many types of compute available to us: laptops, desktops, pro workstations, on-prem servers, high performance computing (HPC) clusters and cloud. One consideration should be how quickly the computer completes the task, as faster runs mean less waiting around and better productivity.
Today I tested a typical and relatively simple R-based bioinformatics pipeline across a few different computer systems I have on hand to see how quickly each can process a containerised workflow. The workflow downloads gene expression count data from DEE2, runs differential expression analysis with DESeq2, then performs pathway enrichment by over-representation analysis and by functional class scoring with the fgsea package. The workflow itself is available from the GitHub repository. The Docker container is available from Docker Hub.
The script I used was:
time bash main.sh
where `main.sh` consisted of the two lines:
Rscript -e "rmarkdown::render('dataprep.Rmd')"
Rscript -e "rmarkdown::render('session2.Rmd')"
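If you want to see which of the two render steps dominates the total, the same commands can be timed one at a time. The following is only a minimal sketch, assuming bash: `time_step` is a hypothetical helper I made up for illustration (it is not part of the workflow repository), and it relies on bash's built-in `SECONDS` counter, so it only resolves whole seconds.

```shell
#!/usr/bin/env bash

# time_step (hypothetical helper): run a command, then print its
# wall-clock duration in whole seconds to stderr.
# Usage: time_step <label> <command> [args...]
time_step() {
  local label=$1
  shift
  local start=$SECONDS          # bash built-in: seconds since shell start
  "$@"                          # run the command exactly as given
  local status=$?
  echo "${label}: $((SECONDS - start)) s" >&2
  return "$status"              # pass the command's exit status through
}

# Only attempt the renders when Rscript and both Rmd files are present.
if command -v Rscript >/dev/null 2>&1 && [ -f dataprep.Rmd ] && [ -f session2.Rmd ]; then
  time_step dataprep Rscript -e "rmarkdown::render('dataprep.Rmd')"
  time_step session2 Rscript -e "rmarkdown::render('session2.Rmd')"
fi
```

The numbers reported in this post come from the single `time bash main.sh` call, not from this per-step variant.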
For testing I included my work laptop, a basic desktop PC, two workstations with consumer-grade CPUs, a Threadripper workstation, a Xeon HPC system and a cloud-based AMD Epyc server. The elapsed time was recorded and included in the table below.
The results show the Ryzen (9950X3D) workstation was fastest at 51 seconds, closely followed by the Intel i9-14900 workstation at 54 seconds. The Threadripper workstation with the 5955WX took 68.1 seconds and the basic PC with the older i5-10400 took 87.9 seconds. The HPC-based Xeon Gold 6240R, running in an Apptainer container, took 104 seconds. The Dell laptop with the i7-1365U CPU managed a lacklustre 151 seconds. In last place was the cloud-based EPYC-Rome server with a lousy 193 seconds.
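To put those numbers on a common scale, each elapsed time can be expressed as a multiple of the fastest run. Here is a rough sketch using bash and awk. The seconds are the ones quoted above, while the hyphenated labels and the `relative_times` helper are my own shorthand for illustration.

```shell
#!/usr/bin/env bash

# Elapsed seconds quoted in the text, one "label seconds" pair per line.
# The labels are shorthand invented for this sketch.
results='Ryzen-9950X3D 51
i9-14900 54
Threadripper-5955WX 68.1
i5-10400 87.9
Xeon-Gold-6240R 104
i7-1365U 151
EPYC-Rome 193'

# relative_times (hypothetical helper): read "label seconds" lines on
# stdin and print each time as a multiple of the smallest time seen.
relative_times() {
  awk '
    {
      label[NR] = $1
      secs[NR]  = $2 + 0                       # force numeric value
      if (NR == 1 || secs[NR] < min) min = secs[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %.2f\n", label[i], secs[i] / min
    }
  '
}

printf '%s\n' "$results" | relative_times
```

By this measure the laptop takes roughly three times as long as the fastest workstation, and the cloud server closer to four times.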