Pathway analysis - speed of FGSEA versus Mitch
FGSEA is among the most widely used pathway enrichment tools because of its speed and straightforward design. While recently comparing several enrichment tools, I benchmarked it against a tool called "mitch" and noticed something interesting: mitch was actually faster than fgsea. This was surprising, because my own tests from 2020, published in BMC Bioinformatics, showed that fgsea (version 1.11.1) was more than 10x faster (Fig 1).
As emphasised in our 2020 paper, speed wasn't the main priority for our software. We were focused on enabling multi-dimensional analysis.
What we see is that running processes in a Docker container doesn't make them slower than native execution (Fig 2). Mitch was also faster than every fgsea test done here. I am not sure what changed in fgsea after v1.11, but those changes have clearly made a big difference to speed. Although fgsea uses optimised C++ code for its permutation-based tests, permutation testing is inherently computationally intensive. Mitch, on the other hand, is written in pure R, but it doesn't use permutations. Instead it performs an ANOVA-on-ranks test, which is far less computationally intensive, although it is still not especially fast because it relies on base R functions.
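To make the cost difference concrete, here is a toy Python sketch. It is not code from either package, and the helper names average_ranks, rank_anova_f and permutation_p are hypothetical. It scores one gene set with a two-group ANOVA on ranks (genes in the set vs genes outside it), which needs a single pass over the ranked genes. It then estimates an empirical p-value for the same statistic by permutation, which repeats that pass n_perm times. fgsea's real statistic is a GSEA-style enrichment score and mitch's real implementation is in R, so treat this only as an illustration of why avoiding permutations saves work. (A closed-form test would convert F to a p-value with an F distribution on 1 and n-2 degrees of freedom; that step is omitted here to keep the sketch standard-library only.)

```python
import random
from statistics import fmean

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_anova_f(ranks, in_set):
    """One-way ANOVA F statistic on ranks for two groups: in the gene set vs not."""
    a = [r for r, m in zip(ranks, in_set) if m]
    b = [r for r, m in zip(ranks, in_set) if not m]
    grand, ma, mb = fmean(ranks), fmean(a), fmean(b)
    ss_between = len(a) * (ma - grand) ** 2 + len(b) * (mb - grand) ** 2
    ss_within = sum((r - ma) ** 2 for r in a) + sum((r - mb) ** 2 for r in b)
    # df_between = 1 for two groups; df_within = n - 2.
    return ss_between / (ss_within / (len(ranks) - 2))

def permutation_p(ranks, in_set, n_perm=1000, seed=1):
    """Empirical p-value: recompute F for n_perm random gene sets of the same size."""
    rng = random.Random(seed)
    observed = rank_anova_f(ranks, in_set)
    labels = list(in_set)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # same set size, random membership
        if rank_anova_f(ranks, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 200 genes with distinct scores; the gene set is the top 20.
scores = [float(i) for i in range(200)]
in_set = [i >= 180 for i in range(200)]
ranks = average_ranks(scores)
f = rank_anova_f(ranks, in_set)                # one statistic evaluation
p = permutation_p(ranks, in_set, n_perm=1000)  # 1001 statistic evaluations
print(f"F = {f:.1f}, permutation p = {p:.4f}")
```

The permutation route calls the same statistic about a thousand times per gene set, and real tools repeat this for thousands of gene sets. That is why a permutation-free test can keep up even when it is implemented in plain interpreted code.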
Also note how slow fgsea is in the BioC3-11 test. I repeated that run separately and can confirm these numbers are correct.
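Repeating a run to check that a surprising timing is reproducible is easy to script. The sketch below is a generic Python example that uses the standard-library timeit.repeat. Here workload is a hypothetical stand-in for one enrichment run, not the benchmark behind Fig 2. Reporting the median and the spread of several runs makes it easier to tell a genuine slowdown from a one-off hiccup.

```python
import statistics
import timeit

def workload():
    # Hypothetical stand-in for one enrichment run.
    return sum(i * i for i in range(200_000))

def benchmark(func, repeats=5):
    """Call func once per repeat and return each call's wall-clock time in seconds."""
    return timeit.repeat(func, number=1, repeat=repeats)

times = benchmark(workload, repeats=5)
print(f"median {statistics.median(times):.3f} s, "
      f"spread {max(times) - min(times):.3f} s over {len(times)} runs")
```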
Fig 2 also shows that parallelisation doesn't provide a linear speedup. Some parts of these tools are single-threaded, but it also looks like the overhead of managing more than 10 threads is high enough to cancel out the benefit of adding more. Therefore, on this 16-thread (8 physical cores) Threadripper 1900X workstation, running more than 8 parallel threads is just a waste.
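A toy cost model helps explain the shape of a curve like this. The Python sketch below extends Amdahl's law (a fixed serial part plus a parallel part divided across n threads) with a hypothetical per-thread overhead term that grows linearly with n. The parameter values are made up for illustration and are not fitted to Fig 2. With these values the runtime is minimised at n = sqrt(parallel / overhead_per_thread) = 8 threads, and adding threads beyond that makes runs slower.

```python
import math

def runtime(n_threads, serial=1.0, parallel=16.0, overhead_per_thread=0.25):
    """Toy wall-clock model (arbitrary units): serial work, plus parallel work
    split evenly across threads, plus a thread-management cost that grows linearly."""
    return serial + parallel / n_threads + overhead_per_thread * n_threads

def speedup(n_threads):
    """Speedup relative to a single thread under the toy model."""
    return runtime(1) / runtime(n_threads)

best = min(range(1, 17), key=runtime)
print("best thread count:", best)
for n in (1, 2, 4, 8, 12, 16):
    print(f"{n:2d} threads: speedup {speedup(n):.2f}x")
# Setting d(runtime)/dn = -parallel/n**2 + overhead = 0 gives n = sqrt(parallel/overhead).
print("analytic optimum:", math.sqrt(16.0 / 0.25))
```

In pure Amdahl's law (no overhead term) the speedup only flattens out. The linear overhead term is what makes it turn down again, which is one simple way to read a curve where extra threads stop helping.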