Posts

Showing posts from July, 2024

Weird multi-threaded behaviour of R/Bioconductor under Docker

While running some R code under Docker recently, I noticed that processes that should have been single-threaded were using all available threads, and that this behaviour differed between R on a native Linux machine and R under Docker. A search of the forums revealed that this is due to the configuration of the BLAS system dependency on those Docker images, which is set to use all available threads for matrix operations. This configuration sounds like a good idea at first, because dedicating more threads to a problem should speed up execution. But parallel processing incurs overhead to coordinate the subtasks and communicate data to and from daughter threads, which means that you rarely achieve linear speedup as you add threads. Typically, parallelisation has a sweet spot where the first 5-10 threads provide some speed-up, but beyond that there is either no improvement in speed or adding additional threads actually makes the code sl

Stress test RAM annually

TLDR: System memory can go bad. Use `memtester` on your Linux system annually to spot any problems early. Now the long story... In bioinformatics, we process a lot of data and conduct a lot of analysis. We use a range of devices, from laptops to desktop workstations, remote servers and the cloud. One particular desktop workstation of mine has been showing intermittent freezing and other problems, so I spent a bit of time trying to diagnose the issue. It is based on the AMD Threadripper 2990WX 32-core CPU with 8x 16GB DDR4 modules, and we have been using it to process thousands of RNA sequencing datasets for the DEE2 project. It has been working at maximum capacity for about 6 years. The symptoms were sudden shutdowns and freezing. I checked the CPU temperatures (using the `stress` command) and they were high, in the 90 °C range, which was odd given it was water cooled with an all-in-one system. I removed the block to inspect the thermal paste, and I found that the block did not appear t

Pathway analysis - speed of FGSEA versus Mitch

FGSEA is among the most widely used pathway enrichment tools due to its speed and straightforward design. While comparing different enrichment tools recently, I tested it against a tool called "mitch" and noticed something interesting: mitch was actually faster than fgsea. This is strange, because my own tests from 2020, published in BMC Bioinformatics, showed that fgsea was >10x faster (fgsea version 1.11.1) (Fig 1). Fig 1. Tests conducted in 2020 show FGSEA is 10x faster than mitch. As emphasised in our 2020 paper, speed wasn't the main priority for our software. We were focused on enabling multi-dimensional analysis. But given this curious observation, I ran a systematic speed test of fgsea and mitch using a real dataset to try to understand whether fgsea underwent some code changes that made it slower. The code for my work is on GitHub, and I used the Bioconductor Docker images to repeat this across Bioconductor versions from 3.11 onwards. Before 3.1