Bioinformatics data processing power depends on CPU L3 cache A LOT!
When speccing a CPU for a new bioinformatics computer we tend to focus on threads and frequency, but do you look at the cache? Well, you definitely should. On January 30, I started jobs on two servers. Server 1 has 2x Xeon E5-2680 v3 (48 threads total) and Server 2 has 1x Xeon E5-2667 v3 (16 threads). The work is processing raw RNA-seq data with a pipeline of Skewer+STAR+Kallisto. Server 1 has many more threads, so I'm running two parallel jobs of 12 threads each there, while Server 2 is running one job using 8 cores. Both servers are attached to the same network storage, so I/O capacity is the same.
Since Jan 30, these two machines have collectively processed 4670 datasets, and the difference between them was a big surprise. Not only did Server 2 process more datasets overall, but when normalised by number of threads, Server 2 processed 4.1 times as many datasets as Server 1!
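The per-server counts are not listed here, but as a rough sketch in Python they can be back-calculated from the two figures that are given: 4670 datasets in total and a 4.1-fold per-thread ratio. This assumes the normalisation used the threads in use (24 on Server 1, 8 on Server 2); the function names are made up for illustration.

```python
def per_thread_ratio(datasets_a, threads_a, datasets_b, threads_b):
    """How many times more datasets per thread server B handled than server A."""
    return (datasets_b / threads_b) / (datasets_a / threads_a)


def split_from_ratio(total, ratio, threads_a, threads_b):
    """Back-calculate per-server counts from the total and the per-thread ratio."""
    # datasets_b / datasets_a = ratio * threads_b / threads_a
    raw_ratio = ratio * threads_b / threads_a
    datasets_a = total / (1 + raw_ratio)
    return datasets_a, total - datasets_a


# Figures from the post: 4670 datasets in total, 4.1x per-thread ratio.
# Assumption: 24 threads in use on Server 1 (2 jobs x 12), 8 on Server 2.
server1, server2 = split_from_ratio(4670, 4.1, threads_a=24, threads_b=8)
print(f"Server 1: ~{server1:.0f} datasets, Server 2: ~{server2:.0f} datasets")
print(f"Per-thread ratio check: {per_thread_ratio(server1, 24, server2, 8):.1f}")
```

Under that assumption the split works out to roughly 1970 datasets on Server 1 and 2700 on Server 2. Normalising by total threads (48 and 16) gives the same answer, since both thread ratios are 3:1. The 4.1 is a rounded figure, so these counts are approximate.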
These CPUs are from the same generation and run at a similar frequency, so the main difference between them is the massive L3 cache on Server 2.
I was a bit confused by how a Xeon E5-2667 v3 could have such a huge cache, as the online documentation indicates only a 20 MB L3 cache. Then I noticed that the lscpu command reports 16 sockets (!) on this server, so perhaps this is a special CPU that has 16 times the normal cache. I wasn't around when this cluster was set up, but it certainly gives us an insight into how important cache is for CPU performance. So does increased cache lead to a linear increase in processing power? Well, here Server 2 has 5.3 times more L3 cache and its per-thread throughput was 4.1 times higher, which is about 77% of what perfectly linear scaling would give. That is close enough to linear that it would be worthwhile to spend more for a bigger L3 cache!
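To compare socket counts and cache sizes across machines without reading lscpu output by eye, something like the following Python sketch could help. It relies only on the fact that lscpu prints one "Key: value" pair per line; the helper names and the sample text are hypothetical, and the sample is not output from either server.

```python
import subprocess


def parse_lscpu(text):
    """Turn lscpu output (one 'Key:   value' pair per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_lscpu():
    """Run lscpu on the current Linux machine and parse its output."""
    result = subprocess.run(["lscpu"], capture_output=True, text=True, check=True)
    return parse_lscpu(result.stdout)


# Hypothetical excerpt for illustration only, not taken from either server.
SAMPLE = """\
Thread(s) per core:  2
Core(s) per socket:  12
Socket(s):           2
L3 cache:            30720K
"""

info = parse_lscpu(SAMPLE)
print("Sockets:", info["Socket(s)"], "| L3 cache:", info["L3 cache"])
```

On a real machine, `read_lscpu()` returns the same kind of dict. The exact format of the "L3 cache" value differs between lscpu versions, so it is kept as a string here rather than converted to a number.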
With the trend of increasing L3 cache on AMD X3D chips (up to 128 MB), it would be great to benchmark per-thread performance for bioinformatics tasks on those Ryzen CPUs against the lower-cache non-X3D chips. In the absence of actual benchmarks, we can guess that doubling the cache could yield 40-50% more performance. Given the roughly 45% premium being charged for the extra cache, though, the extra cost may only be justified if the machine will be heavily used.
Ryzen 9 7950X (64 MB L3 cache) AUD$829
Ryzen 9 7950X3D (128 MB L3 cache) AUD$1199
Source: Scorptec (13 Feb 2025)
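As a quick sanity check on that trade-off, here is a small Python sketch that works out the price premium from the two prices listed and compares throughput per dollar for the guessed 40% and 50% gains. The gains are guesses, not benchmarks, and the function names are illustrative.

```python
def premium_pct(base_price, upgraded_price):
    """Price premium of the upgraded part over the base part, in percent."""
    return (upgraded_price / base_price - 1) * 100


def throughput_per_dollar_ratio(base_price, upgraded_price, speedup):
    """Above 1.0 means the upgraded part gives more throughput per dollar."""
    return speedup / (upgraded_price / base_price)


PRICE_7950X = 829     # AUD, from the listing above
PRICE_7950X3D = 1199  # AUD, from the listing above

print(f"Premium: {premium_pct(PRICE_7950X, PRICE_7950X3D):.1f}%")
for gain in (0.40, 0.50):  # guessed gains, not measurements
    ratio = throughput_per_dollar_ratio(PRICE_7950X, PRICE_7950X3D, 1 + gain)
    print(f"{gain:.0%} faster -> {ratio:.2f}x throughput per dollar")
```

On these prices the break-even speedup equals the price ratio, about 1.45x, so a 40% gain falls slightly short on CPU cost alone and a 50% gain slightly exceeds it.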