Bioinformatics data processing power depends on CPU L3 cache A LOT!
When speccing a CPU for a new bioinformatics computer, we tend to focus on threads and frequency, but do you look at the cache? Well, you definitely should.

On January 30, I started jobs on two servers. The first has 2x Xeon E5-2680 v3 (48 threads total) and the second has 1x Xeon E5-2667 v3 (16 threads). The work is processing raw RNA-seq data with a pipeline of Skewer+STAR+Kallisto. Server 1 has many more threads, so I ran two parallel jobs of 12 threads each on it, while on Server 2 I ran one job using 8 cores. Both servers are attached to the same network storage, so I/O capacity is the same.

Since Jan 30, the two machines have collectively processed 4670 datasets, and the difference between them was a big surprise. Not only did Server 2 process more datasets overall, but when normalised by number of threads, Server 2 processed 4.1 times more datasets!

             Server 1              Server 2
CPU          2x Xeon E5-2680 v3    1x Xeon E5-2667 v3
Cores        24                    16
Threads      48                    16
Frequency ...