Posts

Showing posts from 2025

Bioinformatiics data processing power depends on CPU L3 cache A LOT!

When speccing a CPU for a new bioinformatics computer we tend to focus on threads and frequency, but do you look at the cache? Well you definitely should. On January 30, I started jobs on two servers. The first one has 2x Xeon E5-2680 v3 (48 threads total) and the second has 1x Xeon E5-2667 v3 (16 threads). The work we are doing is processing raw RNA-seq data using a pipeline of Skewer+STAR+Kallisto. Server 1 has many more threads, so I ran two parallel jobs of 12 threads each, while on Server 2 I'm running one job using 8 cores. The cluster nodes are attached to the same network storage, so I/O capacity is the same. Since Jan 30, these two machines have collectively processed 4670 datasets and the difference between these two machines was a big surprise. Not only did Server 2 process more datasets, when normalised by number of threads, Server 2 processed 4.1 times more datasets!  Server 1 Server 2 CPU 2x Xeon E5-2680 v3 1x Xeon E5-2667 v3 Cores 24 16 Threads 48 16 Frequency ...

DEE2 2025 updates: growth and development

Image
DEE2 is a database service I co-founded in 2015 aiming to provide uniformly processed gene expression profiles for each and every RNA-seq dataset in NCBI's Sequence Read Archive. After some major revisions, we published the database/service journal article in 2019. Over time, DEE2 has grown dramatically, with the number of datasets and total number of counted reads, as seen in the tables below. Here, I'll walk you  through the  growth of the database over time and the new features we've added. Growth of DEE2 DEE2  continues to ingest new metadata from NCBI SRA and GEO, and the growth of this metadata set has caused us a lot of issues over time. In the early years of DEE2 we used SRAdb, and then it became too large for its design, so we sought different solutions. Currently we are using pySRA and only fetching quarterly. The size of each request has been a challenge, with human and mouse requests of annual  dumps exceeding the 64 GB RAM of our backend workstation! S...