Posts

Showing posts from 2026

Benchmarking R-based bioinformatics speed across different computer types

As bioinformaticians, or other types of data analysts, we should be using the right tool for the job. There are many types of compute we can use, from laptops, desktops and pro workstations to on-prem servers, high performance computing (HPC) clusters and the cloud. One consideration should be how quickly the computer completes the task, as faster completion means less waiting around and better productivity. What I thought I would do today is test a typical and relatively simple R-based bioinformatics pipeline across a few different computer systems I have on hand, to see how quickly they can process a containerised workflow. The workflow involves downloading gene expression count data from DEE2, then doing differential expression with DESeq2, followed by pathway enrichment using both over-representation analysis and functional class scoring (with the fgsea package). The workflow itself is available from the GitHub repository. The Docker container is available from Docker Hub. ...
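The excerpt doesn't say how run times were recorded. One simple approach is to wall-clock the whole containerised command a few times on each machine and keep the median. The Python sketch below assumes that approach; the `time_command` helper is hypothetical, and the command it times is a stand-in because the actual image name isn't given here.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 3) -> list[float]:
    """Run `cmd` `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the workflow fails,
        # so a crashed run is never recorded as a fast one.
        subprocess.run(cmd, check=True)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Hypothetical placeholder: replace with the real container invocation,
    # e.g. ["docker", "run", "--rm", "<image>"]; the image name is not given here.
    cmd = [sys.executable, "-c", "pass"]
    runs = time_command(cmd, repeats=3)
    print(f"median {statistics.median(runs):.2f} s over {len(runs)} runs")
```

Taking the median of several runs dampens one-off effects such as a cold disk cache or a slow download from DEE2 on a single run.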

Now published: Ten common mistakes that could ruin your enrichment analysis

Glad to share with you our published article, Ten common mistakes that could ruin your enrichment analysis. The work was done by Anusuiya Bora (Burnet Bioinformatics, Deakin Uni), Matthew McKenzie (Deakin) and me. My working group became interested in pathway enrichment analysis around 2020/2021 because I was receiving review requests for manuscripts, and many of them suffered from severe problems, leading me to believe that these issues were prevalent. Our first article on this theme (Wijesooriya et al., 2022) demonstrated that severe problems were common across a large sample of peer-reviewed articles, including:
1. Lack of p-value correction.
2. Wrong background list.
3. Lack of methodological details.
We showed that problems 1 and 2 led to dramatic biases in results that could alter the conclusions of transcriptome studies. Due to reviewer disagreements, we were unable to provide a more comprehensive set of best practices in that article, but this remained a priorit...
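Problems 1 and 2 are easy to reproduce in a toy calculation. The Python sketch below assumes over-representation is scored with a one-sided hypergeometric test and that correction uses the Benjamini-Hochberg false discovery rate procedure; both are common choices, not necessarily what every enrichment tool does. All gene counts are hypothetical and not taken from the article. In this example, swapping a background of detected genes for the whole genome adds genes that could never have been called differentially expressed, which lowers the expected overlap and makes the same 10-gene overlap look more significant.

```python
from math import comb


def ora_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    N: genes in the background list
    K: background genes annotated to the pathway
    n: differentially expressed (DE) genes, all drawn from the background
    k: DE genes annotated to the pathway
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical numbers: 500 DE genes, 10 of them in one pathway.
    p_detected = ora_pvalue(k=10, n=500, K=100, N=12000)  # background = detected genes
    p_genome = ora_pvalue(k=10, n=500, K=120, N=20000)    # background = whole genome
    print(f"detected-gene background: p = {p_detected:.2e}")
    print(f"whole-genome background:  p = {p_genome:.2e}")  # smaller, i.e. inflated
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

Passing every pathway's raw p-value through `benjamini_hochberg` before applying a significance cutoff is one way to address problem 1.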

Workshop alert! Mastering Reproducible Enrichment Analysis

Join us for a 2-part workshop on Mastering Reproducible Enrichment Analysis! 📊 Presented by Anusuiya Bora and myself, with a focus on reproducibility and best practices.
📅 When: 12 and 13 May 2026
🕑 Time: 2:00 PM – 4:00 PM (AEST)
📍 Where: Online
💰 Cost: FREE for the academic sector (places are limited!)
🔗 Registration form link: https://tinyurl.com/ye262hzx

Tips for managing code as a researcher in life sciences

The molecular biology lab is becoming increasingly data-driven, and as researchers or managers we need to make sure we're recording our work properly. Good documentation is critical to the future usability of the code, especially when a project is handed over from one project lead to another. It is also good practice to enable other researchers to run and use the code after publication. So here are my recommendations as a researcher and manager of a small bioinformatics team:
1. Use GitHub, GitLab, Codeberg or another central repository to record changes made to code daily.
2. Team members should invite their manager and colleagues as collaborators on the Git repositories.
3. R scripts should be written as R Markdown files, as this brings several benefits: better documentation, outputs arranged in sequence and a high level of transparency. R Markdown scripts are rendered to HTML files for sharing/archiving. For Python-based workflows, Jupyter notebooks achieve more or less the same thing. Q...

Yes, you can use a single stick of DDR5 for bioinformatics and data analysis

INTRO
DRAM prices skyrocketed 171% in 2025 [1], and this trend looks set to continue into 2026 unless demand for GenAI hardware crashes. This leaves bioinformaticians and other data analysts in a pickle, as most applications we use require a lot of RAM. To keep costs low, we might consider using a single stick of RAM (a Dual In-line Memory Module, or DIMM) in a new workstation build, something that has been tried with reasonable success in low-budget gaming setups [2]. So in this post we will look at whether using a single stick of DDR5 DRAM causes a dramatic reduction in computational throughput compared with the usual two-stick setup. We will also examine whether the stock memory configuration (4800 MT/s) is any slower than tweaked settings (EXPO 6000 MT/s with low latency and high bandwidth support).
SETUP
The tests I will use include:
- A synthetic CPU test using stress-ng
- Single-end RNA-seq, human (STAR)
- Single-end RNA-se...
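Both questions start from simple arithmetic: theoretical peak bandwidth is roughly transfer rate × bytes per transfer × populated channels. The Python sketch below assumes a typical two-channel desktop board on which each DDR5 DIMM supplies a 64-bit (8-byte) data path (split internally into two 32-bit subchannels), so a single stick leaves one channel empty. The `peak_bandwidth_gb_s` helper is hypothetical, and its outputs are paper figures, not measurements; whether real bioinformatics tools are bandwidth-bound is what the benchmarks are for.

```python
def peak_bandwidth_gb_s(mt_per_s: int, dimms: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes).

    Assumes each DIMM populates its own 64-bit (8-byte) channel, as on a
    typical two-channel desktop board with one or two sticks installed.
    """
    # MT/s x bytes per transfer = MB/s; divide by 1000 for GB/s.
    return mt_per_s * bytes_per_transfer * dimms / 1000


if __name__ == "__main__":
    for speed in (4800, 6000):  # stock vs EXPO speeds mentioned in the post
        for dimms in (1, 2):
            bw = peak_bandwidth_gb_s(speed, dimms)
            print(f"{speed} MT/s, {dimms} DIMM(s): {bw:.1f} GB/s")
```

Under these assumptions one stick at 4800 MT/s gives 38.4 GB/s against 76.8 GB/s for two, and EXPO 6000 MT/s raises each figure by 25% (48.0 and 96.0 GB/s).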