Posts

Now published: Ten common mistakes that could ruin your enrichment analysis

Glad to share with you our published article, Ten common mistakes that could ruin your enrichment analysis. The work was done by Anusuiya Bora (Burnet Bioinformatics, Deakin Uni), Matthew McKenzie (Deakin) and me. My working group became interested in pathway enrichment analysis around 2020/2021 because I was receiving review requests for manuscripts and many of them suffered from severe problems, leading me to believe that these issues were prevalent. Our first article on this theme, Wijesooriya et al, 2022, demonstrated that severe problems were common across a large sample of peer-reviewed articles, including: (1) lack of p-value correction, (2) wrong background list, and (3) lack of methodological details. We showed that problems 1 and 2 led to dramatic biases in results that could alter the conclusions of transcriptome studies. Due to reviewer disagreements, we were unable to provide a more comprehensive set of best practices in that article, but this remained a priorit...

Workshop alert! Mastering Reproducible Enrichment Analysis

Join us for a 2-part workshop on Mastering Reproducible Enrichment Analysis! 📊 Presented by Anusuiya Bora and me, with a focus on reproducibility and best practices.
📅 When: 12 and 13 May 2026
🕑 Time: 2:00 PM – 4:00 PM (AEST)
📍 Where: Online
💰 Cost: FREE for academic sector (places are limited!)
🔗 Registration form link: https://tinyurl.com/ye262hzx

Tips for managing code as a researcher in life sciences

The molecular biology lab is becoming increasingly data driven, and as researchers and managers we need to make sure we're recording our work properly. Good documentation is critical to the future usability of the code, especially when a project is handed over from one project lead to another. It is also good practice to enable other researchers to run and use the code after publication.

So here are my recommendations as a researcher and manager of a small bioinformatics team:
- Use GitHub, GitLab, CodeBerg or another central repository to record changes made to code daily.
- Team members should invite their manager and colleagues as collaborators on the Git repositories.
- R scripts should be written as R Markdown files, as this brings a few benefits: better documentation, outputs arranged in sequence, and a high level of transparency. R Markdown scripts are output as HTML files for sharing/archiving.
- For Python-based workflows, Jupyter notebooks achieve more or less the same thing.
- Q...

Yes, you can use a single stick of DDR5 for bioinformatics and data analysis

INTRO

DRAM prices skyrocketed 171% in 2025 [1], and this trend looks like it will continue into 2026 unless there is a crash in demand for hardware for GenAI applications. This leaves bioinformaticians and other data analysts in a pickle, as most applications we use require a lot of RAM. To keep costs low, we might consider using a single stick (aka Dual In-line Memory Module: DIMM) of RAM for a new workstation build, which is something that has been tried with reasonable success for low-budget gaming setups [2]. So in this post we will look at whether using a single stick of DDR5 DRAM causes a dramatic reduction in computational throughput compared to the normal two-stick setup. We will also examine whether the stock memory configuration (4800 MT/s) is any slower than the tweaked settings (EXPO 6000 MT/s with low latency and high bandwidth support).

SETUP

The tests I will use include:
- A synthetic CPU test using stress-ng
- Single end RNA-seq human (STAR)
- Single end RNA-se...

DEE2 database gets HDF5

Here I’ll show you how to download and work with the new HDF5 datasets from DEE2 (dee2.io). HDF5 files provide fast random access to large and complex datasets while occupying less disk space. Overall, the bulk data files are 50% smaller than the previously used BZ2 files. The format also makes selecting datasets of interest quicker and obviates the need to convert data from "long" to "wide" format, which takes a long time and lots of RAM. In short, this is a big upgrade in end-user accessibility to power large-scale analysis of DEE2 transcriptome data. The materials here are mostly based on the rhdf5 package here. I’ll demonstrate with E. coli, but this should also work for other organisms. The first step is to load the rhdf5 library and download the h5 file.

library("rhdf5")
library("tictoc")
if (file.exists("ecoli_se.h5")) {
  message("HDF5 file exists")
} else {
  message("Downloading HDF5 file")
  downl...

Mitch gets upgraded - now with gene set networks

Interpreting pathway enrichment analysis results is a big challenge. An analysis may yield hundreds of statistically significant pathways, and getting to a shortlist of key mechanisms to follow up with experiments and describe in a publication is difficult. I recently got a request from a collaborator to come up with a way to visualise the key networks. I groaned... because network analysis in bioinformatics is sometimes characterised by hairballs of hundreds or thousands of meaningless interactions. Mostly it is done poorly, and the charts have no explanatory function and appear to be merely decorative. After seeing a few well-done examples such as Figure 2 from Chappel et al (PMID: 32138627), I thought this could be something we include as a common step in the mitch workflow. For those of you unaware, mitch is the R/Bioconductor package that Dr Antony Kaspi and I published in 2020 (PMID: 32600408) with the main focus being on multi-dimensional enrichmen...