EdgeR or DESeq2? Comparing the performance of differential expression tools

October 06, 2020

It seems like this discussion comes up a lot. Choosing a differential expression (DE) tool could change the results and conclusions of studies. How much depends on the strength of the data. Indeed this has been coveres by others already (here, here).

In this post I will compare these tools for a particular dataset that highlights the different ways these algorithms perform. So you can try it out at home, I've uploaded the code for this to GitHub here. To get it to work, clone the repository and open the Rmarkdown (.Rmd) file in Rstudio. You will need to install Bioconductor packages edgeR, DESeq2 and the CRAN package eulerr (for the nifty venn diagrams). Once that's done you can start working with the Rmd file. For an introduction to Rmd read on here. Suffice to say that it's the most convenient way to generate a data report that includes code, results and descriptive text for background information, interpretation, references and the rest. To create the report in Rstudio, click the "knit" button and it will execute the code and you'll get a HTML report. If you're working in the R console, for example if you are connecting to a server via SSH, then you can generate the report using the command rmarkdown::render("myreport.Rmd") and you will get the output HTML file.

In case you're impatient and just want to see the result for yourself, I've hosted my output report at this link. For everyone else, here's the walkthrough.

Multidimensional scaling analysis

Always do an MDS because it will give you an idea of the intra and inter group differences. As you can see here, there is a good measure of both that distinguishes this n=3 experiment. (The counts were derived from an ATAC-seq experiment).

DE analysis

Here is the summary of the results with the different tools:

edgeR (LRT) Up: 297 Down:589

edgeR (QL) Up:15 Down:204

DESeq2 Up:3 Down:113

Voom Up:100 Down:128

You can see the massive difference between them. In particular between edgeR LRT and DESeq. Let's take a look. The Euler package venn diagrams are quite easy to make and the areas of the shaded parts are proportional which is a nice touch.

I selected 9 genes that were significant with edgeR that were discordant with DESeq2 and created barcharts. Gene9565 in demonstrates that the algorithms deviate in how they treat outliers.

Here are the stats:

EdgeR

logFC logCPM LR PValue FDR dispersion

Gene9565 -0.7916072 2.291451 11.07955 0.0008728513 0.04036433 0.064547

C1 C2 C3 T1 T2 T3

Gene9565 21.96188 14.17148 40.87624 24.08976 24.94944 20.96064

DESeq2

baseMean log2FoldChange lfcSE stat pvalue padj C1

Gene9565 24.7313 -0.8373232 0.3698349 -2.264046 0.0235713 NA 7.203133

C2 C3 T1 T2 T3

Gene9565 7.148011 7.06639 6.875975 6.935422 6.907739

The T2 sample is clearly a bit of an outlier here but it is being marked as statistically significant by edgeR and NA by DESeq2, because of the outlier detection.

The next gene I want to highlight is Gene28045.

EdgeR

logFC logCPM LR PValue FDR dispersion

Gene28045 -0.3876685 5.166385 13.8639 0.0001965379 0.01514142 0.00657676

C1 C2 C3 T1 T2 T3

Gene28045 148.1977 95.62844 275.8308 215.6399 223.3354 187.6296

DESeq2

baseMean log2FoldChange lfcSE stat pvalue padj

Gene28045 183.5495 -0.4058583 0.1610077 -2.520739 0.01171087 0.4080693

C1 C2 C3 T1 T2 T3

Gene28045 8.468181 8.443831 8.337265 8.230507 8.135978 8.113126

Clearly this is a problem because there's no way that these genes could be validated with qPCR and t-test and get a p-value anywhere near 0.05. Therefore I'm happy to recommend people not use edgeR LRT anymore and go with the more conservative DESeq2.

What about the other tools?

Based on twitter discussions I also assessed edger with the QL test, which gave results more similar to DESeq.

Voom-limma as well

One thing I'm curious about is how the results from DESeq2 are biased with very few upregulated genes, which makes sense because this is a knock-down experiment. However with Voom-Limma, there is roughly equal number of up and downregulated genes.

UpSet plots can be useful to visualise commonlalities between three or more lists. Let's see how it looks for the 4 methods above. Firstly for the upregulated genes:

Then for the downregulated genes:

Search This Blog

Genome Spot