EdgeR or DESeq2? Comparing the performance of differential expression tools
It seems like this discussion comes up a lot. Choosing a differential expression (DE) tool could change the results and conclusions of studies. How much depends on the strength of the data. Indeed this has been coveres by others already (here, here).
In this post I will compare these tools for a particular dataset that highlights the different ways these algorithms perform. So you can try it out at home, I've uploaded the code for this to GitHub here. To get it to work, clone the repository and open the Rmarkdown (.Rmd) file in Rstudio. You will need to install Bioconductor packages edgeR, DESeq2 and the CRAN package eulerr (for the nifty venn diagrams). Once that's done you can start working with the Rmd file. For an introduction to Rmd read on here. Suffice to say that it's the most convenient way to generate a data report that includes code, results and descriptive text for background information, interpretation, references and the rest. To create the report in Rstudio, click the "knit" button and it will execute the code and you'll get a HTML report. If you're working in the R console, for example if you are connecting to a server via SSH, then you can generate the report using the command rmarkdown::render("myreport.Rmd") and you will get the output HTML file.
In case you're impatient and just want to see the result for yourself, I've hosted my output report at this link. For everyone else, here's the walkthrough.
Multidimensional scaling analysis
Always do an MDS because it will give you an idea of the intra and inter group differences. As you can see here, there is a good measure of both that distinguishes this n=3 experiment. (The counts were derived from an ATAC-seq experiment).
DE analysis
You can see the massive difference between them. In particular between edgeR LRT and DESeq. Let's take a look. The Euler package venn diagrams are quite easy to make and the areas of the shaded parts are proportional which is a nice touch.
I selected 9 genes that were significant with edgeR that were discordant with DESeq2 and created barcharts. Gene9565 in demonstrates that the algorithms deviate in how they treat outliers.
Here are the stats:
EdgeR
logFC logCPM LR PValue FDR dispersion
Gene9565 -0.7916072 2.291451 11.07955 0.0008728513 0.04036433 0.064547
C1 C2 C3 T1 T2 T3
Gene9565 21.96188 14.17148 40.87624 24.08976 24.94944 20.96064
DESeq2
baseMean log2FoldChange lfcSE stat pvalue padj C1
Gene9565 24.7313 -0.8373232 0.3698349 -2.264046 0.0235713 NA 7.203133
C2 C3 T1 T2 T3
Gene9565 7.148011 7.06639 6.875975 6.935422 6.907739
The T2 sample is clearly a bit of an outlier here but it is being marked as statistically significant by edgeR and NA by DESeq2, because of the outlier detection.
The next gene I want to highlight is Gene28045.
EdgeR
logFC logCPM LR PValue FDR dispersion
Gene28045 -0.3876685 5.166385 13.8639 0.0001965379 0.01514142 0.00657676
C1 C2 C3 T1 T2 T3
Gene28045 148.1977 95.62844 275.8308 215.6399 223.3354 187.6296
DESeq2
baseMean log2FoldChange lfcSE stat pvalue padj
Gene28045 183.5495 -0.4058583 0.1610077 -2.520739 0.01171087 0.4080693
C1 C2 C3 T1 T2 T3
Gene28045 8.468181 8.443831 8.337265 8.230507 8.135978 8.113126
Clearly this is a problem because there's no way that these genes could be validated with qPCR and t-test and get a p-value anywhere near 0.05. Therefore I'm happy to recommend people not use edgeR LRT anymore and go with the more conservative DESeq2.
What about the other tools?
One thing I'm curious about is how the results from DESeq2 are biased with very few upregulated genes, which makes sense because this is a knock-down experiment. However with Voom-Limma, there is roughly equal number of up and downregulated genes.