Mitch gets upgraded - now with gene set networks
Interpreting pathway enrichment analysis results is a big challenge. There may be hundreds of statistically significant pathways from an analysis and getting to a shortlist of key mechanisms to follow up with experiments and describe in a publication is difficult. I recently got a request from a collaborator to come up with a way to visualise the key networks. I groaned... because network analysis in bioinformatics is sometimes characterised by showing hairballs of hundreds/thousands of meaningless interactions. Mostly it is done poorly and the charts themselves do not have any explanatory function and mostly appear to be decorative. After seeing a few well done examples such as Figure 2 from Chappel et al (PMID: 32138627), I thought this could be something we include as a common step in the mitch workflow. For those of you unaware, mitch is the R/bioconductor package that Dr Antony Kaspi and I published in 2020 (PMID: 32600408) with the main focus being on multi-dimensional enrichment analysis.
In this post we'll look at some data comparing control and SARS-COV2 infected cells, that is publicly available on NCBI SRA. The full workflow (R Markdown script and html report) is uploaded here including fetching the expression counts and differential expression (DE). The DE table looks like this:
Gene ID | baseMean | log2FoldChange | lfcSE | stat | pvalue padj | |
---|---|---|---|---|---|---|
ENSG00000113739 | 3523.734 | 4.004593 | 0.1012568 | 39.54888 | 0 | 0 |
ENSG00000176153 | 8851.309 | -2.785312 | 0.0820808 | -33.93378 | 0 | 0 |
ENSG00000058085 | 5767.111 | 3.699041 | 0.1148334 | 32.21223 | 0 | 0 |
ENSG00000169710 | 6917.854 | -2.259158 | 0.0703230 | -32.12547 | 0 | 0 |
ENSG00000170421 | 41561.919 | -2.233482 | 0.0742019 | -30.10007 | 0 | 0 |
ENSG00000100297 | 2476.082 | -2.364383 | 0.0802583 | -29.45965 | 0 | 0 |
There are some strong gene expression changes as shown by the volcano plot.
Next, we can use mitch to run the pathway enrichment analysis.
genesets <- gmt_import("ReactomePathways.gmt")
Next, use mitch import to score the de table and add the gene symbols according to a mapping table called gt.
m <- mitch_import(de,DEtype = "DEseq2", geneTable = gt)
This generates a dataframe with one column that shows the gene score. In this case the DEseq2 test statistic.
Next, run mitch enrichment tests. Note that I'm using prioritisation by effect size. This is the preferred way as it will prioritise relatively specific pathways with large changes in expression. On the other hand if you want to prioritise by significance, you will likely get large gene sets that show relatively subtle expression changes that I find uninteresting.
mres <- mitch_calc(m,genesets = genesets,priority="effect",cores=8)
networkplot(res,FDR=0.05,n_sets=20)
The FDR threshold and number of pathways to show in each chart are customisable.
This visualisation works best with an extra-wide aspect, for example 14 wide and 9 high. This helps to prevent the longer pathway names from being cut off. If you are using R Markdown you can use the chunk header option "fig.height=9,fig.width=14". My other recommendation is to set the output device a SVG, which will allow you to move around the pathway labels to make presentation clearer. To do that, simply include the following chunk somewhere above your networkplot() call.
knitr::opts_chunk$set(dev = 'svg')
There is some aspect of randomness when it comes to generating these charts, so if you get one that looks to have too many ugly overlapping labels, you can generate a few more and pick the most aesthetically appealing one. It uses the R/CRAN package "network" to generate the network.
When it comes to investigating the key driver genes underpinning these charts, use the network_genes() function. It will show all the driver genes for each pathway. This will help you when deciding which genes should be investigated further.
network_genes(mres,FDR=0.05,n_sets=20)
networkplot() and network_genes() functions have been added to the development version of mitch, which is available from Bioconductor and from GitHub. I hope you find it useful!