Mitch: An R package for Multi-Contrast Gene Set Enrichment Analysis

Gene set enrichment analysis is one of the key methods for understanding gene expression patterns. As omics become more widely used, large experiments are common and often include not just one contrast (control vs case) but many contrasts (e.g. among healthy, disease, treatment 1 and treatment 2 groups). Previously, as part of an epigenetics/epigenomics lab, we were looking at ways to integrate different types of omics profiling, such as RNA expression, histone acetylation and DNA methylation.

We stumbled across a paper that proposed the use of multivariate ANOVA on ranks (Cox & Mann 2012) to identify gene sets that exhibit enrichment in one or more contrasts. We thought this could be a simple way to identify gene set enrichments across multiple contrasts without needing to run GSEA multiple times and then summarise the results. The tool described in Cox & Mann (2012) is written for the Perseus suite, which runs on Windows. We saw there was no similar implementation in R, so Antony Kaspi (now based at WEHI) wrote the first draft of the R implementation. We saw the general applicability of the tool, packaged it up and gave it a new name, mitch: a mashup of "multi" and "enrichment".
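To make the rank-MANOVA idea concrete, here is a minimal sketch in base R. It is not the mitch source code, just an illustration of the general approach under simple assumptions: the differential expression scores are simulated, and the names `scores`, `set_genes`, `in_set` and `delta` are made up for this example. Each contrast is converted to ranks, then a MANOVA asks whether genes inside the set rank differently from genes outside it in any combination of contrasts.

```r
# Minimal sketch of a rank-based MANOVA enrichment test using base R only.
# All data are simulated; object names are illustrative, not part of mitch.
set.seed(42)

n_genes <- 1000
genes <- paste0("gene", seq_len(n_genes))

# Simulated differential expression scores for two contrasts
scores <- data.frame(
  contrast1 = rnorm(n_genes),
  contrast2 = rnorm(n_genes),
  row.names = genes
)

# A hypothetical gene set of 50 genes: shifted up in contrast1, down in contrast2
set_genes <- genes[1:50]
scores[set_genes, "contrast1"] <- scores[set_genes, "contrast1"] + 1
scores[set_genes, "contrast2"] <- scores[set_genes, "contrast2"] - 1

# Rank the scores within each contrast (one column of ranks per contrast)
ranks <- apply(scores, 2, rank)

# Logical indicator of gene set membership, in the same order as the rows of `scores`
in_set <- rownames(scores) %in% set_genes

# One multivariate test across both contrasts: do set members rank differently?
fit <- manova(ranks ~ in_set)
p_value <- summary(fit)$stats["in_set", "Pr(>F)"]

# Direction of the shift in each contrast: mean rank inside minus outside the set
delta <- colMeans(ranks[in_set, , drop = FALSE]) -
  colMeans(ranks[!in_set, , drop = FALSE])

print(p_value)
print(delta)
```

A single test per gene set covers both contrasts at once, which is what removes the need to run a one-dimensional enrichment method per contrast and reconcile the results afterwards. In a real analysis this would be repeated for every gene set, with multiple testing correction applied to the resulting p-values.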

This tool was influential in several studies. In one of those, we were interested in whether the drug valproic acid (VPA) could be used to suppress the negative effects of high glucose exposure on liver cells (Felisbino et al., 2018). In that study we grew cells in low and high glucose conditions, with and without VPA, in triplicate. We ran edgeR to identify (i) the genes altered by high glucose and (ii) the genes altered by VPA in the high glucose condition; then we supplied the edgeR tables to mitch along with some Reactome gene sets and found something very interesting. Firstly, clotting and complement cascade genes were dramatically upregulated in cells exposed to high glucose, a finding that is relevant to the cardiovascular disease risk of diabetics; and secondly, VPA attenuated the expression of these pathways. Sure, we could have used GSEA to identify this trend as well, but the power of mitch is its ability to also generate landscape charts, which beautifully show the enrichment in two dimensions (Fig 1E below) as compared to all expressed genes (background shown in Fig 1D).