Mitch: An R package for Multi-Contrast Gene Set Enrichment Analysis
Gene set enrichment analysis is one of the key methods for understanding gene expression patterns. As omics technologies become more widely used, large experiments are common and include not just one contrast (control vs case) but often many (e.g. healthy, disease, treatment 1, treatment 2). Previously, as part of an epigenetics/epigenomics lab, we were looking at ways to integrate different types of omics profiling such as RNA expression, histone acetylation and DNA methylation.
We stumbled across a paper that proposed the use of multivariate ANOVA on ranks (Cox & Mann 2012) to identify gene sets that exhibit enrichment in one or more contrasts. We thought this could be a simple way to identify gene set enrichments across multiple contrasts without the need to run GSEA multiple times and summarise the results. The tool described in Cox & Mann (2012) is written for the Perseus suite, which runs on Windows. We saw there was no similar implementation in R, so Antony Kaspi (now based at WEHI) wrote the first draft of the R implementation. We saw the general applicability of the tool, packaged it up and gave it a new name: mitch, a mashup of "multi" and "enrichment".
This tool was influential in several studies. In one of those, we were interested in whether the drug valproic acid (VPA) could be used to suppress the negative effects of high glucose exposure on liver cells (Felisbino et al, 2018). In that study we grew cells in low and high glucose conditions, with and without VPA, in triplicate. We ran edgeR to identify (i) the genes altered by high glucose and (ii) the genes altered by VPA in the high glucose condition; then we supplied the edgeR tables to mitch along with some Reactome gene sets and found something very interesting. Firstly, clotting and complement cascade genes were dramatically upregulated in cells exposed to high glucose, a finding that is important for the cardiovascular disease risk of diabetics, and secondly, VPA attenuated the expression of these pathways. Sure, we could have used GSEA to identify this trend as well, but the power of mitch is the ability to also generate landscape charts which beautifully show the enrichment in two dimensions (Fig 1E below) as compared to all expressed genes (background shown in Fig 1D).
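The idea behind a two-contrast analysis like this can be sketched in base R. The snippet below is only an illustration of MANOVA on ranks, not mitch's actual implementation: the scores are simulated, the gene set is hypothetical, and the ranking scheme and the scaled mean-rank difference `s` are simplifying assumptions.

```r
set.seed(1)

# Simulated differential expression scores for 1000 genes in two contrasts
# (made-up data, purely for illustration)
n_genes <- 1000
scores <- data.frame(
  contrast1 = rnorm(n_genes),
  contrast2 = rnorm(n_genes),
  row.names = paste0("gene", seq_len(n_genes))
)

# Hypothetical gene set of 50 genes: shifted up in contrast1, down in contrast2
set_genes <- paste0("gene", 1:50)
scores[set_genes, "contrast1"] <- scores[set_genes, "contrast1"] + 1
scores[set_genes, "contrast2"] <- scores[set_genes, "contrast2"] - 1

# Rank the genes separately within each contrast
ranks <- as.data.frame(lapply(scores, rank))
rownames(ranks) <- rownames(scores)

# Flag which genes belong to the set
ranks$in_set <- rownames(ranks) %in% set_genes

# MANOVA on ranks: do set members differ from the other genes
# in either contrast (or both)?
fit <- manova(cbind(contrast1, contrast2) ~ in_set, data = ranks)
p_value <- summary(fit)$stats[1, "Pr(>F)"]

# Direction and size of the shift in each contrast, expressed as a
# scaled difference in mean rank (positive = set ranked higher)
contrast_cols <- c("contrast1", "contrast2")
s <- 2 * (colMeans(ranks[ranks$in_set, contrast_cols]) -
          colMeans(ranks[!ranks$in_set, contrast_cols])) / n_genes

print(p_value)
print(s)
```

A single test per gene set covers both contrasts, and the two components of `s` give the coordinates of that set in a two-dimensional enrichment plot.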
Fig 1. Application of mitch to understand effects of hyperglycemia (HG) and valproic acid (VPA) on gene expression.
This multi-contrast gene set enrichment approach is well suited to a few different applications:
- Multi-contrast omics experiments, like the 2D RNA-seq example above
- Multi-omics case-control experiments combining, for example, RNA expression, DNA methylation, histone modification and proteomics data
- Multi-sample case-control single-cell RNA sequencing experiments after pseudobulk summarisation with muscat (Crowell et al, 2019)
- Comparison of different upstream bioinformatics tools - for example, testing whether edgeR and DESeq2 give different pathway-level results
- Exploring the effect of covariates in a clinical study - for example, visualising the pathway-level effects of age, smoking, diet in a case-control study looking at gene expression in cardiovascular disease
Fig 2. Overview of the mitch workflow. The mitch package consists of five functions (left). Example minimal workflow (right).
- A function to import and generate ranked profiles from limma, edgeR, DESeq2, muscat and several other tools.
- A function to import gene sets in the GMT format (for example from MSigDB).
- A function to calculate the enrichment.
- A function to create an HTML report using rmarkdown, which contains these pretty charts and a couple of interactive ones too.
- A function to create high-resolution charts in PDF for publications.
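As a rough sketch, the five functions might be strung together as follows. This assumes mitch is installed from Bioconductor and uses the example datasets bundled with the package (`rna`, `k9a`, `k36a`, `genesetsExample`). The GMT file name in the comment and the output file names are placeholders, and the package documentation should be checked for the arguments available in your installed version.

```r
# Assumes mitch is already installed, e.g. via BiocManager::install("mitch")
library(mitch)

# Example edgeR-style tables and gene sets bundled with the package
data(rna, k9a, k36a, genesetsExample)

# 1. Import a named list of differential analysis tables (one per contrast)
#    and convert them into ranked profiles
x <- list("rna" = rna, "k9a" = k9a, "k36a" = k36a)
y <- mitch_import(x, "edgeR")

# 2. Gene sets: here the bundled example list is used. With your own GMT
#    file you would instead call something like
#    genesets <- gmt_import("my_genesets.gmt")   # placeholder file name
genesets <- genesetsExample

# 3. Calculate the multi-contrast enrichment
res <- mitch_calc(y, genesets, priority = "significance")
head(res$enrichment_result)

# 4. Write the HTML report (placeholder output name)
mitch_report(res, "myreport.html")

# 5. Write the PDF charts (placeholder output name)
mitch_plots(res, outfile = "mycharts.pdf")
```

Each element of the input list becomes one dimension of the enrichment analysis, so the names given to the list are what appear as axis labels in the charts.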
Mitch was accepted into Bioconductor and is available from the 3.11 release (Ziemann & Kaspi 2020). We have a journal paper at BMC Genomics demonstrating the features and accuracy of mitch (link). Any bugs or suggestions can be reported at the GitHub repo (https://github.com/markziemann/mitch).
Obligatory hex-logo.
References
Cox J, Mann M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics. 2012;13 Suppl 16(Suppl 16):S12. doi:10.1186/1471-2105-13-S16-S12
Felisbino MB, Ziemann M, Khurana I, de Oliveira CBM, Mello MLS, El-Osta A. Valproic acid attenuates hyperglycemia-induced complement and coagulation cascade gene expression. bioRxiv 253591; doi: https://doi.org/10.1101/253591
Crowell HL, Soneson C, Germain PL, Calini D, Collin L, Raposo C, Malhotra D, Robinson MD. On the discovery of subpopulation-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412; doi: https://doi.org/10.1101/713412
Ziemann M, Kaspi A (2020). mitch: Multi-Contrast Gene Set Enrichment Analysis. R package version 1.0.2, https://github.com/markziemann/mitch.