Yes, mitch can be used for pathway analysis of methylation array data

In 2020, Dr Antony Kaspi and I published a method called "mitch" [1], which is like GSEA but was designed specifically for multi-contrast analysis and is based on rank-MANOVA statistics, inspired by a 2012 paper by Cox and Mann [2]. Mitch worked well for various types of omics data downstream of commonly used differential abundance tools like DESeq2, edgeR, DiffBind, etc., but at the time we didn't consider how mitch could be applied to microarray data.

You might think that microarrays are outdated, but they are still used extensively for epigenome-wide association studies (EWASs), which are frequently used to understand disease processes and to identify biomarkers of disease. To illustrate, there are 1536 publicly available methylation array studies on NCBI GEO, and probably thousands more under restricted access.

The tools available for pathway analysis of methylation array data are a bit limited. There's an over-representation method that takes into consideration the peculiarities of array data, including the variable number of probes per gene and the fact that overlapping genes can share probes [3], but binary classification is known to limit sensitivity, which is why over-representation analysis of epigenomic data is mostly unhelpful. There have been attempts to introduce rank-based methods like ebGSEA [4] and methylGSA [5], but these tools don't retain information about the direction of change in methylation, which is important for interpreting regulatory data, just as it is for gene expression [6].

So we undertook a study to comprehensively test different methods for rank-based pathway analysis using an exhaustive simulation approach. The eight methods we tested took different approaches to aggregating methylation data from probes to genes to pathways. In the end, the best performing approach involved gene-wise aggregation of limma t-statistics followed by an enrichment test with mitch. It was also superior to existing ORA methods over most simulation conditions. With real lung cancer data, it yielded more robust results than ORA and was able to highlight methylation changes that coincided with differential expression of key pathways. The preprint is available on bioRxiv.
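To make the probe-to-gene-to-pathway idea concrete, here is a minimal Python sketch. It is not mitch itself, which is an R/Bioconductor package that uses rank-MANOVA statistics. The probe IDs, gene names, and t-statistics are hypothetical toy values, the use of the mean as the aggregation function is an assumption made for illustration, and `rank_shift` is a simplified rank-difference score rather than mitch's actual test.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_genes(probe_t, probe_genes):
    """Average probe-level t-statistics for each gene.

    A probe annotated to several overlapping genes contributes to each of them.
    Using the mean here is an illustrative assumption.
    """
    per_gene = defaultdict(list)
    for probe, t in probe_t.items():
        for gene in probe_genes.get(probe, []):
            per_gene[gene].append(t)
    return {gene: mean(ts) for gene, ts in per_gene.items()}


def signed_ranks(gene_scores):
    """Rank genes from most negative (1) to most positive (N).

    Ranking the signed score keeps the direction of change.
    Ties are broken alphabetically by gene name.
    """
    ordered = sorted(gene_scores, key=lambda g: (gene_scores[g], g))
    return {gene: i + 1 for i, gene in enumerate(ordered)}


def rank_shift(ranks, gene_set):
    """Scaled difference in mean rank between set members and all other genes.

    Positive values suggest higher methylation in the set, negative values lower.
    This is a simplified stand-in for a real enrichment test.
    """
    inside = [r for g, r in ranks.items() if g in gene_set]
    outside = [r for g, r in ranks.items() if g not in gene_set]
    return 2 * (mean(inside) - mean(outside)) / len(ranks)


# Hypothetical toy data: probe-level t-statistics and probe-to-gene annotation.
probe_t = {"cg01": 2.0, "cg02": 4.0, "cg03": -3.0, "cg04": -1.0, "cg05": 0.5}
probe_genes = {
    "cg01": ["GENE_A"],
    "cg02": ["GENE_A", "GENE_B"],  # probe shared by two overlapping genes
    "cg03": ["GENE_C"],
    "cg04": ["GENE_C"],
    "cg05": ["GENE_D"],
}

gene_scores = aggregate_to_genes(probe_t, probe_genes)
ranks = signed_ranks(gene_scores)
score = rank_shift(ranks, {"GENE_A", "GENE_B"})
print(gene_scores, ranks, score)
```

Because genes are ranked on the signed score rather than its absolute value, a gene set whose members all gain methylation ends up at one end of the ranking and one whose members all lose methylation ends up at the other, so the direction of change is preserved.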

Here is an example of using this method to analyse two independent aging studies:

[Figure: Pathway methylation changes associated with aging identified using mitch.]

If you already have differential methylation profiles from limma, then applying mitch is relatively easy. I have added a new vignette to our Bioconductor package and another guide on our website. Just make sure your mitch version is 1.15.1 or greater.

Feedback welcome!

References

1. Kaspi, A., Ziemann, M. mitch: multi-contrast pathway enrichment for multi-omics and single-cell profiling data. BMC Genomics 21, 447 (2020). https://doi.org/10.1186/s12864-020-06856-9

2. Cox, J., Mann, M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics 13 (Suppl 16), S12 (2012). https://doi.org/10.1186/1471-2105-13-S16-S12

3. Maksimovic, J., Oshlack, A. & Phipson, B. Gene set enrichment analysis for genome-wide DNA methylation data. Genome Biol 22, 173 (2021). https://doi.org/10.1186/s13059-021-02388-x

4. Dong, D., Tian, Y., Zheng, S.C., Teschendorff, A.E. ebGSEA: an improved Gene Set Enrichment Analysis method for Epigenome-Wide-Association Studies. Bioinformatics 35(18), 3514–3516 (2019). https://doi.org/10.1093/bioinformatics/btz073

5. Ren, X., Kuan, P.F. methylGSA: a Bioconductor package and Shiny app for DNA methylation data length bias adjustment in gene set testing. Bioinformatics 35(11), 1958–1959 (2019). https://doi.org/10.1093/bioinformatics/bty892

6. Hong, G., Zhang, W., Li, H., Shen, X., Guo, Z. Separate enrichment analysis of pathways for up- and downregulated genes. J R Soc Interface 11(92), 20130950 (2013). https://doi.org/10.1098/rsif.2013.0950

