Showing posts from September, 2019

InterPro Protein domain based gene set library

Recently I wanted to know whether genes with methyltransferase domains were upregulated in my dataset. As far as I know, this isn't currently captured in the major gene set databases. I dug into some older files of mine and found that InterPro protein domain information is included in Ensembl BioMart, and that it's relatively straightforward to convert it to GMT format for pathway analysis.

TL;DR: Here is a link to the human GMT file for use in GSEA and other pathway analysis tools. Please cite the recent InterPro paper if you use this in your research: Mitchell et al. 2019. If you are interested in learning how it was made, read on.

Method

1- Obtaining InterPro data from Ensembl BioMart

Head to Ensembl BioMart and select the human database. Then click "Attributes". Here you can select the bits of information you want. Select only the following attributes and click the boxes in the or