Generating a custom gmt file for gene set analysis
Pathway and gene set analysis is a common way to interpret RNA-seq or other genome-wide expression assays. Most of the time, we use GSEA to tell us whether our gene sets of interest are up- or down-regulated. You can use gene sets from KEGG, Reactome, GO, MSigDB and other sources, but you can also generate your own. The format GSEA uses for gene sets is GMT. I'm going to take you through two examples of generating custom gene sets.

Generate gene sets from published data sets using GEO2R

Let's say you're interested in the transcription factor STAT1. I found a dataset in GEO called "Knockdown of STAT1 in SCC61 tumor xenografts leads to alterations in the expression of energy metabolic pathways", which has an accompanying paper in BMC Med. Most uploaded array data sets can be reanalysed with GEO2R, which runs the array analysis tool limma behind the scenes. Because it is embedded in the GEO web page and has a GUI, it is very accessible for biologists. Click this link
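As a quick aside on the GMT format mentioned earlier: it is a plain tab-separated text file with one gene set per line, where the first field is the set name, the second is a short description, and the remaining fields are the genes. Here is a minimal Python sketch of writing one. The function name `write_gmt`, the set name `STAT1_KD_DOWN`, the description and the gene symbols are all hypothetical placeholders, not results from the dataset.

```python
import io


def write_gmt(gene_sets, handle):
    """Write gene sets to `handle` in GMT format.

    Each line is: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
    `gene_sets` maps a set name to a (description, list_of_genes) pair.
    """
    for name, (description, genes) in gene_sets.items():
        # Drop duplicate genes while keeping their original order.
        unique_genes = list(dict.fromkeys(genes))
        handle.write("\t".join([name, description, *unique_genes]) + "\n")


# Hypothetical placeholder gene set; swap in your own names and genes.
gene_sets = {
    "STAT1_KD_DOWN": (
        "placeholder description",
        ["GENE_A", "GENE_B", "GENE_A", "GENE_C"],
    ),
}

# Write to an in-memory buffer here; use open("custom.gmt", "w") for a real file.
buffer = io.StringIO()
write_gmt(gene_sets, buffer)
gmt_text = buffer.getvalue()
print(gmt_text, end="")
```

Writing to a file handle rather than a fixed path keeps the function easy to test, and removing duplicates up front avoids listing the same gene twice in one set.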