A Biomarker Gene Set From ENCODE Expression Data
MSigDB contains thousands of gene sets that have been mined from a range of genome-wide studies, and these are a valuable resource for gene ontology and pathway analysis. You probably know that many gene sets are curated by KEGG, REACTOME and BIOCARTA, as well as by dry-lab scientists who specialise in analysing these data sets and curating gene sets. What you may not know is that if you follow a few basic guidelines, you can start generating your own custom gene sets, and these can become a valuable resource for running gene ontology and Gene Set Enrichment Analysis (as per the graphic). For instance, within our lab we have used ENCODE ChIP-Seq data extensively to help us analyse our mRNA-Seq data, and this has provided a huge leg-up in generating hypotheses and designing follow-up experiments. For this example, I want to give an overview of how I made biomarker gene sets for a bunch of cell types analysed by ENCODE. Biomarker gene sets can be useful in array or mRNA-Seq analysis tha