Posts

Showing posts with the label ENCODE

Introducing the ENCODE Gene Set Hub

TL;DR We curated a bunch of ENCODE data into gene sets that are super useful in pathway analysis (e.g. GSEA).
Link to gene sets and data: https://sourceforge.net/projects/encodegenesethub/
Poster presentation: DOI:10.13140/RG.2.2.34302.59208

Now for the longer version. Gene sets are wonderful resources. We use them to do pathway-level analyses and identify trends in data that lead us to improved interpretation and new hypotheses. Most pathway analysis tools, like GSEA, allow us to use custom gene sets. This is really cool, as you can start to generate gene sets based on your own profiling work and that of others.
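To make the idea of a custom gene set concrete, here is a minimal sketch of writing gene sets in the tab-delimited GMT layout that GSEA reads (set name, a description field, then the member genes). The function name `write_gmt`, the set names and the gene symbols are all hypothetical placeholders, not the contents of the hub.

```python
import csv
import io


def write_gmt(gene_sets, handle, description="na"):
    """Write {set_name: iterable_of_genes} as tab-delimited GMT rows.

    Each row is: set name, description, then one gene per column.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, genes in gene_sets.items():
        # Deduplicate and sort so the output is reproducible.
        unique_genes = sorted(set(genes))
        writer.writerow([name, description, *unique_genes])


# Hypothetical example sets; names and genes are placeholders only.
example_sets = {
    "EXAMPLE_TF_TARGETS": ["GENE_B", "GENE_A", "GENE_B"],
    "EXAMPLE_CELLTYPE_MARKERS": ["GENE_C"],
}

buffer = io.StringIO()
write_gmt(example_sets, buffer)
gmt_text = buffer.getvalue()
print(gmt_text, end="")
```

In practice you would pass an open file handle instead of the in-memory buffer and point your pathway tool at the resulting `.gmt` file.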

There is huge value in curating experimental data into gene sets, as the MSigDB team have demonstrated. But overall, these data are under-shared. Even our group is guilty of not sharing the gene sets we've used in papers. There have been a few papers where we've used gene sets curated from ENCODE transcription factor binding site (TFBS) data to understand which TFs were drivi…

A Biomarker Gene Set From ENCODE Expression Data

MSigDB contains thousands of gene sets that have been mined from a range of genome-wide studies, and these are a valuable resource for gene ontology and pathway analysis. You probably know that many gene sets are curated by KEGG, REACTOME and BIOCARTA, as well as by dry lab scientists who specialise in analysing these data sets and curating gene sets. What you may not know is that if you follow a few basic guidelines, you can start generating your own custom gene sets, and these can become a valuable resource for running gene ontology and Gene Set Enrichment Analysis (as per the graphic).
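As a rough illustration of what a custom gene set lets you do, the sketch below runs a simple over-representation test (a one-sided hypergeometric test) of a query gene list against one gene set. Note that this is not GSEA's rank-based statistic, just a simpler stand-in; the function names and gene symbols are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) when drawing query_size genes without replacement
    from a universe that contains set_size gene-set members."""
    total = comb(universe_size, query_size)
    upper = min(set_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def over_representation(query, gene_set, universe):
    """Return (overlap size, p-value), restricting both lists to the universe."""
    universe = set(universe)
    query = set(query) & universe
    gene_set = set(gene_set) & universe
    overlap = len(query & gene_set)
    p = hypergeom_pvalue(len(universe), len(gene_set), len(query), overlap)
    return overlap, p


# Hypothetical toy data: 10 background genes, a 3-gene set, a 3-gene query.
background = [f"GENE_{i}" for i in range(10)]
toy_set = ["GENE_0", "GENE_1", "GENE_2"]
toy_query = ["GENE_0", "GENE_1", "GENE_2"]

n_overlap, p_value = over_representation(toy_query, toy_set, background)
print(n_overlap, p_value)
```

With real data you would repeat this across many gene sets and correct the resulting p-values for multiple testing.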


For instance, within our lab we have extensively used ENCODE ChIP-Seq data to help us analyse our mRNA-seq data, and this has provided a huge leg-up in generating hypotheses and designing follow-up experiments. For this example, I want to show an overview of how I made biomarker gene sets for a bunch of cell types analysed by ENCODE. Biomarker gene sets can be useful in array or mRNA-Seq analysis that you…