Get the newest Reactome gene sets for pathway analysis

For first-pass pathway analysis, I find Reactome to be the gene set database that biologists understand most easily. For a long time I have used the Reactome gene sets deposited on the GSEA/MSigDB website. Recently a colleague pointed me to the gene matrix transposed (GMT) file offered directly on the Reactome website (thanks, Dr Okabe).

There are some differences. Firstly, the file from the Reactome website (accessed 2018-05-09) contains more gene sets:
$ wc -l *gmt
     674 c2.cp.reactome.v6.1.symbols.gmt
    2022 ReactomePathways.gmt
    2696 total
Secondly, there are more genes included in one or more gene sets:
$ cut -f3- c2.cp.reactome.v6.1.symbols.gmt | tr '\t' '\n' | sort -u | wc -l
6025
$ cut -f3- ReactomePathways.gmt | tr '\t' '\n' | sort -u | wc -l
10852
And overall there are about threefold more gene-pathway entries:
$ cut -f3- ReactomePathways.gmt | wc -w
106405
$ cut -f3- c2.cp.reactome.v6.1.symbols.gmt | wc -w
37601
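The three counts above can be wrapped in one small helper so that any GMT file is summarised the same way. This is only a rough sketch, assuming the usual GMT layout (tab-separated, set name in column 1, description or URL in column 2, gene symbols from column 3 onward). The function name `gmt_stats` and the toy file are made up for illustration and are not part of any Reactome or MSigDB tooling.

```shell
#!/bin/sh
# gmt_stats: hypothetical helper (not from Reactome or MSigDB).
# Assumes GMT layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
# Prints: file <TAB> gene sets <TAB> unique genes <TAB> gene-pathway entries
gmt_stats() {
    f="$1"
    # Gene sets: one per non-empty line
    sets=$(grep -c . "$f")
    # Unique genes: flatten columns 3+ to one gene per line, then deduplicate
    genes=$(cut -f3- "$f" | tr '\t' '\n' | grep . | sort -u | wc -l | tr -d ' ')
    # Gene-pathway entries: every gene in every set, duplicates included
    entries=$(cut -f3- "$f" | tr '\t' '\n' | grep -c .)
    printf '%s\t%s\t%s\t%s\n' "$f" "$sets" "$genes" "$entries"
}

# Toy GMT file with two made-up sets, only to show the output format
toy=$(mktemp)
printf 'SET_A\tdesc\tTP53\tMDM2\tATM\n' > "$toy"
printf 'SET_B\tdesc\tTP53\tBRCA1\n' >> "$toy"
gmt_stats "$toy"
```

On the real files this could be run as `for f in *.gmt; do gmt_stats "$f"; done`. Unlike `wc -w`, the sketch counts tab-separated fields, so the entry count should match the figures above only as long as no gene symbol contains a space.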
I also looked at whether the gen…