Get the newest Reactome gene sets for pathway analysis
For first-pass pathway analysis, I find Reactome the most useful gene set database, because its pathways are easy for biologists to interpret. For a long time I used the Reactome gene sets deposited on the GSEA/MSigDB website. Recently a colleague pointed me to the gene set file offered directly on the Reactome website (thanks, Dr Okabe).
The latest Reactome gene sets, in Gene Matrix Transposed (GMT) format, can be downloaded from https://reactome.org/download/current/ReactomePathways.gmt.zip
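There are some differences between the two files. Firstly, the file from the Reactome website (accessed 2018-05-09) contains about three times as many gene sets. Each line of a GMT file is one gene set, so a line count gives the number of sets: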
```
$ wc -l *gmt
674 c2.cp.reactome.v6.1.symbols.gmt
2022 ReactomePathways.gmt
2696 total
```
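For readers new to the format: a GMT file has one gene set per line, with tab-separated fields. The first field is the set name, the second holds a description, URL or identifier depending on the provider, and every remaining field is a member gene. The sketch below builds a tiny made-up GMT file (the set names, descriptions and genes are invented for illustration, not taken from either download) and shows that the line count equals the number of sets and that `NF - 2` in awk gives each set's size.

```sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# A made-up GMT file: name <TAB> description/ID <TAB> gene <TAB> gene ...
printf 'Pathway A\tdesc_A\tTP53\tMDM2\tCDKN1A\n'  > "$tmp/toy.gmt"
printf 'Pathway B\tdesc_B\tTP53\tBAX\n'          >> "$tmp/toy.gmt"
printf 'Pathway C\tdesc_C\tEGFR\n'               >> "$tmp/toy.gmt"

# One gene set per line, so the line count is the number of gene sets.
# (tr strips the leading spaces that BSD/macOS wc adds.)
n_sets=$(wc -l < "$tmp/toy.gmt" | tr -d ' ')
echo "gene sets: $n_sets"

# Print each set's name and size: fields 3..NF are the member genes.
awk -F'\t' '{ printf "%s\t%d\n", $1, NF - 2 }' "$tmp/toy.gmt"
```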
Secondly, the official file covers more genes (distinct gene symbols that appear in at least one gene set):
```
$ cut -f3- c2.cp.reactome.v6.1.symbols.gmt | tr '\t' '\n' | sort -u | wc -l
6025
$ cut -f3- ReactomePathways.gmt | tr '\t' '\n' | sort -u | wc -l
10852
```
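The pipeline above puts every gene from field 3 onward on its own line and counts the distinct symbols. One caveat worth checking: if any line ends in a trailing tab, `tr` produces an empty line, which `sort -u` keeps and `wc -l` counts as one extra "gene". I have not checked whether either file has trailing tabs; the sketch below simply drops empty lines with `grep -v '^$'` so the count is safe either way. It runs on a made-up two-line file whose first line deliberately ends in a tab.

```sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up GMT; the first line deliberately ends with a trailing tab.
printf 'Set1\tdesc\tTP53\tMDM2\t\n' > "$tmp/toy.gmt"
printf 'Set2\tdesc\tTP53\tBAX\n'   >> "$tmp/toy.gmt"

# Original approach: the empty field after the trailing tab is counted too.
naive=$(cut -f3- "$tmp/toy.gmt" | tr '\t' '\n' | sort -u | wc -l | tr -d ' ')

# Guarded approach: discard empty lines before counting distinct genes.
guarded=$(cut -f3- "$tmp/toy.gmt" | tr '\t' '\n' | grep -v '^$' | sort -u | wc -l | tr -d ' ')

echo "naive: $naive, guarded: $guarded"
```

Here the naive count is 4 (TP53, MDM2, BAX plus the empty string) and the guarded count is 3.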
Overall, the official file has nearly three times as many gene-pathway entries (each gene is counted once for every set it belongs to):
```
$ cut -f3- ReactomePathways.gmt | wc -w
106405
$ cut -f3- c2.cp.reactome.v6.1.symbols.gmt | wc -w
37601
```
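`wc -w` gives the right answer here because gene symbols contain no spaces, so every word after the first two fields is one gene-set membership. As a cross-check that relies only on the tab separators, the sketch below counts non-empty fields from column 3 onward with awk (shown on a made-up file with 3 + 2 = 5 entries), then computes the fold difference from the two totals reported above: 106405 / 37601 is about 2.83. The function name `count_entries` is hypothetical, introduced only for this example.

```sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up GMT with 3 + 2 = 5 gene-pathway entries.
printf 'Set1\tdesc\tTP53\tMDM2\tCDKN1A\n' > "$tmp/toy.gmt"
printf 'Set2\tdesc\tTP53\tBAX\n'         >> "$tmp/toy.gmt"

# Hypothetical helper: count non-empty fields from column 3 onward,
# i.e. one per gene-set membership; "n + 0" prints 0 for an empty file.
count_entries() {
  awk -F'\t' '{ for (i = 3; i <= NF; i++) if ($i != "") n++ } END { print n + 0 }' "$1"
}

entries=$(count_entries "$tmp/toy.gmt")
echo "entries: $entries"

# Fold difference between the two totals reported in the text.
fold=$(awk 'BEGIN { printf "%.2f", 106405 / 37601 }')
echo "fold: $fold"
```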
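I also looked at whether the gene sets had the same names. MSigDB names are upper case, use underscores instead of spaces and hyphens, and carry a REACTOME_ prefix, so before comparing I converted the official names to that style and stripped the prefix from the MSigDB names. First I counted the names common to both:
```
$ comm -12 <(cut -f1 ReactomePathways.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sort ) <(cut -f1 c2.cp.reactome.v6.1.symbols.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sed 's/REACTOME_//' | sort ) | wc -l
414
```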
Then I counted the names found only in the MSigDB version:
```
$ comm -13 <(cut -f1 ReactomePathways.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sort ) <(cut -f1 c2.cp.reactome.v6.1.symbols.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sed 's/REACTOME_//' | sort ) | wc -l
260
```
Lastly, I counted the names found only in the official version:
```
$ comm -23 <(cut -f1 ReactomePathways.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sort ) <(cut -f1 c2.cp.reactome.v6.1.symbols.gmt | tr '[a-z]' '[A-Z]' | tr ' -' '_' | sed 's/REACTOME_//' | sort ) | wc -l
1608
```
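The three `comm` commands repeat the same normalisation. A minimal sketch of the same comparison written once, assuming plain POSIX sh (temporary files replace bash's `<(...)` process substitution, and `[:lower:]`/`[:upper:]` stand in for `[a-z]`/`[A-Z]`, which is equivalent for ASCII names), is below. The helper name `normalize` and both miniature input files are hypothetical. `LC_ALL=C` makes `sort` and `comm` agree on collation order. The final check relies on each file's set count being common names plus its own unique names; the reported numbers pass it (414 + 260 = 674 and 414 + 1608 = 2022), which holds as long as no two names in a file collapse to the same normalised string.

```sh
export LC_ALL=C
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Made-up miniature inputs in the two naming styles (not real file contents).
printf 'Cell Cycle\tdesc\tCDK1\n'                      > "$tmp/official.gmt"
printf 'Signaling by EGFR\tdesc\tEGFR\n'              >> "$tmp/official.gmt"
printf 'Fc-epsilon receptor signaling\tdesc\tSYK\n'   >> "$tmp/official.gmt"
printf 'REACTOME_CELL_CYCLE\tdesc\tCDK1\n'             > "$tmp/msigdb.gmt"
printf 'REACTOME_FC_EPSILON_RECEPTOR_SIGNALING\tdesc\tSYK\n' >> "$tmp/msigdb.gmt"
printf 'REACTOME_APOPTOSIS\tdesc\tBAX\n'              >> "$tmp/msigdb.gmt"

# Hypothetical helper: take set names, upper-case them, and map
# spaces and hyphens to underscores (same steps as the commands above).
normalize() {
  cut -f1 "$1" | tr '[:lower:]' '[:upper:]' | tr ' -' '__'
}

normalize "$tmp/official.gmt" | sort > "$tmp/official.names"
normalize "$tmp/msigdb.gmt" | sed 's/^REACTOME_//' | sort > "$tmp/msigdb.names"

common=$(comm -12 "$tmp/official.names" "$tmp/msigdb.names" | wc -l | tr -d ' ')
msigdb_only=$(comm -13 "$tmp/official.names" "$tmp/msigdb.names" | wc -l | tr -d ' ')
official_only=$(comm -23 "$tmp/official.names" "$tmp/msigdb.names" | wc -l | tr -d ' ')
echo "common=$common msigdb_only=$msigdb_only official_only=$official_only"

# Sanity check: set count of each file = common + names unique to that file.
n_official=$(wc -l < "$tmp/official.gmt" | tr -d ' ')
n_msigdb=$(wc -l < "$tmp/msigdb.gmt" | tr -d ' ')
if [ $((common + official_only)) -eq "$n_official" ] &&
   [ $((common + msigdb_only)) -eq "$n_msigdb" ]; then
  echo "totals add up"
fi
```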
I have re-run a couple of GSEA analyses with the official Reactome gene sets and obtained MUCH better results, so I recommend using the official Reactome release for future pathway analyses. It would be great if the GSEA/MSigDB team could update their Reactome collection soon.
Finally, if you include Reactome data in your publications, please cite them so they can continue their awesome work.