MSigDB gene sets for mouse
I recently needed to convert MSigDB gene sets from human to mouse gene symbols, so I thought I would share the converted files:
GO.v5.2.symbols_mouse.gmt
kegg.v5.2.symbols_mouse.gmt
msigdb.v5.2.symbols_mouse.gmt
reactome.v5.2.symbols_mouse.gmt
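Below is the code used to do the conversion. It requires an input GMT file of human gene symbols as well as a human-mouse orthology file (mouse2hum_biomart_ens87.txt). You can download the ortholog file here. As the name suggests, it is based on data downloaded from Ensembl Biomart version 87.
Running the script converts every GMT file in the current directory whose name does not contain "mouse", writing each result to a matching _mouse.gmt file. Note that it first replaces spaces with hyphens in the input files in place, so keep a copy of the originals if you need them unchanged. It requires GNU parallel, which can be easily installed on Ubuntu with "sudo apt-get install parallel".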
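For readers unfamiliar with the format, each line of a GMT file holds one gene set: a set name, a description, and then the member genes, all separated by tabs. As a small illustration (not part of the conversion itself), the sketch below pulls those three parts out of a single line. The set name, URL, and gene list are invented for the example and are not taken from MSigDB.

```shell
#!/bin/bash
# One invented GMT line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
gmt_line=$'TOY_SET\thttp://example.org/toy\tTP53\tMYC\tBRCA1'

# Field 1 is the set name, field 2 the description.
set_name=$(printf '%s\n' "$gmt_line" | cut -f1)
description=$(printf '%s\n' "$gmt_line" | cut -f2)

# Everything from field 3 onward is a gene symbol; put one per line and count.
n_genes=$(printf '%s\n' "$gmt_line" | cut -f3- | tr '\t' '\n' | wc -l | tr -d ' ')

echo "set: $set_name"
echo "description: $description"
echo "genes: $n_genes"
```

The same cut -f3- idiom is a handy way to check how many genes a set has before and after conversion.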
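#!/bin/bash
# Convert human-symbol GMT files to mouse symbols using an Ensembl Biomart
# ortholog table (mouse2hum_biomart_ens87.txt, columns 3 and 5).

# conv: convert a single GMT line, passed as one argument by GNU parallel.
conv(){
  line=$1
  # $line is deliberately unquoted: word splitting turns the tabs into single
  # spaces, so fields can be cut on ' '. Fields 1-2 are name and description.
  NAME_DESC=$(echo $line | cut -d ' ' -f-2)
  # Fields 3+ are human symbols: put one per line, join them against the
  # ortholog table on its first extracted column, and keep the second column.
  GENES=$(echo $line | cut -d ' ' -f3- \
    | tr ' ' '\n' | sort -uk 1b,1 \
    | join -1 1 -2 1 - \
      <(cut -f3,5 mouse2hum_biomart_ens87.txt \
        | sed 1d | awk '$1!="" && $2!=""' \
        | sort -uk 1b,1) \
    | cut -d ' ' -f2 \
    | sort -u | tr '\n' '\t' \
    | sed 's/\t$/\n/')
  # Reassemble the line as tab-separated fields.
  echo $NAME_DESC $GENES | tr ' ' '\t'
}
export -f conv

# Process every GMT file that is not already a mouse file.
for GMT in *.gmt ; do
  [[ $GMT == *mouse* ]] && continue
  NAME=${GMT%.gmt}_mouse.gmt
  # Replace spaces with hyphens in place so that each field is a single word.
  sed -i 's@ @-@g' "$GMT"
  # -k keeps the output lines in the same order as the input lines.
  parallel -k conv < "$GMT" > "$NAME"
done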
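The heart of the conversion is a sort-then-join lookup, which can be easier to follow on a toy example. The sketch below is a simplified, self-contained version of that one step and not the script itself. It assumes a hypothetical two-column ortholog table (human symbol, mouse symbol) written to a temporary file, whereas the real Biomart export has more columns, of which the script uses columns 3 and 5. The gene set line is invented as well.

```shell
#!/bin/bash
# Toy demonstration of the join-based symbol mapping.
# All file contents below are invented for illustration only.
set -euo pipefail
export LC_ALL=C   # sort and join must agree on collation order
tmp=$(mktemp -d)

# Hypothetical ortholog table: human symbol <TAB> mouse symbol, sorted for join.
printf 'TP53\tTrp53\nBRCA1\tBrca1\nMYC\tMyc\n' | sort -k 1b,1 > "$tmp/orth.tsv"

# One GMT line: set name, description, then human symbols (tab-separated).
gmt_line=$'TOY_SET\thttp://example.org/toy\tMYC\tTP53\tNOT_A_GENE'

# Human symbols one per line, sorted, joined to the table; keep the mouse column.
mouse_genes=$(printf '%s\n' "$gmt_line" | cut -f3- | tr '\t' '\n' \
  | sort -uk 1b,1 \
  | join -1 1 -2 1 - "$tmp/orth.tsv" \
  | cut -d ' ' -f2 | sort -u | paste -s -)

# Reattach the name and description in front of the converted genes.
name_desc=$(printf '%s\n' "$gmt_line" | cut -f1,2)
converted=$(printf '%s\t%s' "$name_desc" "$mouse_genes")
printf '%s\n' "$converted"

rm -r "$tmp"
```

NOT_A_GENE has no entry in the toy table, so join drops it. The same thing happens in the real conversion to any human symbol without a mouse ortholog in the Biomart file, so converted sets can be smaller than the originals.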