MSigDB gene sets for mouse

I recently needed to convert MSigDB gene sets to mouse so I thought I would share.

GO.v5.2.symbols_mouse.gmt
kegg.v5.2.symbols_mouse.gmt
msigdb.v5.2.symbols_mouse.gmt
reactome.v5.2.symbols_mouse.gmt

Below is the code used to do the conversion. It requires an input GMT file of human gene symbols as well as a human-mouse orthology file. You can download the ortholog file here. As the name suggests, it is based on data downloaded from Ensembl Biomart version 87.

Running the program converts all human gmt files. It requres gnu parallel which can be easily installed on Ubuntu with "sudo apt-get install parallel"


#!/bin/bash

conv(){
line=$1
  NAME_DESC=`echo $line | cut -d ' ' -f-2`

  GENES=`echo $line | cut -d ' ' -f3- \
  | tr ' ' '\n' | sort -uk 1b,1 \
  | join -1 1 -2 1 - \
  <(cut -f3,5 mouse2hum_biomart_ens87.txt \
  | sed 1d | awk '$1!="" && $2!=""' \
  | sort -uk 1b,1) | cut -d ' ' -f2 \
  | sort -u | tr '\n' '\t' \
  | sed 's/\t$/\n/'`

  echo $NAME_DESC $GENES | tr ' ' '\t'
}
export -f conv
for GMT in `ls *gmt | grep -v mouse ` ; do
  NAME=`echo $GMT | sed 's/.gmt/_mouse.gmt/'`
  parallel -k conv < $GMT > $NAME
done

exit

Popular posts from this blog

Data analysis step 8: Pathway analysis with GSEA

A selection of useful bash one-liners

HISAT vs STAR vs TopHat2 vs Olego vs SubJunc