Posts

Showing posts with the label GEO2R

Comparing expression profiles

Image
One of the most common tasks in gene expression analysis is to compare different profiling experiments. There are three main strategies:
Compare all data points - using a correlation analysisCompare sets of up and down-regulated genes - using a binomial or Fisher exact testCompare sets of genes within a profile - such as GSEA test In this post, I'll describe how correlation analysis is used between expression data sets of all detected genes.

Merging data sets No matter what type of correlation used, the profiling data sets need to be merged. This means selecting a field that can the datasets can be merged on. This could be a array probe ID, gene accession number or gene symbol as in this case. I will compare gene expression profiles from two experiments (azacitidine in human and mouse cells). The human gene profile was generated by RNA-seq and the mouse data set by microarray.

The human data is currently in CSV format from Degust  and looks like this:

gene,c,aza,FDR,C1,C2,C3,A1,A2,A…

How to generate a rank file from gene expression data

Turning a gene expression profile into a ranked list is useful for comparing with other profiling data sets as well as an input for preranked GSEA analysis (example here). In this post, I describe a simple bash script called rnkgen.sh that can take gene expression data from a range of sources, such as edgeR, DESeq, GEO2R, etc., and generate a ranked list of genes from most up-expressed to most down-expressed based on the p-value.

#!/bin/bash
#rnkgen.sh converts a differential gene expression spreadsheet (XLS) into a rank file (RNK)

#Specify the input file
XLS=$1
#Specify the gene ID column
ID=$2
#Specify the fold change value column
FC=$3
#Specify the raw p-value column
P=$4

sed 1d $XLS | tr -d '"' \
| awk -v I=$ID -v F=$FC -v P=$P '{FS="\t"} $I!="" {print $I, $F, $P}' \
| awk '$2!="NA" && $3!="NA"' \
| awk '{s=1} $2<0{s=-1} {print $1"\t"s*-1*log($3)/log(10)}' \
| awk '{print $1,$2,$2*…

Interspecies gene name conversion

Image
In this post, I'll provide a step-by-step guide to perform interspecies gene name conversion of gene expression data. This is a necessary step in the comparison of profiling data from two different experiments with different species (human and mouse), and allows us to use extensive human-centric gene set libraries in MSigDB when analysing non-human mammalian profiling data (such as mouse).

I performed GEO2R analysis of mouse expression data (GSE30192) to analyse the effect of azacitidine on mouse C2C12 myoblasts. The data looks like this:
"ID""adj.P.Val""P.Value""t""B""logFC""Gene.symbol""Gene.title"
"1420647_a_at" "0.000346" "2.24e-08" "56.073665" "8.699524" "6.9755573" "Krt8" "keratin 8"
"1423327_at" "0.000346" "2.32e-08" "55.685912" "8.686447" "3.8096523" "Rpl…