Screening RNA-seq data for novel transcripts with samtools mpileup and UCSC genome browser

The unbiased nature of deep transcript sequencing makes it well suited to discovering novel, uncharacterised genes. Let's screen our favourite RNA-seq experiment (azacitidine-treated AML3 human cells, GSE55123) for novel expressed genes. I use Ensembl gene annotations to define what is already known.

We'll start by preparing BED files of exons and gene bodies:

grep -w exon Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 | uniq > Homo_sapiens.GRCh38.76_exons.bed

grep -w gene Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 > Homo_sapiens.GRCh38.76_genes.bed
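
One caveat with the commands above: GTF coordinates are 1-based, whereas BED expects a 0-based start, and grep -w matches the word anywhere on the line rather than only in the feature column. A minimal sketch of a stricter variant is below. The gtf2bed helper and the toy.gtf input are hypothetical names made up for illustration, and the toy coordinates are only there so the snippet runs without the Ensembl file.

```shell
# Toy GTF with one gene line and one exon line (tab-separated), so the sketch
# runs without the Ensembl file. Coordinates and IDs are illustrative only.
printf '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; gene_name "DDX11L1";\n' > toy.gtf
printf '1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id "ENSG00000223972"; exon_number "1";\n' >> toy.gtf

# Hypothetical helper: $1 = feature type (exon or gene), $2 = GTF file.
gtf2bed() {
  awk -F'\t' -v feat="$1" '
    BEGIN { OFS = "\t" }
    # Test the feature type in column 3 only, then pull the gene_id value
    # out of the attribute column (column 9).
    $3 == feat && match($9, /gene_id "[^"]+"/) {
      id = substr($9, RSTART + 9, RLENGTH - 10)
      # Subtract 1 from the start to convert to 0-based BED coordinates.
      print $1, $4 - 1, $5, id
    }' "$2"
}

gtf2bed exon toy.gtf | uniq > toy_exons.bed
gtf2bed gene toy.gtf > toy_genes.bed
```

For a coverage screen, the one-base difference at interval starts rarely changes the outcome, but it can matter if the BED files are later intersected with other annotations.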

Now, for convenience, I'll merge the data from the three replicates of each group with samtools, reusing the header from one of the BAM files.

samtools view -H SRR1171523_1.fastq.sort.bam > header.txt

samtools merge -h header.txt Ctrl.bam SRR1171523_1.bam SRR1171523_2.bam SRR1171524_1.bam SRR1171524_2.bam SRR1171525_1.bam SRR1171525_2.bam
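
The long input list for a merge like this can also be assembled from the run accessions. The sketch below is a dry run that only builds and prints the command, assuming the per-run files follow the _1.bam and _2.bam naming used above. The variable names are my own.

```shell
# Control-group run accessions, taken from the merge command above.
ctrl_runs="SRR1171523 SRR1171524 SRR1171525"

# Build the list of per-run BAM names (two files per run).
inputs=""
for run in $ctrl_runs; do
  for mate in 1 2; do
    inputs="$inputs ${run}_${mate}.bam"
  done
done

# Dry run: print the command rather than executing it.
merge_cmd="samtools merge -h header.txt Ctrl.bam$inputs"
echo "$merge_cmd"
```

Once the printed command looks right, it can be pasted into the shell to run the merge.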

samtools merge -h header.txt Aza.bam SRR1171526_1.bam SRR1171526_2.bam SRR1171527_1.bam SRR1171527_2.bam S…