Screening RNA-seq data for novel transcripts with samtools mpileup and the UCSC Genome Browser
The unbiased nature of deep transcript sequencing makes it an ideal technology for discovering novel, uncharacterised genes. Let's screen our favourite RNA-seq experiment (azacitidine-treated AML3 human cells, GSE55123) for novel expressed genes. I use Ensembl gene annotations.

We'll start by preparing BED files of exons and gene bodies. Because tr turns the quotes and spaces in the GTF attribute column into tabs, field 11 holds the gene_id value:

grep -w exon Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 | uniq > Homo_sapiens.GRCh38.76_exons.bed

grep -w gene Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 > Homo_sapiens.GRCh38.76_genes.bed

Now, for convenience, I'll merge the data from the three replicates of each condition with samtools:

samtools view -H SRR1171523_1.fastq.sort.bam > header.txt
samtools merge -h header.txt Ctrl.bam SRR1171523_1.bam SRR1171523_2.bam SRR1171524_1.bam SRR1171524_2.bam SRR1171525_1.bam SRR1171525_2.bam
samtools merge -h header.txt Aza.bam SRR1171526_1.bam SRR1171526_2.bam SRR1171527