Screening RNA-seq data for novel transcripts with samtools mpileup and UCSC genome browser
The unbiased nature of deep transcript sequencing makes it the ideal technology to discover novel uncharacterised genes. Lets screen our favourite RNA-seq experiment (azacitidine-treated AML3 human cells, GSE55123) for novel expressed genes. I use Ensembl gene annotations.
We'll start with preparing bed files of exons and gene bodies
grep -w exon Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 | uniq > Homo_sapiens.GRCh38.76_exons.bed
grep -w gene Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 > Homo_sapiens.GRCh38.76_genes.bed
Now for convenience, I'll merge the data from the three replicates with samtools.
samtools view -H SRR1171523_1.fastq.sort.bam > header.txt
samtools merge -h header.txt Ctrl.bam SRR1171523_1.bam SRR1171523_2.bam SRR1171524_1.bam SRR1171524_2.bam SRR1171525_1.bam SRR1171525_2.bam
samtools merge -h header.txt Aza.bam SRR1171526_1.bam SRR1171526_2.bam SRR1171527_1.bam SRR1171527_2.bam S…
We'll start with preparing bed files of exons and gene bodies
grep -w exon Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 | uniq > Homo_sapiens.GRCh38.76_exons.bed
grep -w gene Homo_sapiens.GRCh38.76.gtf | tr '" ' '\t' \
| cut -f1,4,5,11 > Homo_sapiens.GRCh38.76_genes.bed
Now for convenience, I'll merge the data from the three replicates with samtools.
samtools view -H SRR1171523_1.fastq.sort.bam > header.txt
samtools merge -h header.txt Ctrl.bam SRR1171523_1.bam SRR1171523_2.bam SRR1171524_1.bam SRR1171524_2.bam SRR1171525_1.bam SRR1171525_2.bam
samtools merge -h header.txt Aza.bam SRR1171526_1.bam SRR1171526_2.bam SRR1171527_1.bam SRR1171527_2.bam S…