Using paired-end read concordance to determine accuracy of RNA-seq pipelines STAR/featureCounts and Kallisto
This post collects the code used to compare the accuracy of Kallisto and STAR/featureCounts in my previous post.
Download from SRA and convert to fastq format
Check my older post.
__________________________________________________________________
Run skewer to quality trim the data.
$ cat run_skewer.sh
#!/bin/bash
for FQZ1 in *_1.fastq.gz ; do
FQZ2=`echo $FQZ1 | sed 's/_1.fastq.gz/_2.fastq.gz/'`
skewer -t 8 -q 20 $FQZ1 $FQZ2
done
__________________________________________________________________
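The skewer loop above depends on each `_1` file having a matching `_2` file. As a minimal sketch, the pairing can be checked with a dry run before any trimming starts. The helper name `mate2_name` is hypothetical and not part of the original script; it uses bash suffix stripping instead of `sed`, and the loop only prints what it finds.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): derive the mate-2
# file name from a mate-1 file name using bash parameter expansion.
mate2_name() {
  local fq1=$1
  # Strip the literal "_1.fastq.gz" suffix, then append the mate-2 suffix.
  printf '%s\n' "${fq1%_1.fastq.gz}_2.fastq.gz"
}

# Dry run: list each pair and flag missing mates before calling skewer.
for FQZ1 in *_1.fastq.gz ; do
  [ -e "$FQZ1" ] || continue        # the glob matched nothing
  FQZ2=$(mate2_name "$FQZ1")
  if [ -e "$FQZ2" ] ; then
    echo "pair: $FQZ1 $FQZ2"
  else
    echo "missing mate for $FQZ1" >&2
  fi
done
```

Once every file shows up as a pair, the `echo` line could be swapped for the `skewer -t 8 -q 20` call used in the script above.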
Run bfc to correct errors.
$ cat run_bfc.sh
#!/bin/bash
for FQ in *.fastq.gz ; do
OUT=`echo $FQ | sed 's/.fastq.gz$/_bfc.fastq.gz/'`
bfc -t 8 $FQ | pigz > $OUT &
done
wait
__________________________________________________________________
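One thing to watch in the bfc loop above: the glob `*.fastq.gz` also matches the `_bfc.fastq.gz` files it writes, so running the script a second time in the same directory would pick up the corrected files as input. A minimal sketch of a guard follows. The helper name `bfc_out_name` is hypothetical and not part of the original script, and the loop is a dry run that only prints the command it would execute.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): map an input FASTQ
# name to its corrected output name. Prints nothing and returns non-zero
# if the input is already a bfc output or is not a .fastq.gz file.
bfc_out_name() {
  local fq=$1
  case "$fq" in
    *_bfc.fastq.gz) return 1 ;;   # already corrected, skip
    *.fastq.gz)     printf '%s\n' "${fq%.fastq.gz}_bfc.fastq.gz" ;;
    *)              return 1 ;;   # not a gzipped FASTQ
  esac
}

# Dry run: show which files would be corrected on this pass.
for FQ in *.fastq.gz ; do
  [ -e "$FQ" ] || continue        # the glob matched nothing
  if OUT=$(bfc_out_name "$FQ") ; then
    echo "would run: bfc -t 8 $FQ | pigz > $OUT"
  fi
done
```

The output naming matches the `sed` substitution in the script above; only the skip for existing `_bfc` files is new.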
Run the STAR aligner to align reads to the human genome in single-end mode.
$ cat run_star.sh
#!/bin/bash
DIR=/refgenome_hsapiens/
GTF=refgenome_hsapiens/Homo_sapiens.GRCh38.78.gtf
for FQZ in `ls *gz` ; do
FQ=`echo $FQZ | sed 's/.gz//'`
pigz -dkf …