Posts

Showing posts with the label Kallisto

Using paired end read concordance to determine accuracy of RNA-seq pipelines STAR/featureCounts and Kallisto

This post is a collection of code used to compare accuracy of Kallisto and STAR/featureCounts in my previous post.
Download from SRA and convert to fastq formatCheck my older post. __________________________________________________________________

Run skewer to quality trim the data.
$ cat run_skewer.sh #!/bin/bash for FQZ1 in *_1.fastq.gz ; do FQZ2=`echo $FQZ1 | sed 's/_1.fastq.gz/_2.fastq.gz/'` skewer -t 8 -q 20 $FQZ1 $FQZ2 done __________________________________________________________________

Run bfc to correct errors.
$ cat run_bfc.sh
#!/bin/bash
for FQ in *.fastq.gz ; do
OUT=`echo $FQ | sed 's/.fastq.gz$/_bfc.fastq.gz/'`
bfc -t 8 $FQ | pigz > $OUT &
done
wait __________________________________________________________________
Run STAR aligner to align reads to human genome in single end mode.

$ cat run_star.sh
#!/bin/bash
DIR=/refgenome_hsapiens/
GTF=refgenome_hsapiens/Homo_sapiens.GRCh38.78.gtf
for FQZ in `ls *gz` ; do
FQ=`echo $FQZ | sed 's/.gz//'`
pigz -dkf …

How accurate is Kallisto?

Image
One of the most interesting developments in RNA-seq informatics in the past year or so is the evolution of so-called "lightweight" analysis. Instead of trying to map a whole read to the reference exons, it may be quicker and just as accurate to simply compare the k-mer content of the read and reference. If the read and reference share enough unique kmers, then the aligner can stop analysing that read and can move on the the next one. It turns out that the speed-up with this approach is huge, when compared with previous methods. Kallisto especially appears to have some speed advantages over competitors such as Sailfish, Salmon, eXpress, etc. For those interested in learning more, please refer to the Kallisto pre-print in arXiv, as well as the following blog posts.
nextgenseek.comrobpatro.comliorpachter.wordpress.comsjcockell.me
For this post, I wanted to compare the accuracy of Kallisto with STAR/featureCounts (mapQ20 threshold), the pipeline that I'm currently using in mo…