HISAT vs STAR vs TopHat2 vs Olego vs SubJunc
I thought I'd give HISAT a test run with some simulated data to check its accuracy compared to other aligners.
I generated synthetic 100 bp reads based on Arabidopsis cDNAs and then introduced mutations with msbar (from EMBOSS).
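The read-generation step can be sketched as follows. This is only an illustration, not the procedure actually used for this post: the function name `tile_reads`, the fixed-step tiling scheme and the `<transcript>_<n>` read naming are all assumptions. It cuts every full-length window from each cDNA in a FASTA stream, so each read name records the transcript it came from.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): cut fixed-length
# reads from a cDNA FASTA on stdin.
# Usage: tile_reads LENGTH STEP < cdna.fa > reads.fa
tile_reads() {
  awk -v len="$1" -v step="$2" '
    function emit(   i, n) {
      # Walk along the transcript and print every full-length window.
      n = 0
      for (i = 1; i + len - 1 <= length(seq); i += step) {
        n++
        printf(">%s_%d\n%s\n", id, n, substr(seq, i, len))
      }
    }
    /^>/ {
      if (id != "") emit()
      # Keep only the first word of the header as the transcript ID.
      id = substr($1, 2); seq = ""
      next
    }
    { seq = seq $0 }   # join wrapped sequence lines
    END { if (id != "") emit() }
  '
}

# Tiny demonstration with 4 bp reads and a 2 bp step.
printf '>AT1G01010.1 demo\nACGTAC\nGT\n' | tile_reads 4 2
```

For 100 bp reads the call would be something like `tile_reads 100 50 < cdna.fa > reads.fasta`, where the step of 50 is an arbitrary choice. Transcripts shorter than the read length produce no reads.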
I then aligned these reads to the Arabidopsis genome with default settings and used featureCounts to count the correctly and incorrectly assigned reads, using a mapping quality threshold of 20.
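The correct/incorrect tally could be done along these lines. This is a minimal sketch under stated assumptions: it expects a hypothetical two-column, tab-separated table holding the gene each read was simulated from and the gene it was assigned to (`NA` when unassigned). That table layout and the function name `tally_assignments` are made up for illustration; the per-read output of featureCounts would first need to be reshaped into this form.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post).
# Input on stdin: TSV with the true gene ID in column 1 and the
# assigned gene ID in column 2 ("NA" when the read was not assigned).
tally_assignments() {
  awk -F'\t' '
    $2 == "NA" { un++; next }    # read was not assigned to any gene
    $1 == $2   { ok++; next }    # assigned to the gene it came from
               { bad++ }         # assigned to a different gene
    END {
      printf("correct\t%d\nincorrect\t%d\nunassigned\t%d\n", ok, bad, un)
    }
  '
}

# Tiny demonstration with three reads.
printf 'AT1G01010\tAT1G01010\nAT1G01010\tAT1G01020\nAT1G01020\tNA\n' \
  | tally_assignments
```

Keeping unassigned reads in their own class separates an aligner that misplaces reads from one that simply fails to place them above the quality threshold.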
Here is the result for unmutated (perfect) 100 bp reads.
Table 1. Alignment of simulated 100 bp cDNA sequences to the Arabidopsis genome.
Now here are the results of the mutation experiment.
Table 2. Results of mapping mutated 100 bp reads to the Arabidopsis genome (data shown in Fig. 1).
==> test_hisat.sh <==
#!/bin/bash
X=Arabidopsis_thaliana.TAIR10.23.dna.genome_fmt.fa
for FQZ in *fasta.gz ; do
# Strip only the trailing .gz extension
FQ=`echo $FQZ | sed 's/\.gz$//'`
pigz -dk $FQZ
hisat -p 8 -f -x $X -U $FQ \
| samtools view -uSh - \
| samtools sort - ${FQ}_hisat.sort
rm $FQ
done
==> test_olego.sh <==
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna_sm.genome.fa
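for FQZ in *.fasta.gz
do
# Unpack the reads to a plain FASTA file for olego, as in the other scripts
FQ=`echo $FQZ | sed 's/\.gz$//'`
pigz -dk $FQZ
olego -t 8 $IDX $FQ \
| samtools view -uSh - \
| samtools sort - ${FQ}_olego.sort
rm $FQ
done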
==> test_STAR.sh <==
#!/bin/bash
IDX=arabidopsis/
for FQZ in *.fasta.gz
do
pigz -fdk $FQZ
FQ=`echo $FQZ | sed 's/\.gz$//'`
STAR --readFilesIn $FQ --genomeLoad LoadAndKeep \
--genomeDir $IDX \
--runThreadN 8
# STAR writes to a fixed file name, so rename it per input file
mv Aligned.out.sam ${FQ}.STAR.sam
(SAM=${FQ}.STAR.sam
samtools view -uSh $SAM \
| samtools sort - ${SAM}.sort
rm $SAM )&
rm $FQ
done
wait
==> test_subjunc.sh <==
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna_sm.genome
for FQZ in *.fasta.gz
do
pigz -fdk $FQZ
FQ=`echo $FQZ | sed 's/\.gz$//'`
subjunc -T 8 -i $IDX \
-r $FQ -o ${FQ}.subjunc.sam
(SAM=${FQ}.subjunc.sam
samtools view -uSh $SAM \
| samtools sort - ${SAM}.sort
rm $FQ ${FQ}.subjunc.sam )&
done
wait
==> test_tophat.sh <==
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna.genome_fmt
for FQZ in *fasta.gz ; do
FQ=`echo $FQZ | sed 's/\.gz$//'`
pigz -dk $FQZ
tophat2 -p 8 -o ${FQ}.tophat $IDX $FQ
rm $FQ
done