HISAT vs STAR vs TopHat2 vs Olego vs SubJunc
So I thought I'd give HISAT a test run with some simulated data to check its accuracy compared with other aligners.
I generated synthetic 100 bp reads based on Arabidopsis cDNAs and then incorporated mutations with msbar (EMBOSS).
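Exactly how the reads were cut from the cDNAs isn't stated, so here is a minimal sketch under my own assumptions. Each cDNA is tiled into fixed-length reads at a fixed step, and each read name records its source transcript and start offset so that correct placement can be checked after alignment. `mutate_reads` is a simplified stand-in for msbar (random substitutions only, no indels), not msbar itself. The function names, the step size and the `TRANSCRIPT_OFFSET` naming scheme are all hypothetical.

```bash
# Hypothetical helpers illustrating the read simulation step (not the original scripts).

# tile_reads LEN STEP < cdna.fa > reads.fa
# Cuts every cDNA into reads of LEN bases, starting a new read every STEP bases.
# Read names are ">TRANSCRIPT_OFFSET" so the true origin travels with each read.
# Handles FASTA records whose sequence spans several lines.
tile_reads() {
  awk -v len="$1" -v step="$2" '
    function emit(   i) {
      for (i = 1; i + len - 1 <= length(seq); i += step)
        printf(">%s_%d\n%s\n", id, i, substr(seq, i, len))
    }
    /^>/ { if (seq != "") emit(); id = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { if (seq != "") emit() }
  '
}

# mutate_reads N SEED < reads.fa > mutated.fa
# Simplified stand-in for msbar point mutations: replaces N distinct random
# positions in each read with a different base (A/C/G/T). No insertions/deletions.
# Assumes one sequence line per read, as tile_reads writes it.
mutate_reads() {
  awk -v n="$1" -v seed="$2" '
    BEGIN { srand(seed); split("A C G T", base, " ") }
    /^>/ { print; next }
    {
      s = toupper($0); L = length(s)
      split("", hit)                      # reset the set of mutated positions
      k = 0
      while (k < n && k < L) {
        p = int(rand() * L) + 1           # random position 1..L
        if (p in hit) continue            # never mutate the same base twice
        hit[p] = 1; k++
        old = substr(s, p, 1)
        do { new = base[int(rand() * 4) + 1] } while (new == old)
        s = substr(s, 1, p - 1) new substr(s, p + 1)
      }
      print s
    }
  '
}
```

For example, `tile_reads 100 50 < cdna.fa | mutate_reads 2 1 > reads_2mut.fasta` would give 100 bp reads starting every 50 bp with two substitutions each (file names hypothetical).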
I then aligned these reads to the Arabidopsis genome with default settings and used featureCounts to count the number of correctly and incorrectly assigned reads with a minimum mapping quality of 20.
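To make "correctly assigned" concrete, here is a minimal scoring sketch under stated assumptions. It assumes each read name starts with the transcript it was simulated from (a hypothetical `TRANSCRIPT_OFFSET` scheme such as `AT1G01010.1_37`), and a simplified tab-separated per-read table of read name and assigned gene, with `NA` for reads that were unassigned or dropped by the mapping-quality cutoff. A read is counted as correct when the assigned gene equals the transcript ID with its `.N` isoform suffix removed. The input layout and the `score_assignments` name are my own, not part of featureCounts.

```bash
# score_assignments < per_read.tsv
# Input (hypothetical layout): READ_NAME <TAB> ASSIGNED_GENE, with "NA" when the
# read was unassigned or filtered out (e.g. mapping quality below 20).
# Prints counts and percentages of correct, incorrect and unassigned reads.
score_assignments() {
  awk -F '\t' '
    {
      true_gene = $1
      sub(/_[0-9]+$/, "", true_gene)     # drop the _OFFSET read suffix
      sub(/\.[0-9]+$/, "", true_gene)    # drop the .N isoform suffix
      if ($2 == "NA") unassigned++
      else if ($2 == true_gene) correct++
      else incorrect++
      total++
    }
    END {
      if (total == 0) { print "no reads"; exit }
      printf("correct\t%d\t%.1f%%\n", correct, 100 * correct / total)
      printf("incorrect\t%d\t%.1f%%\n", incorrect, 100 * incorrect / total)
      printf("unassigned\t%d\t%.1f%%\n", unassigned, 100 * unassigned / total)
    }
  '
}
```

With real data, the per-read table would have to be extracted from featureCounts' per-read output (`-R`), run with `-Q 20`; the layout of that output depends on the Subread version, so the reshaping step is not shown.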
Here is the result for unmutated (perfect) 100 bp reads.
Table 1. Alignment of simulated 100 bp cDNA sequences to the Arabidopsis genome.
Now here are the results of the mutation experiment.
Table 2. Results of mapping mutated 100 bp reads to the Arabidopsis genome (data shown in Fig. 1).
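==> test_hisat.sh <==

```bash
#!/bin/bash
# index basename passed to hisat -x
X=Arabidopsis_thaliana.TAIR10.23.dna.genome_fmt.fa
for FQZ in *fasta.gz ; do
  FQ=${FQZ%.gz}               # reads.fasta.gz -> reads.fasta
  pigz -dk "$FQZ"             # decompress, keeping the .gz
  # -f: reads are FASTA; SAM output is converted to BAM and sorted
  hisat -p 8 -f -x "$X" -U "$FQ" \
  | samtools view -uSh - \
  | samtools sort - "${FQ}_hisat.sort"
  rm "$FQ"
done
```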
==> test_olego.sh <==

```bash
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna_sm.genome.fa
for FQZ in *.fasta.gz ; do
  FQ=${FQZ%.gz}
  pigz -dk "$FQZ"             # olego reads the decompressed FASTA file
  olego -t 8 "$IDX" "$FQ" \
  | samtools view -uSh - \
  | samtools sort - "${FQ}_olego.sort"
  rm "$FQ"
done
```
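==> test_STAR.sh <==

```bash
#!/bin/bash
IDX=arabidopsis/              # STAR genome directory
for FQZ in *.fasta.gz ; do
  pigz -fdk "$FQZ"
  FQ=${FQZ%.gz}
  # LoadAndKeep keeps the genome in shared memory between runs
  STAR --readFilesIn "$FQ" --genomeLoad LoadAndKeep \
    --genomeDir "$IDX" --runThreadN 8
  mv Aligned.out.sam "${FQ}.STAR.sam"
  # convert and sort in the background while the next sample aligns
  ( SAM=${FQ}.STAR.sam
    samtools view -uSh "$SAM" \
    | samtools sort - "${SAM}.sort"
    rm "$SAM" ) &
  rm "$FQ"
done
wait
```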
==> test_subjunc.sh <==

```bash
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna_sm.genome   # Subread index basename
for FQZ in *.fasta.gz ; do
  pigz -fdk "$FQZ"
  FQ=${FQZ%.gz}
  subjunc -T 8 -i "$IDX" \
    -r "$FQ" -o "${FQ}.subjunc.sam"
  # convert, sort and clean up in the background
  ( SAM=${FQ}.subjunc.sam
    samtools view -uSh "$SAM" \
    | samtools sort - "${SAM}.sort"
    rm "$FQ" "$SAM" ) &
done
wait
```
==> test_tophat.sh <==

```bash
#!/bin/bash
IDX=Arabidopsis_thaliana.TAIR10.23.dna.genome_fmt   # Bowtie2 index basename
for FQZ in *fasta.gz ; do
  FQ=${FQZ%.gz}
  pigz -dk "$FQZ"
  tophat2 -p 8 -o "${FQ}.tophat" "$IDX" "$FQ"
  rm "$FQ"
done
```


