Posts

Showing posts with the label Bowtie2

microRNA aligners compared

Image
Alignment of microRNA to the genome poses a particular challenge because the reads are short, and some microRNAs are nearly identical. Moreover, microRNAs themselves are subject to RNA editing (adenine-to-inosine conversion, non-templated base addition) and normal sequencing error rates. In this post, I'm going to test the performance of several aligners in aligning microRNA reads to the Arabidopsis genome. 
I downloaded the Arabidopsis genome from Ensembl plant and the latest miRbase release version 21. I used bowtie2 to align the 325 full-length hairpin transcripts to the Arabidopsis genome. I generated pseudo microRNA reads that uniformly cover the hairpin transcript at a range of lengths from 16 nt to 25 nt. I then aligned the reads to the Arabidopsis genome using these different aligners with the default settings. I then used bedtools and awk to count the correctly and incorrectly mapped reads at a mapQ threshold of 10. 
Firstly note that there were no alignment with either …

DNA aligner accuracy: BWA, Bowtie, Soap and SubRead tested with simulated reads

Image
In the past few posts, we've looked at RNA-seq aligner performance in terms of accuracy and speed. In this post, I'll take a look at the accuracy of DNA aligners using simulated reads.

The first step is to download the genome of interest. I'm using Arabidopsis as its pretty small and good for quick benchmarking. I downloaded the genome from Ensembl plant ftp site (link).

Next step was to generate simulated reads that uniformly cover the genome at user-selected length and intervals. I couldn't find any previously made simulators to do exactly that, so I chained together some awk and bedtools commands (code at the bottom of the post) to generate the reads. I generated pseudo reads of 50, 100 & 200 bp in length at 10 bp intervals and output them in fasta format. Here is an example.

>1-0-50
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAA
>1-10-60
TAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
>1-20-70
ACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCC

T…

Bowtie alignment to a genome - single end

Recently I posted about BWA alignment of NGS data, so today I will demonstrate the use of Bowtie2 for aligning single end sequence data, for more information on advanced options, take a thorough look at the manpage. One of major strengths of Bowtie2, is that it's more tolerant for gaps than Bowtie1 and  BWA. Again you will need to have the reference genome and the raw fastq files in the otherwise empty directory.


The first thing we do is stipulate the name of the experiment and the name of the reference sequence.
EXID=Experimentname
REF=Hg19.fa Now index the genome with bowtie2-build:

bowtie2-build Hg19.fa Hg19.fa Now to actually start the alignment for all the files with prefix "FC" and suffix "fq":
for FQ in `ls FC*fq | sed 's/.fq//'`
do
bowtie2 -x $REF -U ${FQ}.fq -S ${FQ}.sam -p 4 --met-file ${FQ}.met 2> ${FQ}.stats &
done
wait The script starts aligning all fq files to the human genome in parallel and each job uses 4 threads as specified by the &…