Posts

Showing posts with the label HPG aligner

RNA-seq aligners: Subread, STAR, HPG aligner and Olego | PART 2

Image
In the previous post, I compared the speed of several RNA aligners. In this post, we'll take a closer look at the results generated by these different aligners; Subread/Subjunc, STAR, HPG aligner and Olego. In addition, we will use BWA MEM as an example of what happens when you use a DNA aligner to analyse RNA-seq. For the test, we've been using 101 bp Arabidopsis mRNA-seq data from GEO accession GSE42968. For simplicity, I use only the forward read in the paired-end dataset. I used fastx-toolkit to remove low quality bases from the 3' end (base qual threshold of 20).

One of the simplest ways to assess the quality of an alignment is to determine the proportion of reads that are mapped to the genome and the proportion that map to exons. After the aligners did their work, I used featureCounts to quantify the number of aligned reads with a mapping quality >10. Here is the data for the first sample in the series, SRR634969 which contained 14.5 million reads.

The first thing…

RNA-seq aligners: Subread, STAR, HPG aligner and Olego

Image
In this post I compare the speed of Subread/Subjunc, STAR, HPG aligner and Olego. I will use real RNA-seq data from GEO accession GSE42968 and align to the Arabidopsis thaliana genome. All code used is found at the bottom of this post. The benchmarking was performed on a standard 8 core workstation with 8 GB RAM.

Index the Arabidopsis thaliana genome The first step in any RNA-seq analysis is to prepare the genome for alignment. Subread was the quickest.
Align RNA-seq reads I down sampled the sequence (SRA accesion:SRR634969) file to generate files containing 1M, 2M, 5M, 10M and 14M reads selected randomly. Fastq quality trimmer was used to trim low quality bases from the 3' end and filter low quality reads. Then I used Olego, Subread, Subjunc, STAR and HPG aligner to align the reads. (I wanted to test HPG aligner in both the BWT and SA modes but didn't have the 17 GB RAM to test SA algorithm sadly.) In all the below cases the sequence files were uncompressed fastq (101 bp read…