Counting the number of reads that align to certain genomic features is a key element of many next gen sequencing analysis pipelines. For RNA-seq, this is commonly used to count reads aligning to exons, while for ChIP-seq this is used to count reads over a promoter or other region of interest. There are several available tools for performing this task and in this post I will compare the three of the most commonly used:
I took one of the bam files from the recent
RNA-seq series of posts and subsampled it using
samtools and
shuf into file sizes of 1M, 2M, 5M, 10M reads, as well as the bam file containing 25.4M reads.
I then used the benchmarking script described in the previous
post to record execution time, CPU usage and peak memory for read counting to generate a gene-wise matrix. I used featureCounts in the single thread mode as well as the parallel mode (maximum of 8 cores).
|
Execution time for read counting by featureCounts, BedTools Multicov and HTSeq-count for bam files of varying sizes. |
|
Peak memory usage for read counting by featureCounts, BedTools Multicov and HTSeq-count for bam files of varying sizes. |
|
|
CPU utilisation for read counting by featureCounts, BedTools Multicov and HTSeq-count for bam files of varying sizes. |
|
The results show that featureCounts is about 10 times faster than BedTools Multicov and about 18 times faster than HTSeq-count when using a single thread, and when allowing parallel processing, this became 20 times and 37 times respectively.
Then I broke down the overall times into that invested in (1) reading in the annotation file and (2) processing the reads.
|
Time taken for featureCounts, BedTools Multicov and HTSeq to read in the annotation feature (upper panel) file and process 1 million reads (lower panel).
|
|
This result shows that featureCounts is considerably faster at both tasks than its competitors.
Exact commands used:
htseq-count -q -f bam --stranded=no --minaqual=10 --type=gene --mode=union test.bam Homo_sapiens.GRCh38.76.gtf > test.bam.htseq.count
bedtools multicov -D -q 10 -bams test.bam -bed Homo_sapiens.GRCh38_exon.bed > test.bam.bedtools.cnt
featureCounts -Q 10 -M -s 0 -T 1 -a Homo_sapiens.GRCh38.76.gtf -o test.bam.featureCount.cnt test.bam
featureCounts -Q 10 -M -s 0 -T 8 -a Homo_sapiens.GRCh38.76.gtf -o test.bam.featureCount.cnt test.bam