Showing posts from July, 2013

Align Next Gen Sequencing Data Like A Boss

We know that sequencing platforms are generating more and more output, and more of this data is ending up in online databanks. The volume of data is a blessing because it is so information-rich, but it is also increasingly a problem: we need more machine time to process it.

I was recently aligning dozens of mouse ENCODE RNA-seq data sets when we had a server fault. I was left with a mess of half-finished jobs and intermediate files and basically had to start the whole process again. This prompted me to make an alignment pipeline that would be more resistant to unexpected interruptions. To do this, I took inspiration from a post at SEQanswers which piped the output from the BWA aligner straight into BAM conversion, generating BAM files without any intermediate files.
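To make the piping idea concrete, here is a minimal sketch in Python (not the commands from the SEQanswers post) that chains an aligner and a BAM converter through OS pipes so that no SAM file is ever written to disk. The helper names `run_pipeline` and `align_to_bam`, the file names, and the choice of `bwa mem` piped into `samtools view -b -` are assumptions for illustration; the original post may well have used different subcommands and options.

```python
import subprocess
import sys


def run_pipeline(stages, output_path):
    """Chain commands with OS pipes; write the last stage's stdout to output_path.

    `stages` is a list of argument lists, e.g. [["cmd1", "arg"], ["cmd2"]].
    Returns the exit codes, one per stage, in order.
    """
    procs = []
    prev_stdout = None
    with open(output_path, "wb") as out:
        for i, argv in enumerate(stages):
            is_last = i == len(stages) - 1
            proc = subprocess.Popen(
                argv,
                stdin=prev_stdout,
                stdout=out if is_last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy so the upstream stage gets SIGPIPE
                # if the downstream stage exits early.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        return [p.wait() for p in procs]


def align_to_bam(reference, fastq, bam_path):
    """Hypothetical wrapper: align reads and convert to BAM in one pass.

    Assumes `bwa` and `samtools` are on PATH and the reference is indexed.
    """
    stages = [
        ["bwa", "mem", reference, fastq],   # aligner writes SAM to stdout
        ["samtools", "view", "-b", "-"],    # reads SAM from stdin, writes BAM
    ]
    codes = run_pipeline(stages, bam_path)
    if any(code != 0 for code in codes):
        raise RuntimeError(f"pipeline failed with exit codes {codes}")
```

A call such as `align_to_bam("mm10.fa", "sample.fq", "sample.bam")` (hypothetical file names) would then produce the BAM file directly. Checking every exit code matters here, because a failed aligner can still leave behind a truncated but plausible-looking output file.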

But I wanted to take this a bit further by going from compressed FASTQ to BAM file without writing any intermediate files, doing quality trimming on the fly and saving a lot of storage in the process. So here are two examples, one f…