Data analysis step 2: quality control of RNA-seq data

September 08, 2014

In a previous post, we downloaded RNA-seq data from GEO (GSE55123). Let's continue processing this data by performing QC analysis. I went into a bit more detail on QC in an earlier post, but here we will simply use fastx_quality_stats from the FASTX-Toolkit to look at the quality scores across the data sets.

The general strategy was to unzip the data on the fly, convert it to tabular format (one read per line), select 1 million sequences at random and submit these to fastx_quality_stats. The output file shows median quality scores >36, which suggests this dataset is of very high quality. Below are the code used and a graph of median quality scores throughout the run.

done | tee quality_analysis.txt
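The strategy described above can be sketched roughly as follows. This is a minimal sketch rather than the exact commands used for this post: the function names `sample_fastq` and `run_qc`, the `*.fastq.gz` file pattern and the use of `shuf` for random selection are all assumptions made for illustration. It also assumes strict four-line FASTQ records with no tab characters in the read headers.

```shell
#!/bin/bash
# Sketch only: function names, the *.fastq.gz glob and the use of shuf
# are illustrative assumptions, not taken from the original post.

# Print a random sample of N reads from a gzipped FASTQ file.
# Assumes strict four-line records and no tabs in the read headers.
sample_fastq() {
  local fqz="$1" n="$2"
  # decompress on the fly, join each 4-line record into one tab-separated
  # row, pick n rows at random, then restore the 4-line layout
  gzip -dc "$fqz" \
    | paste - - - - \
    | shuf -n "$n" \
    | tr '\t' '\n'
}

# Run fastx_quality_stats on a 1 million read sample from every gzipped
# FASTQ file in the current directory, printing and saving the report.
# Requires the FASTX-Toolkit to be installed and on the PATH.
run_qc() {
  local fqz
  for fqz in *.fastq.gz ; do
    echo "$fqz"
    sample_fastq "$fqz" 1000000 | fastx_quality_stats
  done | tee quality_analysis.txt
}
```

Calling `run_qc` in the directory that holds the downloaded files would write one block of per-cycle statistics per file to quality_analysis.txt. Depending on the toolkit build and the quality encoding of the reads, fastx_quality_stats may need an extra switch such as `-Q33` for Phred+33 data, so check this against your installed version.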

Mean cycle base quality for GSE55123 RNA-seq data shows very high quality sequence reads.
