Data analysis step 1: download from GEO and convert to fastq
In this post we will download human RNA-seq data from GEO accession GSE55123. Now you would have thought this would be easy, but the data we download from GEO is in NCBI's Sequence Read Archive (SRA) format. Unpacking the original sequence files can be a bit tricky at first; even the size of the SRA Toolkit manual is enough to make you cringe. So start (in Linux) by making a text file containing all the SRA file links from the NCBI FTP site. Let's call it "url.txt":
http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP038/SRP038101/SRR1171523/SRR1171523.sra
http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP038/SRP038101/SRR1171524/SRR1171524.sra
http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP038/SRP038101/SRR1171525/SRR1171525.sra
http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP038/SRP038101/SRR1171526/SRR1171526.sra
http:...
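Typing these links by hand gets tedious, and they all follow the same folder layout: SRP/<first six characters of the study ID>/<study ID>/<run ID>/<run ID>.sra. Below is a minimal Python sketch that builds url.txt from a study ID and a list of run IDs, assuming that layout holds for every run in the study (it is inferred from the links above, not from NCBI documentation). The helper names `sra_url` and `write_url_file` are hypothetical, and only the four runs visible above are used as example input, since the full run list is cut off here.

```python
from pathlib import Path

# Base of the sra-instant tree, copied from the links in url.txt above.
BASE = "http://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP"


def sra_url(study: str, run: str) -> str:
    """Build the .sra link for one run.

    Assumes the layout seen above:
    SRP/<first 6 chars of study>/<study>/<run>/<run>.sra
    """
    return f"{BASE}/{study[:6]}/{study}/{run}/{run}.sra"


def write_url_file(study: str, runs: list[str], path: str = "url.txt") -> list[str]:
    """Write one URL per line to `path` and return the list of URLs."""
    urls = [sra_url(study, run) for run in runs]
    Path(path).write_text("\n".join(urls) + "\n")
    return urls


if __name__ == "__main__":
    # Example input: only the four runs visible in the post (SRR1171523-SRR1171526).
    runs = [f"SRR{n}" for n in range(1171523, 1171527)]
    for url in write_url_file("SRP038101", runs):
        print(url)
```

Because the file holds one URL per line, a downloader that reads a URL list (for example `wget -i url.txt`) can fetch all the .sra files in one go, provided those sra-instant paths are still being served.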