Download SRA data with Aspera command line utility



Aspera Connect is NCBI's recommended data transfer client for large datasets (larger than 1 GB). It uses the FASP protocol; here is a description from the NCBI guide:
"The FASP protocol from Aspera (www.asperasoft.com) uses UDP, eliminating the latency issues seen with TCP, and provides bandwidth up to 1 gigabit per second (Gbps) to transfer data. It has a restart capability if data transfer is interrupted midstream and is well behaved, so if there is other data traffic on your network connections, it will back off in order to avoid starving other protocols. We have seen effective throughput up to 800 megabits per second (Mbps) to a single site.
The fasp protocol uses UDP port 33001-33009 for data transfer and you may need to contact your IT security staff if this port is not open to NCBI through your institutional firewalls.
NCBI is implementing Aspera for two use cases, occasional users who download files for direct use (Aspera Connect), and bulk users who will be downloading large amounts of data (ascp)."
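To put the quoted figures in perspective, here is a minimal bash sketch that turns a sustained rate in megabits per second into a rough transfer time. The function name estimate_seconds is hypothetical, and it assumes decimal units (1 GB = 10^9 bytes) and a constant rate, which a real transfer will not hold exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate transfer time in whole seconds (rounded up)
# for a file of a given size at a constant rate in megabits per second.
# Network rates are in megabits; file sizes are in bytes (8 bits each).
estimate_seconds() {
  local size_bytes=$1   # file size in bytes
  local rate_mbps=$2    # sustained throughput in megabits per second
  local bits=$(( size_bytes * 8 ))
  local rate_bps=$(( rate_mbps * 1000000 ))
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (bits + rate_bps - 1) / rate_bps ))
}
```

At the 800 Mbps quoted above, estimate_seconds 1000000000 800 prints 10, so a 1 GB file would take roughly 10 seconds under these assumptions.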
NCBI recommends installing the Aspera Connect plugin for your browser from the Aspera website, but this is of little help to power users with large command-line or programmatic job queues. You can download the stand-alone Unix command-line utility, ascp, using the following command (h/t to Matt Shirley @ BioStars):

sh <(curl -s http://demo.asperasoft.com/ascp-install-3.5.4.102989-linux-64.sh)
Or, if that fails:

wget http://demo.asperasoft.com/ascp-install-3.5.4.102989-linux-64.sh
chmod +x ascp-install-3.5.4.102989-linux-64.sh
./ascp-install-3.5.4.102989-linux-64.sh
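Since the private-key location can differ between installs, a quick sanity check after installing may save some head-scratching. This is a minimal bash sketch: check_ascp is a hypothetical helper, and its default key path is simply the one used in the download command in this post, which may not be where the installer puts the key.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm ascp is on PATH and a key file is readable.
# The default key path is an assumption; pass your own path as the first argument.
check_ascp() {
  local key=${1:-$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh}
  if ! command -v ascp >/dev/null 2>&1; then
    echo "ascp not found on PATH" >&2
    return 1
  fi
  if [ ! -r "$key" ]; then
    echo "key file not readable: $key" >&2
    return 1
  fi
  echo "ok: $(command -v ascp), key $key"
}
```

Run it as check_ascp /path/to/your/key; it returns non-zero with a message on stderr if either piece is missing.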


Here is the usage message:

Usage: ascp [OPTION] SRC... DEST
          SRC to DEST, or multiple SRC to DEST dir
          SRC, DEST format: [[user@]host:]PATH



You can then download SRA files with a command such as this:

ascp -l 1000m -O 33001 -T -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByExp/sra/SRX/SRX306/SRX306097/SRR901180/SRR901180.sra /download/SRR901180.sra


The options used above are:
-l MAX-RATE                     Max transfer rate
-O FASP-PORT                    UDP port used for FASP transport
-T                              Disable encryption
-i PRIVATE-KEY-FILE             Private-key file name (id_rsa)

Keep in mind that the private-key file name and location may differ on your system.
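For power users with long job queues, the remote path in the example follows a predictable pattern, so it can be assembled from the experiment and run accessions and fed to ascp in a loop. Below is a minimal bash sketch under stated assumptions: sra_byexp_path, ascp_batch, ASPERA_KEY, DEST_DIR and DRY_RUN are hypothetical names, the directory layout is inferred from the single SRX example above (it may not hold for other accession prefixes or if NCBI reorganises its directories), and the ascp options are copied from that example command.

```shell
#!/usr/bin/env bash
# Build the remote path following the ByExp layout seen in the example:
#   /sra/sra-instant/reads/ByExp/sra/<first 3>/<first 6>/<experiment>/<run>/<run>.sra
# This layout is an assumption inferred from one SRX example.
sra_byexp_path() {
  local exp=$1 run=$2
  printf '/sra/sra-instant/reads/ByExp/sra/%s/%s/%s/%s/%s.sra\n' \
    "${exp:0:3}" "${exp:0:6}" "$exp" "$run" "$run"
}

# Read "EXPERIMENT RUN" pairs from stdin and fetch each run with ascp,
# using the same options as the example command.
# ASPERA_KEY, DEST_DIR and DRY_RUN are hypothetical settings for this sketch.
ascp_batch() {
  local key=${ASPERA_KEY:-$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh}
  local dest=${DEST_DIR:-.}
  local exp run remote _
  local -a cmd
  # "|| [ -n "$exp" ]" also handles a last line without a trailing newline
  while read -r exp run _ || [ -n "$exp" ]; do
    if [ -z "$exp" ] || [ -z "$run" ]; then
      continue                              # skip blank or incomplete lines
    fi
    remote=$(sra_byexp_path "$exp" "$run")
    cmd=(ascp -l 1000m -O 33001 -T -i "$key"
         "anonftp@ftp.ncbi.nlm.nih.gov:$remote" "$dest/$run.sra")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "${cmd[*]}"             # print the command only
    else
      # </dev/null stops ascp from consuming the accession list on stdin
      "${cmd[@]}" </dev/null || echo "failed: $run" >&2
    fi
  done
}
```

For example, with a hypothetical file runs.txt containing lines such as "SRX306097 SRR901180", DRY_RUN=1 ascp_batch < runs.txt prints the ascp command for each run without downloading anything; drop DRY_RUN to start the transfers.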

On my fibre connection I reached 370 MB/s, which is 20-fold faster than my previous setup of downloading over FTP with axel.
