Uploading data to GEO - which method is fastest?

If you have uploaded omics data to GEO before, you'll know it can be a hassle and take a long time. If you are working from the Unix command line, the GEO team suggests a few methods:

Using 'ncftp'

(start the interactive client by typing 'ncftp', then enter these commands at its prompt)

set passive on
set so-bufsize 33554432
open ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
put -R Folder_with_submission_files

Using 'lftp'

lftp ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
mirror -R Folder_with_submission_files

Using 'sftp'

(expect slower transfer speeds, since this method encrypts data on the fly)

sftp geoftp@sftp-private.ncbi.nlm.nih.gov
password: yourpasscode
cd uploads/your@mail.com_yourfolder
mkdir new_geo_submission
cd new_geo_submission
put file_name

Using 'ncftpput'

(transfers from the command line without entering an interactive shell)
Usage example:

ncftpput -F -R -z -u geoftp -p "yourpasscode" ftp-private.ncbi.nlm.nih.gov ./uploads/your@mail.com_yourfolder ./local_dir_path

local_dir_path: path to the local submission directory you are transferring to your personalized upload space

-F: use a passive (PASV) data connection
-z: resume the upload if a file transfer gets interrupted
-R: recursively upload an entire directory tree
-u and -p: the FTP username (geoftp) and your passcode

The speed test

So I ran a small speed test to see which of the first three options (ncftp, lftp and sftp) is fastest when uploading sequencing data from a Unix server.

There is a big speed difference, at least for those of us based in Australia:

ncftp: 364.7 kB/s
lftp: 11.83 MB/s
sftp: 8.6 MB/s

lftp is the winner: in this test it was more than 30 times faster than ncftp and about 1.4 times faster than sftp. Hopefully this will save you a few minutes!
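To put those rates in perspective, here is a small shell sketch that converts the measured speeds into estimated upload times. It assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 kB), a constant transfer rate for the whole upload, and a hypothetical 50 GB submission; the function name estimate_upload_hours is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate upload time (hours) for a submission of a
# given size in GB, using the rates measured in the speed test.
# Assumptions: decimal units (1 GB = 1000 MB, 1 MB = 1000 kB) and a
# constant transfer rate for the entire upload.
estimate_upload_hours() {
  local size_gb="$1"
  awk -v gb="$size_gb" 'BEGIN {
    # Measured rates, all converted to MB/s
    rate["ncftp"] = 364.7 / 1000
    rate["lftp"]  = 11.83
    rate["sftp"]  = 8.6
    n = split("ncftp lftp sftp", tools, " ")
    for (i = 1; i <= n; i++) {
      t = tools[i]
      # size in MB divided by MB/s gives seconds; divide by 3600 for hours
      printf "%s\t%.1f h\n", t, (gb * 1000) / rate[t] / 3600
    }
  }'
}

# Example: a hypothetical 50 GB submission
estimate_upload_hours 50
```

For 50 GB this prints about 38.1 hours for ncftp, 1.2 hours for lftp and 1.6 hours for sftp. Your own speeds will depend on your location and network.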