Uploading data to GEO: which method is fastest?

If you have uploaded omics data to GEO before, you'll know it can be a hassle and take a long time. The GEO team suggests a few methods for uploading from the Unix command line:

Using 'ncftp'


ncftp
set passive on
set so-bufsize 33554432
open ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
put -R Folder_with_submission_files

Using 'lftp'


lftp ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
mirror -R Folder_with_submission_files
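The interactive lftp session above can also be run in one go: lftp's '-c' option executes a semicolon-separated list of commands and then exits. Below is a minimal sketch for a POSIX shell. The function name 'geo_lftp_upload' and its DRY_RUN switch are hypothetical conveniences (not part of lftp or the GEO instructions), and the passcode, folder and directory names are the same placeholders used above.

```shell
#!/bin/sh
# Hypothetical helper: geo_lftp_upload PASSCODE REMOTE_FOLDER LOCAL_DIR
# Runs the same open / cd / mirror -R steps as the interactive lftp session,
# but non-interactively through 'lftp -c', which executes the commands and exits.
# Set DRY_RUN=1 to print the lftp command script instead of running it.
geo_lftp_upload() {
    passcode=$1   # your GEO FTP passcode
    remote=$2     # e.g. your@mail.com_yourfolder
    localdir=$3   # e.g. Folder_with_submission_files
    # Quote the local directory so names containing spaces survive lftp's parser.
    script="open ftp://geoftp:${passcode}@ftp-private.ncbi.nlm.nih.gov; cd uploads/${remote}; mirror -R \"${localdir}\""
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$script"
    else
        lftp -c "$script"
    fi
}
```

For example: geo_lftp_upload yourpasscode your@mail.com_yourfolder Folder_with_submission_files. As with the URL form above, the passcode appears on the command line, so it can end up in your shell history and be visible in the process list while the upload runs.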

Using 'sftp'


(expect slower transfer speeds, since this method encrypts data on the fly)


sftp geoftp@sftp-private.ncbi.nlm.nih.gov
password: yourpasscode
cd uploads/your@mail.com_yourfolder
mkdir new_geo_submission
cd new_geo_submission
put file_name

Using 'ncftpput'


(transfers from the command line without entering an interactive shell)
Usage example:

ncftpput -F -R -z -u geoftp -p "yourpasscode" ftp-private.ncbi.nlm.nih.gov ./uploads/your@mail.com_yourfolder ./local_dir_path

local_dir_path: path to the local submission directory you are transferring to your personalized upload space

-F uses a passive (PASV) data connection
-R recursively uploads an entire directory tree
-z resumes the upload if a file transfer gets interrupted


The speed test


If you upload sequencing data to NCBI, you may have noticed that GEO recommends three tools for uploading from Unix servers: ncftp, lftp and sftp. Their speeds differ a lot, at least for those of us based in Australia:
  • ncftp 364.7 kB/s
  • lftp 11.83 MB/s
  • sftp 8.6 MB/s

lftp is the clear winner, over 30 times faster than ncftp in our test. Hopefully this saves you some time!
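To put those rates in perspective, the short script below converts each measured rate into an upload time. It is a rough sketch that assumes decimal units (1 GB = 1000 MB = 1,000,000 kB), a constant transfer rate for the whole upload, and a hypothetical 100 GB submission; 'upload_hours' is an illustrative helper, not part of any of the tools above.

```shell
#!/bin/sh
# upload_hours SIZE_GB RATE_KB_PER_S -> prints the upload time in hours (one decimal place).
# Assumes decimal units and a constant rate; real transfers will vary.
upload_hours() {
    awk -v gb="$1" -v kbs="$2" 'BEGIN { printf "%.1f\n", gb * 1000 * 1000 / kbs / 3600 }'
}

size_gb=100   # hypothetical submission size

# Rates measured above, converted to kB/s (11.83 MB/s = 11830 kB/s, 8.6 MB/s = 8600 kB/s).
while read -r tool rate_kbs; do
    printf '%-6s %s h\n' "$tool" "$(upload_hours "$size_gb" "$rate_kbs")"
done <<EOF
ncftp 364.7
lftp 11830
sftp 8600
EOF
# prints: ncftp 76.2 h, lftp 2.3 h, sftp 3.2 h (one per line)
```

Under these assumptions the gap is days versus a couple of hours, so the choice of tool matters far more than any other tuning for large submissions.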
