Uploading data to GEO - which method is faster?
If you have uploaded omics data to GEO before, you'll know it's a bit of a hassle and can take a long time. The GEO team suggests a few methods if you are using the Unix command line:
Using 'ncftp'
ncftp
set passive on
set so-bufsize 33554432
open ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
put -R Folder_with_submission_files
Using 'lftp'
lftp ftp://geoftp:yourpasscode@ftp-private.ncbi.nlm.nih.gov
cd uploads/your@mail.com_yourfolder
mirror -R Folder_with_submission_files
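The same lftp session can also be run in one go, which is handy inside a job script or a screen session. This is only a sketch: the function name geo_lftp_upload and the DRY_RUN switch are made up for illustration, and the passcode and folder names are the same placeholders used above. It relies on lftp's -u user,password and -e "commands" options.

```shell
# Sketch: non-interactive version of the lftp session above.
# geo_lftp_upload and DRY_RUN are hypothetical helpers, not part of lftp or GEO.
geo_lftp_upload() {
    passcode="$1"     # GEO FTP passcode (placeholder)
    remote_dir="$2"   # e.g. uploads/your@mail.com_yourfolder
    local_dir="$3"    # e.g. Folder_with_submission_files

    # Commands lftp runs after connecting; "bye" makes it exit when the mirror is done.
    script="cd ${remote_dir}; mirror -R ${local_dir}; bye"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf 'lftp -u geoftp,%s -e "%s" ftp-private.ncbi.nlm.nih.gov\n' "$passcode" "$script"
    else
        lftp -u "geoftp,${passcode}" -e "$script" ftp-private.ncbi.nlm.nih.gov
    fi
}

# Usage (placeholders):
# geo_lftp_upload yourpasscode uploads/your@mail.com_yourfolder Folder_with_submission_files
```

Keep in mind that a passcode given on the command line is visible to other users on the machine through the process list, so on a shared server the interactive session may be the safer choice.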
Using 'sftp'
(expect slower transfer speeds since this method encrypts on-the-fly)
sftp geoftp@sftp-private.ncbi.nlm.nih.gov
password: yourpasscode
cd uploads/your@mail.com_yourfolder
mkdir new_geo_submission
cd new_geo_submission
put file_name
Using 'ncftpput'
(transfers from the command-line without entering an interactive shell)
Usage example:
ncftpput -F -R -z -u geoftp -p "yourpasscode" ftp-private.ncbi.nlm.nih.gov ./uploads/your@mail.com_yourfolder ./local_dir_path
local_dir_path: path to the local submission directory you are transferring to your personalized upload space
-F to use passive (PASV) data connection
-z is for resuming upload if a file upload gets interrupted
-R to recursively upload an entire directory/tree
The speed test
For those of you uploading sequencing data to NCBI, you will have noticed that GEO recommends three upload tools for Unix servers: ncftp, lftp and sftp. There is a big speed difference, at least for those of us based in Australia:
- ncftp 364.7 kB/s
- lftp 11.83 MB/s
- sftp 8.6 MB/s
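To put those rates in perspective, here is a rough estimate of how long a single upload would take with each tool. The 10,000 MB submission size is an assumption chosen for illustration, not something we measured, and estimate_minutes is a made-up helper name. Rates are the ones listed above, with 364.7 kB/s written as 0.3647 MB/s.

```shell
# Hypothetical submission size in MB (decimal); pick your own.
size_mb=10000

# estimate_minutes SIZE_MB RATE_MB_PER_S -> transfer time in minutes, one decimal place
estimate_minutes() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate / 60 }'
}

# Tool name and measured rate in MB/s, one pair per entry.
for entry in "ncftp 0.3647" "lftp 11.83" "sftp 8.6"; do
    set -- $entry   # split the pair into $1 (tool) and $2 (rate)
    printf '%-6s %8s min\n' "$1" "$(estimate_minutes "$size_mb" "$2")"
done
```

At that size the difference is about 457 minutes for ncftp against roughly 14 for lftp and 19 for sftp, assuming the rates hold steady for the whole transfer.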
lftp is the winner - hopefully this will save you a few minutes!
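Since the numbers will depend on where you are, you may want to repeat the comparison from your own server. The helper below is a minimal sketch of one way to do that, not how the figures above were obtained: it times any command and divides a size you supply by the elapsed seconds. The name time_upload is hypothetical, and it assumes your date command supports +%s.

```shell
# time_upload SIZE_BYTES COMMAND [ARGS...]
# Runs the command, then prints the average rate in MB/s (decimal megabytes).
time_upload() {
    size_bytes="$1"
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    # Avoid dividing by zero when the command finishes within the same second.
    [ "$elapsed" -gt 0 ] || elapsed=1
    awk -v b="$size_bytes" -v s="$elapsed" 'BEGIN { printf "%.2f MB/s\n", b / s / 1000000 }'
    return "$status"
}

# Usage (placeholders): pass the total size of the files in bytes, then the upload command.
# time_upload 5000000000 ncftpput -F -R -z -u geoftp -p "yourpasscode" ftp-private.ncbi.nlm.nih.gov ./uploads/your@mail.com_yourfolder ./local_dir_path
```

Whole-second timing is coarse, so use a test file large enough that the transfer runs for at least a minute or so.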