What? You're not using parallel compression yet?
Just in case you guys are struggling with (de)compression of colossal data sets, here's something you'll find useful: PBZIP2, a parallel implementation of the bzip2 compressor. OK, so it's not going to improve the quality of your sequencing data, but it could save you a bit of time.
Say I have a fastq file (FC001_sequence.fq) which needs compression on 8 threads:
pbzip2 -p8 FC001_sequence.fq
To decompress (-d) a file (FC001_sequence.fq.bz2) and keep (-k) the archived version on 10 threads:
pbzip2 -dk -p10 FC001_sequence.fq.bz2
To test the integrity of a compressed file:
pbzip2 -t FC001_sequence.fq.bz2
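If you want some reassurance before deleting the original, the compress, test and decompress steps can be chained into one round-trip check. This is only a sketch: roundtrip_check and the PBZIP2 variable are hypothetical names made up for this example, not part of pbzip2. The variable is there so the same function can be pointed at plain bzip2 on a machine without pbzip2, since both accept -k, -f, -t, -d and -c.

```shell
# Assumed override hook (not a pbzip2 feature); defaults to pbzip2.
PBZIP2="${PBZIP2:-pbzip2}"

# Hypothetical helper: compress FILE, test the archive, then confirm that
# decompressing it gives back exactly the original bytes.
roundtrip_check() {
    file="$1"
    # -k keeps the original file, -f overwrites a stale archive from an earlier run
    "$PBZIP2" -kf "$file" || return 1
    # -t tests the integrity of the compressed file without writing any output
    "$PBZIP2" -t "$file.bz2" || return 1
    # -dc decompresses to stdout; cmp -s is silent and exits non-zero on any difference
    "$PBZIP2" -dc "$file.bz2" | cmp -s - "$file"
}
```

Called as roundtrip_check FC001_sequence.fq, it returns 0 only when all three steps succeed. The thread count is left at the autodetected default here; add -p# to the first command if you want to cap it.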
How to compress an entire directory:
tar cvf - directory | pbzip2 > directory.tar.bz2
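The directory pipeline and its reverse can be wrapped into a pair of small helpers. Again, just a sketch: pack_dir, unpack_dir and the PBZIP2 variable are hypothetical names for illustration. Note that "f -" is spelled out so tar writes to stdout explicitly rather than to whatever default device your tar build was compiled with.

```shell
# Assumed override hook (not a pbzip2 feature); defaults to pbzip2.
PBZIP2="${PBZIP2:-pbzip2}"

# Hypothetical helper: pack DIR into ARCHIVE (a .tar.bz2 path).
pack_dir() {
    dir="$1"
    archive="$2"
    # tar streams the archive to stdout; the compressor reads it from stdin
    tar cf - "$dir" | "$PBZIP2" > "$archive"
}

# Hypothetical helper: unpack ARCHIVE into the existing directory DEST.
unpack_dir() {
    archive="$1"
    dest="$2"
    # -dc decompresses to stdout; tar reads the stream and extracts under DEST
    "$PBZIP2" -dc "$archive" | tar xf - -C "$dest"
}
```

For example, pack_dir directory directory.tar.bz2 followed by unpack_dir directory.tar.bz2 /some/existing/dir should recreate the directory tree under the destination.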
Here is the help page with more examples:
Parallel BZIP2 v1.1.5 - by: Jeff Gilchrist [http://compression.ca]
[Jul. 16, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov <nikolov.javor+pbzip2@gmail.com>
Usage: pbzip2 [-1 .. -9] [-b#cdfhklm#p#qrS#tVz] <filename> <filename2> <filenameN>
-1 .. -9 set BWT block size to 100k .. 900k (default 900k)
-b# Block size in 100k steps (default 9 = 900k)
-c,--stdout Output to standard out (stdout)
-d,--decompress Decompress file
-f,--force Overwrite existing output file
-h,--help Print this help message
-k,--keep Keep input file, don't delete
-l,--loadavg Load average determines max number processors to use
-m# Maximum memory usage in 1MB steps (default 100 = 100MB)
-p# Number of processors to use (default: autodetect [32])
-q,--quiet Quiet mode (default)
-r,--read Read entire input file into RAM and split between processors
-t,--test Test compressed file integrity
-v,--verbose Verbose mode
-V,--version Display version info for pbzip2 then exit
-z,--compress Compress file (default)
--ignore-trailing-garbage=# Ignore trailing garbage flag (1 - ignored; 0 - forbidden)
Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: pbzip2 -d -m500 myfile.tar.bz2
Example: pbzip2 -dc myfile.tar.bz2 | tar x