DEE2 database gets HDF5
Here I’ll show you how to download and work with the new HDF5 datasets from DEE2 (dee2.io). HDF5 files provide fast random access to large and complex datasets while occupying less disk space. Overall, the bulk data files are 50% smaller than the previously used BZ2 files. HDF5 also makes selecting datasets of interest quicker and obviates the need to convert data from "long" to "wide" format, which takes a long time and lots of RAM. In short, this is a big upgrade in end-user accessibility for large-scale analysis of DEE2 transcriptome data.

The materials here are mostly based on the rhdf5 package. I’ll demonstrate with E. coli, but this should also work for other organisms.

The first step is to load the rhdf5 library and download the h5 file.

    library("rhdf5")
    library("tictoc")

    if (file.exists("ecoli_se.h5")) {
      message("HDF5 file exists")
    } else {
      message("Downloading HDF5 file")
      downl...