DEE2 project update Jan 2020

Happy New Year! Here I'll go through a few updates on the DEE2 project. In case you don't know about it, you can read the paper here.

1) Data processing

One of the most obvious updates since the paper came out is that the number of datasets hosted by DEE2 has increased substantially, from 581,094 to 805,385 runs. This was achieved using a combination of a minicluster at Deakin Uni, several Nectar Cloud servers and the Massive HPC located at Monash Uni. The queue is still growing, so I'll need to further expand compute resources in future.

2) Getting a new source of metadata

DEE2 was reliant on the SRAdbV2 package for SRA metadata. This was critical for providing a list of accession numbers for the database to process, as well as sample information so that end users can search for datasets of interest. Unfortunately, SRAdbV2 became deprecated only a few months after the paper was published. This was possibly due to the expanding size of the SRA metadata making