
Publishing datasets on the dat network - benefits and pitfalls

As I mentioned in an earlier post, Dat is a new data sharing tool that uses concepts from BitTorrent and Git to enable peer-to-peer sharing of versioned data. This is cool for sharing datasets that change over time, because when you sync a dataset, only the changes are retrieved, much as with Git. Because it uses peer-to-peer technology, it is fairly resilient to node failures, as the datasets are mirrored between peers. The "dat publish" command registers the repository on datbase.org, meaning that the files can be retrieved by anyone via a normal browser.
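To make the "only the changes are retrieved" idea concrete, here is a toy sketch in plain shell. It is not how Dat works internally and it does not use the dat tool at all: it simply compares checksums between two local directories and copies only the files that are missing or different. The function name `sync_changed` and the demo file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy illustration only: NOT Dat's actual mechanism, and no network is involved.
set -euo pipefail

# Copy from SRC to DST only the files that are missing in DST
# or whose checksum differs. (sync_changed is a hypothetical helper.)
sync_changed() {
  local src="$1" dst="$2" rel
  mkdir -p "$dst"
  (cd "$src" && find . -type f | sort) | while IFS= read -r rel; do
    if [ ! -f "$dst/$rel" ] || \
       [ "$(cksum < "$src/$rel")" != "$(cksum < "$dst/$rel")" ]; then
      mkdir -p "$(dirname "$dst/$rel")"
      cp "$src/$rel" "$dst/$rel"
      echo "fetched ${rel#./}"
    fi
  done
}

# Demo with two throwaway directories standing in for a peer and a local copy.
demo=$(mktemp -d)
mkdir -p "$demo/peer" "$demo/local"
echo "v1" > "$demo/peer/a.tsv"
echo "v1" > "$demo/peer/b.tsv"
sync_changed "$demo/peer" "$demo/local"   # first sync copies both files
echo "v2" > "$demo/peer/b.tsv"
sync_changed "$demo/peer" "$demo/local"   # second sync copies only b.tsv
```

On the second call only the modified file is copied, which is the property that makes syncing a large, slowly changing dataset cheap.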

To demonstrate, I have released the bulk data dumps from my RNA-seq data processing project, DEE2, which consists of 158 GB of gene expression data. These data are freely available via a browser at https://datbase.org/dee2/bulk or by using the dat command-line tool.


If you're after a single file, you can use the following syntax to retrieve it over HTTPS:

wget https://datbase.org/download/<long dat address>/<file…