Posts

Showing posts from September, 2018

Incorporate dee2 data into your R-based RNA-seq workflow

Dee2.io is a portal for accessing gene expression data derived from public RNA-seq datasets. So far there are over 400k datasets available, and the number is growing every day. While there are existing databases such as Expression Atlas, Recount2 and ARCHS4, dee2.io offers a number of unique benefits. For instance, dee2 includes gene-wise counts from STAR as well as transcript-wise quantifications from Kallisto. There are a few ways you can access these data. Firstly, there is a mobile-friendly web interface. Secondly, there are data dumps available if you are running a large-scale analysis. But the purpose of this post is to demonstrate the improved R interface in action, together with SRAdbV2 and statistics with edgeR and DESeq. The official documentation is available on GitHub. Getting started: This tutorial provides a walkthrough of how to work with dee2 expression data, starting with dataset searches, obtaining the data from dee2.io and then performing a differentia

Update on DEE2 project for Sept 2018

A few updates for DEE2 I would like to share. I switched over to the NameCheap domain name service, which appears to be working much better than the previous one (HostPapa). The domain name server change broke the docker image, so it was slightly modified and rebuilt. I've integrated with SRAdbV2, and now there are many more datasets in the queue. I think many of these are small ones related to single cell RNA-seq. I am using as many computers as possible to clear the backlog. I've noticed a lot of SRA projects with one or a few datasets missing, so I have written a script to identify these and queue them with priority. The R interface has undergone several improvements and should be more robust now. A whole bunch of new documentation has been added, including a complete walkthrough starting with an SRAdbV2 query, fetching DEE2 data, and differential analysis with edgeR and DESeq. Also, bulk data dumps are again available via http. Dat turned out to be too slow and unrelia