Showing posts from October, 2020

DEE2 project update October 2020

A short thread with some updates on the DEE2 data set. It's a resource of uniformly processed RNA-seq data, free to use under a permissive GPL3 licence. Find it at Yesterday, a batch of 117k human runs was uploaded. This brings the total number of runs to 1,298,581. To my knowledge this is the largest such data set in the world. This is 10x larger than our first release in 2015 (125k runs)! The number of SRA projects with completed data analysis bundles is 32,692. Accessible here: The getDEE2 package is the recommended way to access this data if you are familiar with R. You can access individual runs or entire data bundles with the various functions. The package is part of the latest Bioconductor release, out today. The button for redirecting DEE2 data directly to Degust is broken and we are looking for a fix. For now, you will need to download the data to disk and upload it to the Degust webpage.
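As a rough illustration of the two access routes mentioned above (individual runs versus whole project bundles), here is a minimal sketch in R. It assumes the getDEE2 package is installed from Bioconductor and that the DEE2 server is reachable. The function names, arguments and metadata column names (`SRR_accession`, `SRP_accession`) are written as I understand them from the package documentation and may differ between versions, so check `?getDEE2` and `?getDEE2_bundle` before relying on them. E. coli is used only because its metadata table is small.

```r
# Sketch only: requires network access and the Bioconductor package getDEE2.
library("getDEE2")

species <- "ecoli"  # assumed species label; other organisms use similar short names

# 1. Individual runs: fetch the run metadata, then request a few runs by SRR accession.
mdat <- getDEE2Metadata(species)
srr  <- head(mdat$SRR_accession, 2)   # take the first two runs as an example
runs <- getDEE2(species, srr, metadata = mdat, counts = "GeneCounts")

# 2. Whole bundles: list the completed project bundles, then fetch one by SRP accession.
bundles <- list_bundles(species)
srp     <- bundles$SRP_accession[1]   # first available project, purely illustrative
proj    <- getDEE2_bundle(species, srp, col = "SRP_accession",
                          bundles = bundles, counts = "GeneCounts")

# Both objects are expected to be SummarizedExperiment objects holding gene-level counts.
runs
proj
```

Picking accessions from the metadata tables, rather than hard-coding them, keeps the example valid as new runs and bundles are added.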

EdgeR or DESeq2? Comparing the performance of differential expression tools

It seems like this discussion comes up a lot. Choosing a differential expression (DE) tool could change the results and conclusions of a study. How much depends on the strength of the data. Indeed, this has been covered by others already (here, here). In this post I will compare these tools on a particular dataset that highlights the different ways these algorithms perform. So that you can try it out at home, I've uploaded the code to GitHub here. To get it to work, clone the repository and open the Rmarkdown (.Rmd) file in RStudio. You will need to install the Bioconductor packages edgeR and DESeq2, and the CRAN package eulerr (for the nifty Venn diagrams). Once that's done you can start working with the Rmd file. For an introduction to Rmd, read on here. Suffice to say that it's the most convenient way to generate a data report that includes code, results and descriptive text for background information, interpretation, references and the rest. To create the report in