Posts

Understanding pathway-level regulation of chromatin marks with the "mitch" Bioconductor package (Epigenetics 2021 Conference Presentation)

Presented 18th February 2021.

Abstract: Gene expression is governed by numerous chromatin modifications. Understanding these dynamics is critical to understanding human health and disease, but there are few software options for researchers looking to integrate multi-omics data at the level of pathways. To address this, we developed mitch, an R package for multi-contrast gene set enrichment analysis. It uses a rank-MANOVA statistical approach to identify sets of genes that exhibit joint enrichment across multiple contrasts. In this talk I will demonstrate using mitch and showcase its advanced visualisation features to explore the regulation of signalling and biochemical pathways at the chromatin level.

10 quick tips for genomics data management

I get asked a lot about the best ways to store sequence data because the files are massive and researchers have varying levels of knowledge of the hardware and software. Here I'll run through some best practices for genomics research data management based on my 10 years of experience in the space.

1. Always work on servers, not remote machines or laptops
On-prem machines and cloud servers are preferred because you can log into them from anywhere using ssh or another protocol. These machines are better suited to heavy loads and are less likely to break down because of the institutional tech support and maintenance. Institutional data transfer speeds will be far superior to your home network. Never do computational work on a laptop. Avoid storing data on your own portable hard drives or flash drives. If you don't have a server, ask for access at your institution or research cloud provider (we use Nectar in Australia).

2. Download the data to the place where you will be working on i

DEE2 projects on demand

We have noted that the time between new datasets appearing on SRA and being processed by DEE2 has been about 3 to 6 months. Our dream is to shrink this down to two weeks, but we simply do not have access to that much compute power at the moment. To address this we have devised an "on-demand" feature so that you can request certain datasets to be processed rapidly. We think this is a great feature because it serves the main mission of the DEE2 project, which is to make all RNA-seq data freely available to everyone.

Here's how to use it:

1. Visit http://dee2.io/request.html and you will be greeted with a webform. Select the organism of interest.

2. Provide the SRA project accession number of the dataset. These numbers begin with SRP, ERP or DRP. If you have a different type of accession such as GEO Series (GSE) or Bioproject (PRJNA) then you will need to navigate NCBI to find the SRP number.

3. Check that the SRP number is in the standard DEE2 queue. To do that, follow the l

Effect of COVID-19 on genomics publications in 2020

COVID-19 was and remains a major crisis in many countries, disrupting general life as well as scientific research. But how has it impacted scientific output in genomics? To evaluate this I investigated the number of papers published in PubMed Central (PMC) in the period from 2016 through 2020. I counted the total number of papers as well as those matching a genomics search term, using a query like the one below:

(genom*[Abstract]) AND ("2020"[Publication Date] : "2020"[Publication Date])

Note: the number of papers is only a lagging proxy measure of aggregate activity in a field; it does not relate to scientific quality.

Here is the number of papers and genomics papers published annually over this period. What you can see is that genomics experienced a major fall in the number of papers appearing in PMC in 2020 while total papers did not. Indeed 2020 was the only year since 2000 that the number of published genomics papers has actually gone down compared to the previous year
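As a rough sketch, the search term quoted above can be generated for each year of the study period rather than typed by hand. The function name build_pmc_query is hypothetical, and the "total papers" variant (the date filter on its own) is an assumption, since the post only shows the genomics query.

```shell
#!/usr/bin/env bash
# Build a PMC search term for one publication year.
# build_pmc_query is a hypothetical helper, not part of any NCBI tool.
# Second argument: 1 (default) adds the genom*[Abstract] filter from the post,
# 0 returns the date filter alone (an assumed form of the "total papers" query).
build_pmc_query() {
  local year="$1"
  local genomics_only="${2:-1}"
  local date_filter="(\"${year}\"[Publication Date] : \"${year}\"[Publication Date])"
  if [ "$genomics_only" = "1" ]; then
    printf '(genom*[Abstract]) AND %s\n' "$date_filter"
  else
    printf '%s\n' "$date_filter"
  fi
}

# Print the genomics query for each year from 2016 through 2020.
for year in 2016 2017 2018 2019 2020; do
  build_pmc_query "$year"
done
```

Each printed line can be pasted into the PMC search box. The sketch only builds the strings and does not contact NCBI.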

DEE2 project update October 2020

Bit of a thread for some updates to the DEE2 data set. It's a resource of uniformly processed RNA-seq data free to use under a permissive GPL3 licence. Find it at http://dee2.io Yesterday a batch of 117k human runs was uploaded. This brings the total number of runs to 1,298,581. To my knowledge this is the largest such data set in the world. This is 10x larger than our first release in 2015! (125k runs) The number of SRA projects with completed data analysis bundles is 32,692. Accessible here: http://dee2.io/huge/ The getDEE2 package is the recommended way to access this data if you are familiar with R. You can access individual runs or entire data bundles with the various functions. The pkg is part of the latest BioC release out today https://www.bioconductor.org/packages/devel/bioc/html/getDEE2.html The button for redirecting DEE2 data directly to Degust is broken and we are looking for a fix. For now you will need to download the data to the disk and upload to the Degust webpage

EdgeR or DESeq2? Comparing the performance of differential expression tools

It seems like this discussion comes up a lot. Choosing a differential expression (DE) tool could change the results and conclusions of studies. How much depends on the strength of the data. Indeed this has been covered by others already (here, here). In this post I will compare these tools for a particular dataset that highlights the different ways these algorithms perform. So you can try it out at home, I've uploaded the code for this to GitHub here. To get it to work, clone the repository and open the Rmarkdown (.Rmd) file in RStudio. You will need to install the Bioconductor packages edgeR and DESeq2, and the CRAN package eulerr (for the nifty Venn diagrams). Once that's done you can start working with the Rmd file. For an introduction to Rmd, read on here. Suffice to say that it's the most convenient way to generate a data report that includes code, results and descriptive text for background information, interpretation, references and the rest. To create the report in

Installing R-4.0 on Ubuntu 18.04 painlessly

There are some great instructions here but there are a few gotchas to be aware of. Follow these instructions first, and it will make the process a LOT easier:

1. Make sure you have these dependencies installed. If you don't, you're going to have trouble installing R packages like RCurl, tidyverse, devtools, etc.
   sudo apt install libssl-dev libcurl4-openssl-dev libxml2-dev

2. Remove any existing R install.
   sudo apt remove --purge 'r-base*'

3. Remove any remaining packages. It's better to install them "fresh". They will be in locations like /home/user/R/R-3.6.6 and /usr/lib/R/library/

4. Remove any existing entry for R in /etc/apt/sources.list that installs an older R version.
   sudo nano /etc/apt/sources.list
   For example you might have something like this:
   deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/

5. Then follow the DigitalOcean instructions here and you should be set up in just a few minutes.
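Before opening the sources file in nano, it can help to see whether it contains a CRAN entry at all. The following is a minimal sketch, assuming the entry lives in /etc/apt/sources.list as described above. The function name find_cran_entries is hypothetical, and the pattern only matches active (uncommented) deb lines.

```shell
#!/usr/bin/env bash
# List active apt source lines that point at a CRAN Ubuntu repository.
# find_cran_entries is a hypothetical helper; the file to scan is the first
# argument and defaults to /etc/apt/sources.list.
find_cran_entries() {
  local sources_file="${1:-/etc/apt/sources.list}"
  # Nothing to report if the file is missing or unreadable.
  [ -r "$sources_file" ] || return 0
  # -n prefixes each match with its line number so it is easy to find in an editor.
  # "|| true" keeps the exit status at 0 when there are no matches.
  grep -nE '^[[:space:]]*deb(-src)?[[:space:]].*r-project\.org/bin/linux/ubuntu' "$sources_file" || true
}

# Read-only check of the default file; prints nothing if no CRAN entry exists.
find_cran_entries
```

The script only reads the file. Any line it prints, such as a bionic-cran35 entry, is a candidate for manual removal in step 4.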