A Docker image for Infinium methylation analysis

Performing a differential methylation analysis of Infinium array data requires an impressively large number of R packages, such as `minfi`, `missMethyl`, `limma`, `GenomicRanges`, `DMRcate`, `bumphunter` and many others. Each of these is in turn a heavy package with many dependencies of its own. This means it can take up to an hour to go from a vanilla R installation to one with all the needed packages installed. If you are using multiple computers, you might find that they have slightly different versions of R, Bioconductor and this large stack of dependencies, which could lead to different results. You may also find it difficult to install this large set of dependencies on shared systems, as some of them require system libraries that need admin permissions to install.



The way I've tried to alleviate this problem is to install all the packages I need into a Docker image, which can then be downloaded and run in a few minutes on a new system, rather than an hour. The image is based on Bioconductor release 3.14*, which uses R version 4.1.3. Visit the image's page on DockerHub (mziemann/meth_analysis) to learn more.

On a system where Docker is installed, you can run the following to get the image and then start an interactive shell:

docker pull mziemann/meth_analysis
docker run -it --entrypoint /bin/bash mziemann/meth_analysis

Once inside the container, you have access to all of those dependencies and can run your analyses. To get results files out, you can use the wormhole tool from inside the container, or just exit the container and then use the `docker cp` command to fetch what you need.
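As a rough sketch of the `docker cp` route: the helper below is only a thin wrapper around `docker cp`, and the container name `meth_run` and the `/results` path are placeholders of mine, not something the image defines. Starting the container with `--name` is optional, but it saves looking up the auto-generated name with `docker ps -a` afterwards.

```shell
# Start the container with a fixed name so it is easy to refer to later
# (meth_run is a placeholder name):
#   docker run -it --name meth_run --entrypoint /bin/bash mziemann/meth_analysis

# Copy a file or folder out of a container (it may already be stopped).
# Usage: fetch_results CONTAINER PATH_IN_CONTAINER DEST_ON_HOST
fetch_results() {
  docker cp "$1:$2" "$3"
}
```

After exiting the container, something like `fetch_results meth_run /results ./results` would copy the folder to the host, and `docker rm meth_run` removes the stopped container once you have what you need.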

For those of you on a shared system without Docker or sudo access, you can still use this image, but you'll need a tool like Singularity or udocker that doesn't require root permissions. Personally I prefer udocker, as it can be installed without sudo permissions and the process is relatively simple. Once it is installed, type the commands

udocker pull mziemann/meth_analysis
udocker run mziemann/meth_analysis bash

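which will pull the image and start a new container. If you need to load some data into the container, the following command "mounts" a folder from the host at a location inside the container so you can work with it. Here the host folder /home/user/data appears as /analysis inside the container.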

udocker run --volume=/home/user/data:/analysis mziemann/meth_analysis /bin/bash
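If you are on a machine with plain Docker instead, the same idea can be sketched with the `--volume` option of `docker run`. The function name `run_with_data` is made up for illustration, and `/analysis` is just the mount point from the udocker example above.

```shell
# Start an interactive shell with a host folder mounted inside the container.
# HOST_DIR should be an absolute path; Docker treats a bare name as a
# named volume rather than a folder on the host.
# Usage: run_with_data HOST_DIR [MOUNT_POINT]
run_with_data() {
  host_dir=$1
  mount_point=${2:-/analysis}
  docker run -it \
    --volume "$host_dir:$mount_point" \
    --entrypoint /bin/bash \
    mziemann/meth_analysis
}
```

Calling `run_with_data /home/user/data` would drop you into a shell where the data sits under /analysis. Anything written there from inside the container lands in the host folder, so there is nothing to copy out afterwards.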

So feel free to remix the Dockerfile to your needs and create your own image for Infinium methylation analysis with your favourite packages. You can rest easy that not only will your reproducibility be better, but installing everything on a new computer will be a lot less painful.
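To give a feel for what a remix might look like, here is a minimal sketch that writes a two-line Dockerfile. It is not the Dockerfile behind mziemann/meth_analysis: the base image tag `bioconductor/bioconductor_docker:RELEASE_3_14` and the short package list are my assumptions, chosen to match the Bioconductor 3.14 release mentioned above, and the function name is made up.

```shell
# Write a minimal Dockerfile into the given directory (default: current one).
# ASSUMPTIONS: the base image tag and the package list are illustrative only;
# check the real Dockerfile for what the published image actually installs.
write_dockerfile() {
  cat > "${1:-.}/Dockerfile" <<'EOF'
FROM bioconductor/bioconductor_docker:RELEASE_3_14
RUN R -e 'BiocManager::install(c("minfi", "missMethyl", "limma", "DMRcate"))'
EOF
}
```

Running `write_dockerfile .` followed by `docker build -t my_meth_analysis .` should then build a local image (the tag `my_meth_analysis` is again a placeholder). Expect the build itself to take a while, since this is where the hour of package installation is paid once instead of on every machine.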

*You may be wondering why I'm using a relatively old Bioconductor version. The answer is that this is the newest one that can be supported on the server I'm using at my uni, which runs Ubuntu 20.04 LTS. I tried newer ones, but they complained about missing system libraries that I could not fix.
