The folks at EBI have produced a summary of all the commonly used high-throughput sequence aligners (link), and there is a companion paper in the current issue of Bioinformatics. The timeline is a fascinating infographic.
There is also a table describing the features of each aligner. One thing I noticed is that there are relatively few aligners for RNA data, despite RNA-seq being arguably the most important application in NGS right now. (I will cover the use of RNA-specific aligners in future posts.) The paper discusses some of the generalities of selecting an appropriate alignment tool for the job but doesn't go so far as to suggest which one is "best".
It seems like this discussion comes up a lot. The choice of differential expression (DE) tool can change the results and conclusions of a study; how much depends on the strength of the data. Indeed, this has been covered by others already (here, here). In this post I will compare these tools on a particular dataset that highlights the different ways these algorithms perform. So that you can try it out at home, I've uploaded the code to GitHub here. To get it to work, clone the repository and open the R Markdown (.Rmd) file in RStudio. You will need to install the Bioconductor packages edgeR and DESeq2 and the CRAN package eulerr (for the nifty Venn diagrams). Once that's done you can start working with the Rmd file. For an introduction to Rmd read on here. Suffice to say that it's the most convenient way to generate a data report that includes code, results and descriptive text for background information, interpretation, references and the rest. To create the report in
In our RNA-seq series so far we've performed differential analysis and generated some pretty graphs, showing thousands of differentially expressed genes after azacitidine treatment. To understand the biology underlying the differential gene expression profile, we need to perform pathway analysis. We use Gene Set Enrichment Analysis (GSEA) because it can detect pathway changes more sensitively and robustly than some other methods. A 2013 paper compared a range of gene set analysis software using microarray data and is worth a look.

Generate a rank file

The rank file is a list of detected genes, each with a rank metric score. At the top of the list are the genes with the "strongest" up-regulation, at the bottom are the genes with the "strongest" down-regulation, and the genes not changing are in the middle. The metric score I like to use is the sign of the fold change multiplied by the inverse of the p-value, although there may be better methods out there
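As a rough illustration of the rank metric described above (sign of the fold change multiplied by the inverse of the p-value), here is a minimal sketch. It is written in Python for brevity, although the post itself works in R. The gene names and values are made up, the function names are hypothetical, and it assumes the fold change is on a log scale so that its sign indicates direction. The `min_p` floor is also an assumption, added only to avoid dividing by zero.

```python
def rank_metric(log_fc, p_value, min_p=1e-300):
    """Sign of the (log) fold change multiplied by the inverse of the p-value."""
    p = max(p_value, min_p)  # assumed floor so a p-value of 0 does not divide by zero
    sign = (log_fc > 0) - (log_fc < 0)  # +1, -1, or 0
    return sign / p


def build_rank_list(results):
    """Take (gene, log_fc, p_value) tuples and return (gene, score) pairs,
    sorted from strongest up-regulation to strongest down-regulation."""
    ranked = [(gene, rank_metric(log_fc, p)) for gene, log_fc, p in results]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Made-up example values, for illustration only
example = [
    ("GENE_A", 2.1, 0.001),    # strongly up
    ("GENE_B", -1.8, 0.0005),  # strongly down
    ("GENE_C", 0.1, 0.8),      # essentially unchanged
    ("GENE_D", -0.05, 0.9),    # essentially unchanged
]

for gene, score in build_rank_list(example):
    print(f"{gene}\t{score:.4g}")
```

With these values the strongly up-regulated gene lands at the top, the strongly down-regulated gene at the bottom, and the two unchanged genes in the middle, which is the ordering the rank file is meant to have.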
There are some great instructions here, but there are a few gotchas to be aware of. Follow these steps first, and it will make the process a LOT easier:

1. Make sure you have these dependencies installed. If you don't, you're going to have trouble installing R packages such as RCurl, tidyverse, devtools, etc.
    sudo apt install libssl-dev libcurl4-openssl-dev libxml2-dev
2. Remove any existing R install.
    sudo apt remove 'r-base*' --purge
3. Remove any remaining packages. It's better to install them "fresh". They will be in locations like /home/user/R/R-3.6.6 and /usr/lib/R/library/
4. Remove any existing R entry in /etc/apt/sources.list that points to an older R version.
    sudo nano /etc/apt/sources.list
   For example, you might have something like this:
    deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/
5. Then follow the DigitalOcean instructions here and you should be set up in just a few minutes.