The five pillars of computational reproducibility: Bioinformatics and beyond
I've been working on a new project to follow up our paper from last year on the problems with pathway enrichment analysis. That article turned out to be a bleak and depressing look into how frequently used tools in genomics are misused. It is not an exaggeration to say that most articles showing some type of enrichment analysis are doing it wrong, and no doubt this is severely impacting the literature. However, I think it isn't helpful to focus only on the negative aspects of bioinformatics and computational research. We also need to lead the way towards resolving these issues. The best way to do this, in my view, is to provide step-by-step guides and tutorials for common routines. So this is what we are in the process of doing: making a protocol for pathway enrichment analysis that is "extremely reproducible". By this, I mean that the analysis could be reproduced independently in future with the minimum of fuss and time. As we were writing this we also recognised that the