Urgent need for minimum standards for reproducible functional enrichment analysis
Our preprint, “Guidelines for reliable and reproducible functional enrichment analysis”, is now online, so I thought I’d give you an overview.
Enrichment analysis is widely used for exploration and interpretation of omics data, but I’ve noticed sloppy work is becoming more common. Over the past few years I’ve been asked to review several manuscripts where the enrichment analysis was poorly conducted and reported. Examples include lack of FDR control, incorrect background gene list specification, and missing essential methodological details. In the article we show how common these problems are and what impact they can have on the results.
In this article we did two screens of articles from PubMed Central. Firstly, we selected 200 articles with terms in the abstract related to enrichment analysis. These were examined against a checklist of reporting and methodological criteria, and the screening was cross-checked for accuracy. As some articles describe more than one analysis, the total number of analyses came to 235.
The surveyed articles covered a variety of omics data types, but gene expression analysis was the most common.
We show that these methodological problems are very common. For example, out of the 235 analyses in our sample, only 119 (51%) specified conducting p-value correction for multiple testing!
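For context, correcting a set of raw enrichment p-values is a one-liner in R. Here is a minimal sketch; the p-values are made up for illustration:

```r
# Hypothetical raw p-values, one per gene set tested
raw_p <- c(0.001, 0.004, 0.02, 0.03, 0.40, 0.90)

# Benjamini-Hochberg adjustment controls the false discovery rate (FDR)
fdr <- p.adjust(raw_p, method = "BH")

# Gene sets passing the conventional FDR < 0.05 threshold
which(fdr < 0.05)
```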
In terms of tools, over-representation analysis tools including DAVID, Ingenuity and PANTHER were the most popular, followed by GSEA.
Over-representation analysis relies on an accurate background gene list, but we found the background was correctly described in only 8/197 cases (4.1%)!
There were 15 analyses that lacked even the most basic methodological detail, such as which tool was used to calculate enrichment. The excerpt below is an example, stating that KEGG pathway analysis was conducted but not how it was done. The article cited is the original KEGG paper from 2000 (PMID: 10592173), which doesn’t describe any statistical enrichment analysis.
After examining another 1,395 articles, we looked for trends between rigour/reporting and publication metrics. There wasn’t any association between our analysis scores and metrics like Scimago Journal Rank or number of citations accrued, which shows this isn’t a problem only with lower-ranked journals.
This is a massive wake-up call: serious methodological issues are widespread in enrichment analysis.
We also conducted some benchmarking analyses to understand how much these errors can impact results. We began with differential RNA-seq expression data and show that, when used properly, over-representation analysis (ORA) and GSEA-like methods (functional class scoring; FCS) give similar results. In this example we used the clusterProfiler package for ORA and the mitch package for FCS; more details are in the article.
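To give a flavour of the two approaches, here is a minimal sketch. The objects `deseq2_results` (a DESeq2 results table with gene identifiers as row names) and `genesets` (a named list of gene sets, e.g. from mitch’s gmt_import()) are hypothetical placeholders; see the repository linked at the end for the actual benchmarking code.

```r
library(clusterProfiler)
library(mitch)

## ORA with clusterProfiler
# Significant genes (FDR < 0.05), with all detected genes as the background
sig_genes <- rownames(subset(deseq2_results, padj < 0.05))
detected  <- rownames(deseq2_results)

# TERM2GENE: two-column data frame mapping gene set names to member genes
term2gene <- stack(genesets)[, c("ind", "values")]

ora <- enricher(gene = sig_genes,
                universe = detected,      # background = detected genes only
                TERM2GENE = term2gene)

## FCS with mitch
m   <- mitch_import(deseq2_results, DEtype = "deseq2")
fcs <- mitch_calc(m, genesets, priority = "significance")
```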
When an inappropriate background gene list is used, in this case the whole genome instead of only the genes detected in the assay, the results are vastly different! The extra gene sets reported are likely false positives.
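Continuing the hypothetical example above, the difference comes down to a single argument; `all_genes` stands in for a whole-genome identifier list:

```r
# Inappropriate: whole genome as the background
ora_bad <- enricher(gene = sig_genes,
                    universe = all_genes,
                    TERM2GENE = term2gene)

# Appropriate: only the genes detected in the assay
ora_good <- enricher(gene = sig_genes,
                     universe = detected,
                     TERM2GENE = term2gene)

# Compare how many gene sets reach significance under each background
nrow(as.data.frame(ora_bad))
nrow(as.data.frame(ora_good))
```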
We also observed that most over-representation analyses combined up- and down-regulated genes into a single list before testing for enrichment. Our own testing shows that this approach has much lower power to discover differentially regulated gene sets compared to analysing the up and down lists separately. In fact, the number of differentially regulated gene sets identified was ~90% lower with the combined approach.
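A sketch of the separated approach, again using the hypothetical `deseq2_results` table from above:

```r
# Split differentially expressed genes by direction of change
up   <- rownames(subset(deseq2_results, padj < 0.05 & log2FoldChange > 0))
down <- rownames(subset(deseq2_results, padj < 0.05 & log2FoldChange < 0))

# Run ORA on each direction separately (same detected-gene background)
ora_up   <- enricher(gene = up,   universe = detected, TERM2GENE = term2gene)
ora_down <- enricher(gene = down, universe = detected, TERM2GENE = term2gene)
```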
To combat these problems we provide a checklist of minimum standards for enrichment analysis.
The manuscript also contains some recommendations if you want to take your analysis from these minimum standards up to a gold standard.
Hopefully this is of use to you. Happy to receive your feedback!
Big shout out to Sameer Jadaan, Kaushalya Perera and Tanuveer Kaur for massive help in screening these articles. Also big thanks to Nick Wong, Antony Kasi and Anup Shah for comments on the draft.
If you are interested in exploring the data collected and code used, our R scripts are deposited here: https://github.com/markziemann/SurveyEnrichmentMethods