Improved Key Microbial Biomarker Discovery Using Ensemble Statistical Methods

Bibliographic Details
Main Authors: Walter Pirovano, Yashjit Gangopadhyay, Mirna Lilian Baak, Christiaan Arie de Leeuw, Radhika Bongoni, Eline Suzanne Klaassens
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Advanced Gut & Microbiome Research
Online Access: http://dx.doi.org/10.1155/agm3/9676659
Description
Summary: In recent years, there has been a growing awareness of the importance of the microbiome in health and disease. Consequently, the number of large microbiome-related clinical trials has increased significantly. However, advanced biostatistical analysis is required to properly combine microbiome taxonomic abundance data with phenotypical metadata and reliably predict disease states. While differential abundance analysis and machine-learning techniques are widely used to perform such analyses, selecting the best method is not trivial due to the complexity and specific characteristics of both the data and the algorithms. Here, we present a consensus-based key microbial biomarker (KMB) biostatistical analysis framework that links microbial abundance obtained from amplicon-based or shotgun metagenome sequencing with metadata. The framework integrates machine learning (ML) algorithms and statistical methods to determine, based on predefined metrics, the most relevant microbial biomarkers and signatures that explain variation in the microbial abundance counts and metadata classes. We evaluated the performance of our framework on publicly available case-control datasets of colorectal cancer, Alzheimer’s disease, and Parkinson’s disease and show that, compared to individually run methods, the combined approach is better able to detect KMB species and signatures associated with health and disease conditions. We conclude that our proposed KMB framework provides an innovative and robust strategy that can contribute to the further development of improved diagnostic tools for early disease detection, personalized medicine design, patient stratification, and a better general understanding of the mechanisms behind observed results in pre- and postclinical trials.
ISSN: 2755-1652
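
Illustrative note: the summary describes a consensus across several statistical and ML methods but does not state which methods or metrics the framework uses. The sketch below is therefore not the authors' implementation. It only shows one common way such a consensus could be formed, by averaging per-method ranks of each taxon. The method names, taxon names, scores, and the helper functions `rank_scores` and `consensus_ranking` are all hypothetical.

```python
from statistics import mean


def rank_scores(scores):
    """Map each taxon to its rank within one method (1 = highest score)."""
    ordered = sorted(scores, key=lambda taxon: scores[taxon], reverse=True)
    return {taxon: position for position, taxon in enumerate(ordered, start=1)}


def consensus_ranking(method_scores):
    """Order taxa by mean rank across methods (lower mean rank = stronger consensus)."""
    # Only taxa scored by every method are compared; sorting makes tie order deterministic.
    taxa = sorted(set.intersection(*(set(s) for s in method_scores.values())))
    per_method_ranks = [
        rank_scores({taxon: scores[taxon] for taxon in taxa})
        for scores in method_scores.values()
    ]
    mean_rank = {taxon: mean(r[taxon] for r in per_method_ranks) for taxon in taxa}
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical relevance scores (higher = more relevant); not data from the article.
method_scores = {
    "differential_abundance": {"taxon_A": 0.90, "taxon_B": 0.20, "taxon_C": 0.60},
    "random_forest_importance": {"taxon_A": 0.40, "taxon_B": 0.10, "taxon_C": 0.50},
    "lasso_coefficient_magnitude": {"taxon_A": 0.80, "taxon_B": 0.05, "taxon_C": 0.30},
}

ranking = consensus_ranking(method_scores)
for taxon, avg_rank in ranking:
    print(f"{taxon}: mean rank {avg_rank:.2f}")
```

Rank averaging is used here because scores from different methods (test statistics, feature importances, coefficients) are not on a common scale, whereas ranks are directly comparable. Other aggregation rules, such as voting on the top-k features per method, would fit the same structure.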