Effects of data transformation and model selection on feature importance in microbiome classification data

Bibliographic Details
Main Authors: Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org
Format: Article
Language: English
Published: BMC 2025-01-01
Series: Microbiome
Online Access:https://doi.org/10.1186/s40168-024-01996-6

Abstract

Background: Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity present significant challenges. Microbiome data transformations can alleviate some of the aforementioned challenges, but their usage in machine learning tasks has largely been unexplored.

Results: Our analysis of over 8500 samples from 24 shotgun metagenomic datasets showed that it is possible to classify healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm or transformation. Presence-absence transformations performed comparably to abundance-based transformations, and only a small subset of predictors is necessary for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning–based biomarker detection.

Conclusions: Microbiome data transformations can significantly influence feature selection but have a limited effect on classification accuracy. Our findings suggest that while classification is robust across different transformations, the variation in feature selection necessitates caution when using machine learning for biomarker identification. This research provides valuable insights for applying machine learning to microbiome data and identifies important directions for future work.
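The abstract contrasts presence-absence with abundance-based transformations and reports that the top-ranked features shift with the transformation. The Python sketch below illustrates those ideas under stated assumptions. It implements three common transformations: total-sum scaling to relative abundance, presence-absence, and centered log-ratio (CLR) with an assumed pseudocount of 1. It also shows one hypothetical way to compare top-k feature sets, using Jaccard overlap. The function names, toy counts, and importance scores are all hypothetical. This is not the authors' pipeline, and this record does not state which transformations or feature-importance measures the study actually used.

```python
import math

# Illustrative transformations of one sample's taxon counts.
# Assumption: these are common examples, not necessarily the study's exact set.


def relative_abundance(counts):
    """Total-sum scaling: divide each count by the sample's total count."""
    total = sum(counts)
    return [c / total for c in counts]


def presence_absence(counts):
    """Binary detection: 1 if the taxon has a non-zero count, else 0."""
    return [1 if c > 0 else 0 for c in counts]


def clr(counts, pseudocount=1.0):
    """Centered log-ratio of (count + pseudocount).

    The pseudocount (an assumed value) avoids log(0) for undetected taxa,
    which are common in sparse microbiome data. Output values sum to ~0.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


def top_k_features(importances, k):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Set overlap |a & b| / |a | b|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    counts = [120, 0, 30, 5, 0, 845]  # hypothetical counts for six taxa
    print(relative_abundance(counts))
    print(presence_absence(counts))
    print([round(v, 3) for v in clr(counts)])

    # Hypothetical importance scores from two models trained on differently
    # transformed versions of the same data.
    imp_abundance = {"taxon_A": 0.40, "taxon_B": 0.25, "taxon_C": 0.20, "taxon_D": 0.15}
    imp_presence = {"taxon_A": 0.10, "taxon_B": 0.35, "taxon_C": 0.15, "taxon_D": 0.40}
    top_a = top_k_features(imp_abundance, 2)
    top_p = top_k_features(imp_presence, 2)
    print(sorted(top_a), sorted(top_p), jaccard(top_a, top_p))
```

In this toy example, the two top-2 feature sets share only taxon_B, so their Jaccard overlap is 1/3. Presence-absence discards abundance information entirely, whereas CLR keeps relative abundance on a log scale. That difference is one reason models fitted to different transformations can rank features differently.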

Additional Details
ISSN: 2049-2618
Affiliations: Zuzanna Karwowska and Tomasz Kosciolek, Małopolska Centre of Biotechnology, Jagiellonian University; Oliver Aasmets and Elin Org, Estonian Genome Centre, Institute of Genomics, University of Tartu