Effects of data transformation and model selection on feature importance in microbiome classification data
Abstract Background Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity pres...
Main Authors: | Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2025-01-01 |
Series: | Microbiome |
Online Access: | https://doi.org/10.1186/s40168-024-01996-6 |
_version_ | 1841559200132497408 |
---|---|
author | Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org |
author_facet | Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org |
author_sort | Zuzanna Karwowska |
collection | DOAJ |
description | Abstract. Background: Accurate classification of host phenotypes from microbiome data is crucial for advancing microbiome-based therapies, with machine learning offering effective solutions. However, the complexity of the gut microbiome, data sparsity, compositionality, and population-specificity present significant challenges. Microbiome data transformations can alleviate some of the aforementioned challenges, but their usage in machine learning tasks has largely been unexplored. Results: Our analysis of over 8500 samples from 24 shotgun metagenomic datasets showed that it is possible to classify healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm or transformation. Presence-absence transformations performed comparably to abundance-based transformations, and only a small subset of predictors is necessary for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning–based biomarker detection. Conclusions: Microbiome data transformations can significantly influence feature selection but have a limited effect on classification accuracy. Our findings suggest that while classification is robust across different transformations, the variation in feature selection necessitates caution when using machine learning for biomarker identification. This research provides valuable insights for applying machine learning to microbiome data and identifies important directions for future work. |
format | Article |
id | doaj-art-317adbb17287406f84bf3abcc2b02544 |
institution | Kabale University |
issn | 2049-2618 |
language | English |
publishDate | 2025-01-01 |
publisher | BMC |
record_format | Article |
series | Microbiome |
spelling | doaj-art-317adbb17287406f84bf3abcc2b02544; 2025-01-05T12:41:19Z; eng; BMC; Microbiome; 2049-2618; 2025-01-01; 131114; 10.1186/s40168-024-01996-6; Effects of data transformation and model selection on feature importance in microbiome classification data; Zuzanna Karwowska [0]; Oliver Aasmets [1]; Estonian Biobank research team; Tomasz Kosciolek [2]; Elin Org [3]; [0] Małopolska Centre of Biotechnology, Jagiellonian University; [1] Estonian Genome Centre, Institute of Genomics, University of Tartu; [2] Małopolska Centre of Biotechnology, Jagiellonian University; [3] Estonian Genome Centre, Institute of Genomics, University of Tartu; https://doi.org/10.1186/s40168-024-01996-6 |
title | Effects of data transformation and model selection on feature importance in microbiome classification data |
url | https://doi.org/10.1186/s40168-024-01996-6 |