Analyzing microbiome data with taxonomic misclassification using a zero-inflated Dirichlet-multinomial model

Abstract The human microbiome is the collection of microorganisms living on and inside of our bodies. A major aim of microbiome research is understanding the role microbial communities play in human health with the goal of designing personalized interventions that modulate the microbiome to treat or...

Bibliographic Details
Main Author: Matthew D. Koslovsky
Format: Article
Language: English
Published: BMC 2025-02-01
Series: BMC Bioinformatics
Subjects: Compositional; High-dimensional; Multivariate count data; Obesity
Online Access: https://doi.org/10.1186/s12859-025-06078-4
collection DOAJ
description Abstract The human microbiome is the collection of microorganisms living on and inside of our bodies. A major aim of microbiome research is understanding the role microbial communities play in human health with the goal of designing personalized interventions that modulate the microbiome to treat or prevent disease. Microbiome data are challenging to analyze due to their high-dimensionality, overdispersion, and zero-inflation. Analysis is further complicated by the steps taken to collect and process microbiome samples. For example, sequencing instruments have a fixed capacity for the total number of reads delivered. It is therefore essential to treat microbial samples as compositional. Another complicating factor of modeling microbiome data is that taxa counts are subject to measurement error introduced at various stages of the measurement protocol. Advances in sequencing technology and preprocessing pipelines coupled with our growing knowledge of the human microbiome have reduced, but not eliminated, measurement error. Ignoring measurement error during analysis, though common in practice, can then lead to biased inference and curb reproducibility. We propose a Dirichlet-multinomial modeling framework for microbiome data with excess zeros and potential taxonomic misclassification. We demonstrate how accommodating taxonomic misclassification improves estimation performance and investigate differences in gut microbial composition between healthy and obese children.
id doaj-art-2e7a33d686144c79b24ae45980cf1e42
institution OA Journals
issn 1471-2105
spelling doaj-art-2e7a33d686144c79b24ae45980cf1e42; 2025-08-20T02:01:35Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-02-01; 26; 1; 1; 19; 10.1186/s12859-025-06078-4; Analyzing microbiome data with taxonomic misclassification using a zero-inflated Dirichlet-multinomial model; Matthew D. Koslovsky (Department of Statistics, Colorado State University); https://doi.org/10.1186/s12859-025-06078-4; Compositional; High-dimensional; Multivariate count data; Obesity
topic Compositional
High-dimensional
Multivariate count data
Obesity