Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.

Differential abundance (DA) analysis of metagenomic microbiome data is essential for understanding microbial community dynamics across various environments and hosts. Identifying microorganisms that differ significantly in abundance between conditions (e.g., health vs. disease) is crucial for insigh...

Full description

Saved in:
Bibliographic Details
Main Authors: Eva Kohnert, Clemens Kreutz
Format: Article
Language:English
Published: Public Library of Science (PLoS) 2025-01-01
Series:PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0321452
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850269861862703104
author Eva Kohnert
Clemens Kreutz
author_facet Eva Kohnert
Clemens Kreutz
author_sort Eva Kohnert
collection DOAJ
description Differential abundance (DA) analysis of metagenomic microbiome data is essential for understanding microbial community dynamics across various environments and hosts. Identifying microorganisms that differ significantly in abundance between conditions (e.g., health vs. disease) is crucial for insights into environmental adaptations, disease development, and host health. However, the statistical interpretation of microbiome data is challenged by inherent sparsity and compositional nature, necessitating tailored DA methods. This benchmarking study aims to simulate synthetic 16S microbiome data using metaSPARSim (Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. MetaSPARSim: a 16S rRNA gene sequencing count data simulator. BMC Bioinformatics. 2019;20:416. https://doi.org/10.1186/s12859-019-2882-6 PMID: 31757204) MIDASim (He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Available from: https://doi.org/10.1101/2023.03.23.533996), and sparseDOSSA2 (Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, et al. A statistical model for describing and simulating microbial community profiles. PLOS Comput Biol. 2021;17(9):e1008913. https://doi.org/10.1371/journal.pcbi.1008913 PMID: 34516542) , leveraging 38 real-world experimental templates (S3 Table) previously utilized in a benchmark study comparing DA tools. These datasets, drawn from diverse environments such as human gut, soil, and marine habitats, serve as the foundation for our simulation efforts. We employ the same 14 DA tests that were previously used with the same experimental data in benchmark studies alongside 8 DA tests that were developed subsequently. Initially, we will generate synthetic data closely mirroring the experimental datasets, incorporating a known truth to cover a broad range of real-world data characteristics. This approach allows us to assess the ability of DA methods to recover known true differential abundances. We will further simulate datasets by altering sparsity, effect size, and sample size, thus creating a comprehensive collection for applying the 22 DA tests. The outcomes, focusing on sensitivities and specificities, will provide insights into the performance of DA tests and their dependencies on sparsity, effect size, and sample size. Additionally, we will calculate data characteristics (S1 and S2 Table) for each simulated dataset and use a multiple regression to identify informative data characteristics influencing test performance. Our prior study, where we used simulated data without incorporating a known truth, demonstrated the feasibility of using synthetic data to validate experimental findings. This current study aims to enhance our understanding by systematically evaluating the impact of known truth incorporation on DA test performance, thereby providing further information for the selection and application of DA methods in microbiome research.
format Article
id doaj-art-f0087cbcd03340b192d0cb7aea98dc92
institution OA Journals
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-f0087cbcd03340b192d0cb7aea98dc922025-08-20T01:52:54ZengPublic Library of Science (PLoS)PLoS ONE1932-62032025-01-01205e032145210.1371/journal.pone.0321452Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.Eva KohnertClemens KreutzDifferential abundance (DA) analysis of metagenomic microbiome data is essential for understanding microbial community dynamics across various environments and hosts. Identifying microorganisms that differ significantly in abundance between conditions (e.g., health vs. disease) is crucial for insights into environmental adaptations, disease development, and host health. However, the statistical interpretation of microbiome data is challenged by inherent sparsity and compositional nature, necessitating tailored DA methods. This benchmarking study aims to simulate synthetic 16S microbiome data using metaSPARSim (Patuzzi I, Baruzzo G, Losasso C, Ricci A, Di Camillo B. MetaSPARSim: a 16S rRNA gene sequencing count data simulator. BMC Bioinformatics. 2019;20:416. https://doi.org/10.1186/s12859-019-2882-6 PMID: 31757204) MIDASim (He M, Zhao N, Satten GA. MIDASim: a fast and simple simulator for realistic microbiome data. Available from: https://doi.org/10.1101/2023.03.23.533996), and sparseDOSSA2 (Ma S, Ren B, Mallick H, Moon YS, Schwager E, Maharjan S, et al. A statistical model for describing and simulating microbial community profiles. PLOS Comput Biol. 2021;17(9):e1008913. https://doi.org/10.1371/journal.pcbi.1008913 PMID: 34516542) , leveraging 38 real-world experimental templates (S3 Table) previously utilized in a benchmark study comparing DA tools. These datasets, drawn from diverse environments such as human gut, soil, and marine habitats, serve as the foundation for our simulation efforts. We employ the same 14 DA tests that were previously used with the same experimental data in benchmark studies alongside 8 DA tests that were developed subsequently. Initially, we will generate synthetic data closely mirroring the experimental datasets, incorporating a known truth to cover a broad range of real-world data characteristics. This approach allows us to assess the ability of DA methods to recover known true differential abundances. We will further simulate datasets by altering sparsity, effect size, and sample size, thus creating a comprehensive collection for applying the 22 DA tests. The outcomes, focusing on sensitivities and specificities, will provide insights into the performance of DA tests and their dependencies on sparsity, effect size, and sample size. Additionally, we will calculate data characteristics (S1 and S2 Table) for each simulated dataset and use a multiple regression to identify informative data characteristics influencing test performance. Our prior study, where we used simulated data without incorporating a known truth, demonstrated the feasibility of using synthetic data to validate experimental findings. This current study aims to enhance our understanding by systematically evaluating the impact of known truth incorporation on DA test performance, thereby providing further information for the selection and application of DA methods in microbiome research.https://doi.org/10.1371/journal.pone.0321452
spellingShingle Eva Kohnert
Clemens Kreutz
Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
PLoS ONE
title Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
title_full Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
title_fullStr Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
title_full_unstemmed Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
title_short Benchmarking Differential Abundance Tests for 16S microbiome sequencing data using simulated data based on experimental templates.
title_sort benchmarking differential abundance tests for 16s microbiome sequencing data using simulated data based on experimental templates
url https://doi.org/10.1371/journal.pone.0321452
work_keys_str_mv AT evakohnert benchmarkingdifferentialabundancetestsfor16smicrobiomesequencingdatausingsimulateddatabasedonexperimentaltemplates
AT clemenskreutz benchmarkingdifferentialabundancetestsfor16smicrobiomesequencingdatausingsimulateddatabasedonexperimentaltemplates