Group-wise normalization in differential abundance analysis of microbiome samples
Abstract. Background: A key challenge in differential abundance analysis (DAA) of microbial sequencing data is that the counts for each sample are compositional, resulting in potentially biased comparisons of the absolute abundance across study groups. Normalization-based DAA methods rely on external...
| Main Authors: | Dylan Clark-Boucher, Brent A. Coull, Harrison T. Reeder, Fenglei Wang, Qi Sun, Jacqueline R. Starr, Kyu Ha Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-07-01 |
| Series: | BMC Bioinformatics |
| Subjects: | Microbiome; Normalization; Compositional data; Differential abundance analysis |
| Online Access: | https://doi.org/10.1186/s12859-025-06235-9 |
| Field | Value |
|---|---|
| author | Dylan Clark-Boucher; Brent A. Coull; Harrison T. Reeder; Fenglei Wang; Qi Sun; Jacqueline R. Starr; Kyu Ha Lee |
| collection | DOAJ |
| description | Abstract. Background: A key challenge in differential abundance analysis (DAA) of microbial sequencing data is that the counts for each sample are compositional, resulting in potentially biased comparisons of the absolute abundance across study groups. Normalization-based DAA methods rely on external normalization factors that account for compositionality by standardizing the counts onto a common numerical scale. However, existing normalization methods have struggled to maintain the false discovery rate in settings where the variance or compositional bias is large. This article proposes a novel framework for normalization that can reduce bias in DAA by re-conceptualizing normalization as a group-level task. We present two new normalization methods within the group-wise framework: group-wise relative log expression (G-RLE) and fold-truncated sum scaling (FTSS). Results: G-RLE and FTSS achieve higher statistical power for identifying differentially abundant taxa than existing methods in model-based and synthetic data simulation settings. The two novel methods also maintain the false discovery rate in challenging scenarios where existing methods suffer. The best results are obtained from using FTSS normalization with the DAA method MetagenomeSeq. Conclusion: Compared with other methods for normalizing compositional sequence count data prior to DAA, the proposed group-level normalization frameworks offer more robust statistical inference. With a solid mathematical foundation, validated performance in numerical studies, and publicly available software, these new methods can help improve rigor and reproducibility in microbiome research. |
| format | Article |
| id | doaj-art-ec81b612eb7248f2b0152ed4ec5c3389 |
| institution | DOAJ |
| issn | 1471-2105 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | BMC |
| series | BMC Bioinformatics |
| spelling | doaj-art-ec81b612eb7248f2b0152ed4ec5c3389; 2025-08-20T03:06:36Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2025-07-01; 26; 1; 1; 17; 10.1186/s12859-025-06235-9; Group-wise normalization in differential abundance analysis of microbiome samples; Dylan Clark-Boucher (Department of Biostatistics, Harvard TH Chan School of Public Health); Brent A. Coull (Department of Biostatistics, Harvard TH Chan School of Public Health); Harrison T. Reeder (Biostatistics, Massachusetts General Hospital); Fenglei Wang (Department of Nutrition, Harvard TH Chan School of Public Health); Qi Sun (Department of Nutrition, Harvard TH Chan School of Public Health); Jacqueline R. Starr (Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School); Kyu Ha Lee (Department of Biostatistics, Harvard TH Chan School of Public Health); https://doi.org/10.1186/s12859-025-06235-9; Microbiome; Normalization; Compositional data; Differential abundance analysis |
| title | Group-wise normalization in differential abundance analysis of microbiome samples |
| topic | Microbiome; Normalization; Compositional data; Differential abundance analysis |
| url | https://doi.org/10.1186/s12859-025-06235-9 |