Characterizing the omics landscape based on 10,000+ datasets

Abstract The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these d...

Bibliographic Details
Main Authors: Eva Brombacher, Oliver Schilling, Clemens Kreutz
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-87256-5
author Eva Brombacher
Oliver Schilling
Clemens Kreutz
collection DOAJ
description Abstract The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology. In this study, we investigate over ten thousand datasets to understand how proteomics, metabolomics, lipidomics, transcriptomics, and microbiome data vary in specific data characteristics. We were able to show patterns of data characteristics specific to the investigated omics types and provide a tool that enables researchers to assess how representative a given omics dataset is for its respective discipline. Moreover, we illustrate how data characteristics can impact analyses at the example of normalization in the presence of sample-dependent proportions of missing values. Given the variability of omics data characteristics, we encourage the systematic inspection of these characteristics in benchmark studies and for downstream analyses to prevent suboptimal method selection and unintended bias.
format Article
id doaj-art-365579adddeb4857be0e702e844e748f
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-365579adddeb4857be0e702e844e748f; 2025-01-26T12:31:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 12; 10.1038/s41598-025-87256-5; Characterizing the omics landscape based on 10,000+ datasets
Eva Brombacher (0): Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg
Oliver Schilling (1): Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg
Clemens Kreutz (2): Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg
https://doi.org/10.1038/s41598-025-87256-5
title Characterizing the omics landscape based on 10,000+ datasets
url https://doi.org/10.1038/s41598-025-87256-5