Characterizing the omics landscape based on 10,000+ datasets



Bibliographic Details
Main Authors:	Eva Brombacher, Oliver Schilling, Clemens Kreutz
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-01-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-025-87256-5

collection	DOAJ
description	Abstract: The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology. In this study, we investigate over ten thousand datasets to understand how proteomics, metabolomics, lipidomics, transcriptomics, and microbiome data vary in specific data characteristics. We were able to show patterns of data characteristics specific to the investigated omics types and provide a tool that enables researchers to assess how representative a given omics dataset is for its respective discipline. Moreover, we illustrate how data characteristics can impact analyses using the example of normalization in the presence of sample-dependent proportions of missing values. Given the variability of omics data characteristics, we encourage the systematic inspection of these characteristics in benchmark studies and for downstream analyses to prevent suboptimal method selection and unintended bias.
id	doaj-art-365579adddeb4857be0e702e844e748f
institution	Kabale University
issn	2045-2322
affiliation	Eva Brombacher: Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg
affiliation	Oliver Schilling: Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg
affiliation	Clemens Kreutz: Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg
volume	15
issue	1
pages	1-12
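The abstract's point about normalization under sample-dependent proportions of missing values can be made concrete with a toy simulation. The sketch below is not the authors' analysis, tool, or data. It assumes log-scale intensities, per-sample median normalization, identical true abundances in two samples, and missing values caused by a hypothetical sample-specific detection limit (left-censoring). All function names (`censor`, `median_normalize`, `mean_shift`) are illustrative.

```python
"""Toy sketch (not the authors' analysis): per-sample median normalization
when samples differ in their proportion of missing values.

Assumptions (illustration only): log-scale intensities, identical true
abundances in both samples, and missingness caused by a hypothetical
sample-specific detection limit (left-censoring, missing not at random).
"""
import statistics
from typing import List, Optional

Sample = List[Optional[float]]  # None marks a missing value


def censor(truth: List[float], detection_limit: float) -> Sample:
    """Values below the detection limit are not observed (set to None)."""
    return [x if x >= detection_limit else None for x in truth]


def missing_fraction(sample: Sample) -> float:
    """Share of features that are missing in this sample."""
    return sum(v is None for v in sample) / len(sample)


def median_normalize(sample: Sample) -> Sample:
    """Subtract the median of the observed (non-missing) values."""
    med = statistics.median(v for v in sample if v is not None)
    return [None if v is None else v - med for v in sample]


def mean_shift(a: Sample, b: Sample) -> float:
    """Mean of b - a over features observed in both samples."""
    diffs = [y - x for x, y in zip(a, b) if x is not None and y is not None]
    return statistics.mean(diffs)


# Hypothetical data: 1,000 features on an evenly spaced log-intensity grid 15..25.
truth = [15 + 10 * i / 999 for i in range(1000)]
few_missing = censor(truth, detection_limit=16.0)   # lowest 10% censored
many_missing = censor(truth, detection_limit=20.0)  # lowest 50% censored

# Both samples share the same true values, so the raw shift is zero.
raw_shift = mean_shift(few_missing, many_missing)
# After normalizing by observed medians, a spurious shift appears.
norm_shift = mean_shift(median_normalize(few_missing), median_normalize(many_missing))

print(f"missing: {missing_fraction(few_missing):.0%} vs {missing_fraction(many_missing):.0%}")
print(f"shift before normalization: {raw_shift:.3f}")
print(f"shift after median normalization: {norm_shift:.3f}")
```

Under these assumptions, the sample with more missing values loses mainly low intensities, so the median of its observed values is inflated and median normalization shifts its shared features down by about 2 log units, although the true abundances are identical. Comparing missing-value proportions across samples before choosing a normalization is one simple way to flag this situation.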
