Characterizing the omics landscape based on 10,000+ datasets

Abstract The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these d...

Full description

Saved in:
Bibliographic Details
Main Authors: Eva Brombacher, Oliver Schilling, Clemens Kreutz
Format: Article
Language:English
Published: Nature Portfolio 2025-01-01
Series:Scientific Reports
Online Access:https://doi.org/10.1038/s41598-025-87256-5
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology. In this study, we investigate over ten thousand datasets to understand how proteomics, metabolomics, lipidomics, transcriptomics, and microbiome data vary in specific data characteristics. We were able to show patterns of data characteristics specific to the investigated omics types and provide a tool that enables researchers to assess how representative a given omics dataset is for its respective discipline. Moreover, we illustrate how data characteristics can impact analyses at the example of normalization in the presence of sample-dependent proportions of missing values. Given the variability of omics data characteristics, we encourage the systematic inspection of these characteristics in benchmark studies and for downstream analyses to prevent suboptimal method selection and unintended bias.
ISSN:2045-2322