Characterizing the omics landscape based on 10,000+ datasets
Main Authors: | Eva Brombacher, Oliver Schilling, Clemens Kreutz |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Online Access: | https://doi.org/10.1038/s41598-025-87256-5 |

author | Eva Brombacher, Oliver Schilling, Clemens Kreutz |
---|---|
collection | DOAJ |
description | The characteristics of data produced by omics technologies are pivotal, as they critically influence the feasibility and effectiveness of computational methods applied in downstream analyses, such as data harmonization and differential abundance analyses. Furthermore, variability in these data characteristics across datasets plays a crucial role, leading to diverging outcomes in benchmarking studies, which are essential for guiding the selection of appropriate analysis methods in all omics fields. Additionally, downstream analysis tools are often developed and applied within specific omics communities due to the presumed differences in data characteristics attributed to each omics technology. In this study, we investigate over ten thousand datasets to understand how proteomics, metabolomics, lipidomics, transcriptomics, and microbiome data vary in specific data characteristics. We were able to show patterns of data characteristics specific to the investigated omics types and provide a tool that enables researchers to assess how representative a given omics dataset is for its respective discipline. Moreover, we illustrate how data characteristics can impact analyses using the example of normalization in the presence of sample-dependent proportions of missing values. Given the variability of omics data characteristics, we encourage the systematic inspection of these characteristics in benchmark studies and for downstream analyses to prevent suboptimal method selection and unintended bias. |
format | Article |
id | doaj-art-365579adddeb4857be0e702e844e748f |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-365579adddeb4857be0e702e844e748f; 2025-01-26T12:31:04Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 151112; 10.1038/s41598-025-87256-5; Characterizing the omics landscape based on 10,000+ datasets; Eva Brombacher (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg); Oliver Schilling (Institute for Surgical Pathology, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg); Clemens Kreutz (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center-University of Freiburg) |
title | Characterizing the omics landscape based on 10,000+ datasets |
url | https://doi.org/10.1038/s41598-025-87256-5 |
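
The abstract notes that data characteristics can affect normalization when the proportion of missing values differs between samples. The toy sketch below illustrates that general effect; it is not the authors' analysis or their tool. The data, the detection-limit cutoff, and the choice of median normalization are hypothetical assumptions made only for illustration: two samples share identical underlying log2 abundances, but one loses its lowest-abundance features to a detection limit.

```python
from statistics import median
from typing import List, Optional

Sample = List[Optional[float]]

# Hypothetical toy data on a log2 scale: both samples measure the SAME
# underlying abundances, so ideal normalization would leave them equal.
true_values = [float(v) for v in range(1, 11)]  # features with values 1..10

sample_a: Sample = list(true_values)  # fully observed
# Sample B (hypothetical): the four lowest-abundance features fall below an
# assumed detection limit and are missing (abundance-dependent missingness).
sample_b: Sample = [None if v <= 4 else v for v in true_values]


def missing_fraction(sample: Sample) -> float:
    """Proportion of missing values (None) in one sample."""
    return sum(v is None for v in sample) / len(sample)


def median_normalize(sample: Sample) -> Sample:
    """Subtract the median of the observed values; missing values stay None."""
    observed = [v for v in sample if v is not None]
    m = median(observed)
    return [None if v is None else v - m for v in sample]


norm_a = median_normalize(sample_a)
norm_b = median_normalize(sample_b)

# Difference B - A for features observed in both samples. With no true
# difference this should be 0, but the inflated median of B shifts it.
shifts = [b - a for a, b in zip(norm_a, norm_b) if a is not None and b is not None]

print(missing_fraction(sample_a), missing_fraction(sample_b))  # 0.0 0.4
print(shifts)  # [-2.0, -2.0, -2.0, -2.0, -2.0, -2.0]
```

Because sample B loses only low-intensity values, its observed median (7.5) exceeds that of sample A (5.5), so after median normalization every shared feature appears 2 log2 units lower in B although nothing changed in the underlying abundances. Inspecting sample-wise missing-value proportions before normalizing is one simple way to spot this situation.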