A Two-Step Algorithm for Handling Block-Wise Missing Data in Multi-Omics

Bibliographic Details
Main Authors: Sergi Baena-Miret, Ferran Reverter, Alex Sánchez, Esteban Vegas
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/7/3650
Description
Summary: High-throughput technologies produce large-scale omics datasets, and their integration facilitates biomarker discovery and predictive modeling. However, data heterogeneity, high dimensionality, and block-wise missing data complicate the analysis. To address these issues, optimization techniques, including regularization and constraint-based approaches, have already been employed for regression and binary classification problems. Building on these methods, we extend the framework to support multi-class classification. Applied to a multi-class classification task for breast cancer subtypes, our model achieves accuracy between 73% and 81% under various block-wise missing data scenarios. Additionally, we assess its performance on a regression problem using the exposome dataset, integrating a larger number of omics datasets. Across different missing data scenarios, our model demonstrates a strong correlation (75%) between true and predicted responses. Furthermore, we have updated the bwm R package, which previously supported binary and continuous response types, to also support multi-class response types.
ISSN: 2076-3417