MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles

Bibliographic Details
Main Authors: Salvatore Milite, Giulio Caravagna, Andrea Sottoriva
Format: Article
Language: English
Published: BMC 2025-04-01
Series: Genome Biology
Online Access: https://doi.org/10.1186/s13059-025-03530-9
Description
Summary: High-throughput multi-omic molecular profiling allows the probing of biological systems at unprecedented resolution. However, integrating and interpreting high-dimensional, sparse, and noisy multimodal datasets remains challenging. Deriving new biological insights with current methods is difficult because they are not rooted in biological principles but prioritise tasks like dimensionality reduction. Here, we introduce a framework that combines archetypal analysis, an approach grounded in biological principles, with deep learning. Using archetypes based on evolutionary trade-offs and Pareto optimality, MIDAA finds extreme data points that define the geometry of the latent space, preserving the complexity of biological interactions while retaining an interpretable output. We demonstrate that these extreme points represent cellular programmes reflecting the underlying biology. Moreover, we show that, compared to alternative methods, MIDAA can identify parsimonious, interpretable, and biologically relevant patterns from real and simulated multi-omics.
ISSN:1474-760X
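
Illustrative note: the summary describes archetypes as extreme points that define the geometry of the latent space. The sketch below illustrates only the general idea behind classical archetypal analysis, in which each sample is a convex combination of a few archetypes. It is not the MIDAA model or its API. The archetype coordinates, the sample count, and the use of a softmax to produce the weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Map real-valued scores to non-negative weights that sum to 1."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)

# Three hypothetical archetypes in a 2-D latent space (triangle vertices).
archetypes = np.array([[0.0, 0.0],
                       [1.0, 0.0],
                       [0.0, 1.0]])

# One weight vector per sample; each row lies on the probability simplex.
weights = softmax(rng.normal(size=(200, 3)))

# Every sample is a convex combination of the archetypes, so all samples
# fall inside the triangle spanned by the three extreme points.
points = weights @ archetypes

# The largest weight indicates which archetype a sample resembles most.
dominant = weights.argmax(axis=1)
```

Read this way, a row of `weights` is the interpretable output: it states how strongly a sample expresses each extreme programme, which is the kind of trade-off reading that Pareto-based archetypes are meant to support.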