D4Care: A Deep Dynamic Memory-Driven Cross-Modal Feature Representation Network for Clinical Outcome Prediction


Bibliographic Details
Main Authors: Binyue Chen, Guohua Liu
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/11/6054
Summary: With the advancement of information technology, artificial intelligence (AI) has demonstrated significant potential in clinical prediction, helping to improve the level of intelligent medical care. Current clinical practice primarily relies on patients' time-series data and clinical notes to predict health status, and existing methods make predictions by simply concatenating cross-modal features. However, such approaches not only ignore the inherent correlation between cross-modal features but also fail to analyze the collaborative representation of multi-granularity features from diverse perspectives. To address these challenges, we propose a deep dynamic memory-driven cross-modal feature representation network for clinical outcome prediction. Specifically, we use a Bi-directional Gated Recurrent Unit (BiGRU) network to capture dynamic features in time-series data, and a dual-view feature encoding model with sentence-aware and entity-aware capabilities to extract clinical text features from global semantic and local concept perspectives, respectively. Furthermore, we introduce a memory-driven cross-modal attention mechanism that dynamically establishes deep correlations between clinical text and time-series features through learnable memory matrices. We also introduce a memory-aware constrained layer normalization to alleviate the challenges of multi-modal feature heterogeneity, and we employ gating mechanisms and dynamic memory components so the model can learn feature information across different historical-current patterns, further improving its performance. Finally, we apply integrated gradients for feature attribution analysis to enhance the model's interpretability. We evaluate the model on the MIMIC-III dataset, and the experimental results demonstrate that it outperforms current advanced baselines on clinical outcome prediction tasks.
Notably, our model maintains high predictive accuracy and robustness even when faced with imbalanced data, and it can offer a new perspective for researchers in the field of AI medicine.
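The abstract's central idea of memory-driven cross-modal attention can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it only assumes the general scheme the abstract describes: text-feature queries attend over time-series features augmented with a learnable memory matrix, so correlations between the two modalities are mediated by memory slots. All names (`memory_cross_attention`, the weight matrices `Wq`/`Wk`/`Wv`, and the dimensions) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_cross_attention(text_feats, ts_feats, memory, Wq, Wk, Wv):
    """Cross-modal attention with learnable memory slots (illustrative sketch).

    text_feats: (T_text, d) clinical-text features (queries)
    ts_feats:   (T_ts, d)   time-series features (keys/values)
    memory:     (M, d)      learnable memory matrix, concatenated to keys/values
    """
    # Augment the time-series side with memory slots so attention can route
    # through learned patterns as well as observed measurements.
    kv_source = np.concatenate([ts_feats, memory], axis=0)   # (T_ts + M, d)
    Q = text_feats @ Wq                                       # (T_text, d)
    K = kv_source @ Wk                                        # (T_ts + M, d)
    V = kv_source @ Wv                                        # (T_ts + M, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                   # scaled dot-product
    attn = softmax(scores, axis=-1)                           # rows sum to 1
    return attn @ V                                           # (T_text, d)

rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))    # 5 text tokens/sentences
ts = rng.normal(size=(12, d))     # 12 time steps of vitals/labs
mem = rng.normal(size=(4, d))     # 4 memory slots
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = memory_cross_attention(text, ts, mem, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

In a trained model the memory matrix and projections would be learned parameters updated by backpropagation; here they are random placeholders that only demonstrate the shapes and the attention routing.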
ISSN: 2076-3417