D4Care: A Deep Dynamic Memory-Driven Cross-Modal Feature Representation Network for Clinical Outcome Prediction
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/11/6054 |
| Summary: | With the advancement of information technology, artificial intelligence (AI) has demonstrated significant potential in clinical prediction, helping to improve the level of intelligent medical care. Current clinical practice relies primarily on patients' time-series data and clinical notes to predict health status, typically by simply concatenating cross-modal features. However, such approaches not only ignore the inherent correlation between cross-modal features but also fail to analyze the collaborative representation of multi-granularity features from diverse perspectives. To address these challenges, we propose a deep dynamic memory-driven cross-modal feature representation network for clinical outcome prediction. Specifically, we use a Bi-directional Gated Recurrent Unit (BiGRU) network to capture dynamic features in time-series data, and a dual-view feature encoding model with sentence-aware and entity-aware capabilities to extract clinical text features from global semantic and local concept perspectives, respectively. Furthermore, we introduce a memory-driven cross-modal attention mechanism that dynamically establishes deep correlations between clinical text and time-series features through learnable memory matrices. We also introduce a memory-aware constrained layer normalization to alleviate multi-modal feature heterogeneity, and employ gating mechanisms and dynamic memory components so the model can learn feature information across different historical-current patterns, further improving performance. Finally, we apply integrated gradients for feature attribution analysis to enhance the model's interpretability. We evaluate the model on the MIMIC-III dataset, and the experimental results demonstrate that it outperforms current advanced baselines in clinical outcome prediction tasks. Notably, our model maintains high predictive accuracy and robustness even when faced with imbalanced data, and can also provide a new perspective for researchers in the field of AI medicine. |
|---|---|
| ISSN: | 2076-3417 |
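The abstract describes a memory-driven cross-modal attention mechanism in which clinical-text features attend over time-series features through learnable memory matrices. The sketch below is not the authors' code: it is a minimal numpy illustration of that general idea, assuming text queries attend over BiGRU-style time-step features augmented with a bank of memory slots. All shapes, names, and the absence of learned projections are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # numerically stable softmax along the given axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def memory_cross_attention(text_h, ts_h, memory):
    """text_h: (n_text, d) query features from the text encoder;
    ts_h: (n_steps, d) time-series features (e.g. BiGRU outputs);
    memory: (n_slots, d) learnable memory matrix (held fixed here)."""
    d = text_h.shape[1]
    kv = np.vstack([ts_h, memory])              # keys/values: (n_steps + n_slots, d)
    attn = softmax(text_h @ kv.T / np.sqrt(d))  # (n_text, n_steps + n_slots)
    return attn @ kv, attn                      # fused cross-modal features, weights

text_h = rng.standard_normal((4, 8))   # hypothetical sentence/entity embeddings
ts_h = rng.standard_normal((6, 8))     # hypothetical per-time-step features
memory = rng.standard_normal((3, 8))   # hypothetical memory slots

fused, attn = memory_cross_attention(text_h, ts_h, memory)
print(fused.shape, attn.shape)  # (4, 8) (4, 9)
```

Because the memory slots sit alongside the time-series keys, attention mass can be routed to a slot rather than to any single time step, which is one plausible reading of how a memory matrix mediates cross-modal correlations.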
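The abstract also mentions integrated gradients for feature attribution. As a hedged, self-contained sketch of that standard technique (not the paper's implementation), the snippet below approximates the path integral of gradients for a toy differentiable model `f(x) = tanh(w . x)` with a midpoint Riemann sum; the model and all values are assumptions for illustration only.

```python
import numpy as np

def f(x, w):
    # toy differentiable "model" standing in for the network
    return np.tanh(w @ x)

def grad_f(x, w):
    # d/dx tanh(w.x) = (1 - tanh(w.x)^2) * w
    return (1.0 - np.tanh(w @ x) ** 2) * w

def integrated_gradients(x, baseline, w, steps=200):
    # IG_i = (x_i - x'_i) * integral over a in [0,1] of
    #        df/dx_i evaluated at x' + a*(x - x'),
    # approximated with a midpoint Riemann sum over `steps` points
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_f(baseline + a * (x - baseline), w)
    return (x - baseline) * total / steps

w = np.array([0.5, -1.0, 2.0])          # illustrative weights
x = np.array([1.0, 0.5, -0.2])          # illustrative input features
baseline = np.zeros_like(x)             # all-zeros baseline

ig = integrated_gradients(x, baseline, w)
# Completeness axiom: attributions sum to f(x) - f(baseline)
print(np.allclose(ig.sum(), f(x, w) - f(baseline, w), atol=1e-4))  # True
```

The completeness check at the end is the standard sanity test for any integrated-gradients implementation: the per-feature attributions must account exactly for the change in model output relative to the baseline.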