Multimodal Retrieval Method for Images and Diagnostic Reports Using Cross-Attention
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | AI |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2673-2688/6/2/38 |
| Summary: | <b>Background:</b> Conventional medical image retrieval methods treat images and text as independent embeddings, limiting their ability to fully utilize the complementary information from both modalities. This separation often results in suboptimal retrieval performance, as the intricate relationships between images and text remain underexplored. <b>Methods:</b> To address this limitation, we propose a novel retrieval method that integrates medical image and text embeddings using a cross-attention mechanism. Our approach creates a unified representation by directly modeling the interactions between the two modalities, significantly enhancing retrieval accuracy. <b>Results:</b> Built upon the pre-trained BioMedCLIP model, our method outperforms existing techniques across multiple metrics, achieving the highest mean Average Precision (mAP) on the MIMIC-CXR dataset. <b>Conclusions:</b> These results highlight the effectiveness of our method in advancing multimodal medical image retrieval and set the stage for further innovation in the field. |
| ISSN: | 2673-2688 |
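
The cross-attention fusion described in the summary can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example, not the authors' released code: it assumes BioMedCLIP-style token embeddings of dimension 512, uses report text tokens as queries over image patch embeddings, and mean-pools the result into a single joint embedding ranked by cosine similarity. All names, dimensions, and the pooling choice are illustrative assumptions.

```python
# Illustrative sketch of cross-attention fusion for image-report retrieval.
# Dimensions, head count, and pooling are assumptions, not the paper's spec.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Text tokens act as queries; image patch embeddings as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, image_patches: torch.Tensor) -> torch.Tensor:
        # text_tokens:   (batch, n_text,  dim), e.g., BioMedCLIP text features
        # image_patches: (batch, n_patch, dim), e.g., BioMedCLIP vision features
        attended, _ = self.cross_attn(query=text_tokens,
                                      key=image_patches,
                                      value=image_patches)
        fused = self.norm(text_tokens + attended)  # residual connection + norm
        # Mean-pool tokens into one L2-normalized joint embedding per pair.
        return F.normalize(fused.mean(dim=1), dim=-1)

# Retrieval: rank database entries by cosine similarity of fused embeddings.
fusion = CrossAttentionFusion()
query = fusion(torch.randn(1, 16, 512), torch.randn(1, 49, 512))
database = fusion(torch.randn(100, 16, 512), torch.randn(100, 49, 512))
scores = query @ database.T        # (1, 100) cosine similarity scores
top5 = scores.topk(5).indices      # indices of the 5 best matches
```

Because the joint embedding is built by letting one modality attend to the other, image-text interactions are modeled directly rather than fusing two independently pooled vectors, which is the contrast with the conventional approach the abstract describes.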