Medical Report Generation With Knowledge Distillation and Multi-Stage Hierarchical Attention in Vision Transformer Encoder and GPT-2 Decoder
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11078274/ |
| Summary: | Automated medical report generation is a challenging task that involves synthesizing diagnostic findings and clinical observations from medical images. In this study, we propose a novel framework that integrates knowledge distillation and multi-stage hierarchical attention mechanisms to enhance the generation of comprehensive and accurate medical reports. Our approach uses a Vision Transformer (ViT) as the image encoder to capture complex visual features and applies knowledge distillation to transfer knowledge from an ensemble of Convolutional Neural Networks (CNNs), namely VGG16, InceptionV3, and DenseNet121, to the ViT, ensuring rich and diverse feature extraction. GPT-2 serves as the decoder, generating coherent and contextually relevant narratives. The multi-stage hierarchical attention mechanism further refines this process by progressively focusing on key image regions and aligning them with the generated text. On the MIMIC-CXR dataset, our model achieved a BLEU score of 0.127 and a precision of 0.8832 for abnormalities, demonstrating notable improvements over previous methods. Further analysis, validated by both quantitative metrics and qualitative assessments, shows that our approach generates detailed and accurate medical reports and effectively captures critical clinical information. |
| ISSN: | 2169-3536 |
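
The summary states that knowledge is transferred from a CNN ensemble (VGG16, InceptionV3, DenseNet121) to the ViT encoder, but does not say how. As a rough illustration only, the sketch below assumes the common soft-target form of distillation: each teacher's temperature-softened output distribution is computed, the teacher distributions are averaged, and the student is penalized by the KL divergence from that average, scaled by T². The function names, the default temperature of 4.0, the use of softmax outputs, and the assumption that the ViT and the CNNs predict over a shared label space are all hypothetical choices, not details taken from the article.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, softened by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max to avoid overflow
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Hypothetical ensemble distillation loss: KL(teacher_avg || student) * T^2.

    Assumption: every teacher (e.g. VGG16, InceptionV3, DenseNet121) and the
    student ViT emit logits of the same shape (batch, num_classes).
    """
    # Average the softened probability distributions of all teachers.
    teacher_probs = np.mean(
        [softmax(t, temperature) for t in teacher_logits_list], axis=0
    )
    student_probs = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    # Per-example KL divergence from the averaged teacher to the student.
    kl = np.sum(
        teacher_probs * (np.log(teacher_probs + eps) - np.log(student_probs + eps)),
        axis=-1,
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return float(temperature ** 2 * kl.mean())


if __name__ == "__main__":
    # Toy logits for a batch of one example over three hypothetical classes.
    student = np.array([[2.0, 0.5, -1.0]])
    teachers = [
        np.array([[1.5, 1.0, -0.5]]),  # stand-in for a VGG16 head
        np.array([[2.5, 0.0, -1.0]]),  # stand-in for an InceptionV3 head
        np.array([[1.0, 0.8, 0.2]]),   # stand-in for a DenseNet121 head
    ]
    print(ensemble_kd_loss(student, teachers))
```

Averaging teacher probabilities is only one reasonable way to combine an ensemble; weighted averages, per-teacher loss terms, or feature-level (rather than output-level) distillation are equally plausible readings of the summary. In practice, a term like this is usually added to the student's own task loss rather than used alone.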