Enhanced Transformer Network With High-Dimensional Attention Mechanism for Diabetic Retinopathy Classification
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080385/ |
| Summary: | Diabetic Retinopathy (DR) is a severe condition affecting diabetic patients that can lead to irreversible vision loss if not addressed at an early stage. DR is classified into two types: Non-Proliferative DR (NPDR), the initial stage, and Proliferative DR (PDR), the advanced stage. The progression of DR spans mild NPDR, moderate NPDR, severe NPDR, and PDR. Various deep learning-based detection and classification algorithms have been developed to identify and categorize the disease, prominent among them Convolutional Neural Networks (CNN), Recurrent Neural Networks, Generative Adversarial Networks, and the Vision Transformer (ViT). This study aims to develop an efficient model for precise DR classification. It proposes an enhanced transformer network, termed Vision Transformer with High-Dimensional Attention (HDA-ViT), which incorporates high-dimensional spatial and channel attention ahead of patch embedding in the ViT. The rationale for this dual attention mechanism is to make the ViT focus on the most relevant portions of the images. In addition, the classification head in the ViT block incorporates a sequential dropout layer alongside the standard linear dropout layer. For experimentation, the APTOS 2019 fundus image dataset, containing 3662 images across five classes, is used. Experimental results show that the proposed HDA-ViT network achieves a classification accuracy of 99.32% and outperforms state-of-the-art techniques for DR classification. |
|---|---|
| ISSN: | 2169-3536 |
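The summary describes channel and spatial attention applied to the input image before ViT patch embedding. As a rough illustration of how such a dual attention stage can be structured, the sketch below uses a common squeeze-and-excitation-style channel branch followed by a convolutional spatial branch in PyTorch. All class names, layer sizes, and the reduction ratio are assumptions for illustration; the paper's actual HDA-ViT design may differ.

```python
# Hedged sketch of a dual (channel + spatial) attention stage placed
# before ViT patch embedding. Shapes and hyperparameters are assumed,
# not taken from the paper.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights each channel using pooled global statistics."""

    def __init__(self, channels: int, reduction: int = 1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, max(1, channels // reduction)),
            nn.ReLU(),
            nn.Linear(max(1, channels // reduction), channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))      # global max pooling
        w = torch.sigmoid(avg + mx)            # per-channel weights in (0, 1)
        return x * w[:, :, None, None]


class SpatialAttention(nn.Module):
    """Weights each spatial location using cross-channel statistics."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                           # per-pixel weights


class HDAttention(nn.Module):
    """Channel attention followed by spatial attention; shape-preserving,
    so its output can feed directly into a standard ViT patch embedding."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))


if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)            # a batch of RGB fundus images
    out = HDAttention(channels=3)(x)
    print(tuple(out.shape))                    # same shape as the input
```

Because the stage preserves the input shape, it can be prepended to an off-the-shelf ViT without altering the patch-embedding layer, which matches the summary's claim that the attention is applied "ahead of patch embedding."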