Enhanced Transformer Network With High-Dimensional Attention Mechanism for Diabetic Retinopathy Classification
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080385/ |
| Summary: | Diabetic Retinopathy (DR) is a severe condition affecting diabetic patients that can lead to irreversible vision loss if not addressed at an early stage. DR is classified into two types: Non-Proliferative DR (NPDR), the initial stage, and Proliferative DR (PDR), the advanced stage. The progression of DR spans mild NPDR, moderate NPDR, severe NPDR, and PDR. Various deep learning-based detection and classification algorithms have been developed to identify and categorize the disease, prominent among them Convolutional Neural Networks (CNN), Recurrent Neural Networks, Generative Adversarial Networks, and the Vision Transformer (ViT). This study aims to develop an efficient model for precise DR classification. It proposes an enhanced transformer network, termed Vision Transformer with High-Dimensional Attention (HDA-ViT), which incorporates high-dimensional spatial and channel attention ahead of patch embedding in the ViT. The rationale for this dual attention mechanism is to make the ViT focus on the most relevant portions of the images. In addition, the classification head in the ViT block incorporates a sequential dropout layer alongside the standard linear dropout layer. For experimentation, the APTOS 2019 fundus image dataset, containing 3662 images across five classes, is used. Experimental results show that the proposed HDA-ViT network achieves a classification accuracy of 99.32% and outperforms state-of-the-art techniques for DR classification. |
|---|---|
| ISSN: | 2169-3536 |
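The summary describes channel and spatial attention applied to the input image before ViT patch embedding. As a rough illustration of how such a dual attention stage can be structured, the sketch below uses a common squeeze-and-excitation-style channel branch followed by a convolutional spatial branch in PyTorch. All class names, layer sizes, and the reduction ratio are assumptions for illustration; the paper's actual HDA-ViT design may differ.

```python
# Hedged sketch of a dual (channel + spatial) attention stage placed
# before ViT patch embedding. Shapes and hyperparameters are assumed,
# not taken from the paper.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Weights each channel using pooled global statistics."""

    def __init__(self, channels: int, reduction: int = 1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, max(1, channels // reduction)),
            nn.ReLU(),
            nn.Linear(max(1, channels // reduction), channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))      # global max pooling
        w = torch.sigmoid(avg + mx)            # per-channel weights in (0, 1)
        return x * w[:, :, None, None]


class SpatialAttention(nn.Module):
    """Weights each spatial location using cross-channel statistics."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                      # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)      # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)       # (B, 1, H, W)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                           # per-pixel weights


class HDAttention(nn.Module):
    """Channel attention followed by spatial attention; shape-preserving,
    so its output can feed directly into a standard ViT patch embedding."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))


if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)            # a batch of RGB fundus images
    out = HDAttention(channels=3)(x)
    print(tuple(out.shape))                    # same shape as the input
```

Because the stage preserves the input shape, it can be prepended to an off-the-shelf ViT without altering the patch-embedding layer, which matches the summary's claim that the attention is applied "ahead of patch embedding."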