A recurrent multimodal sparse transformer framework for gastrointestinal disease classification



Bibliographic Details
Main Authors: V. Sharmila, S. Geetha
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-08897-0
Description
Summary: Accurate and early diagnosis of gastrointestinal (GI) tract diseases is essential for effective treatment planning and improved patient outcomes. However, existing diagnostic frameworks are often limited by modality imbalance, feature redundancy, and cross-modal inconsistencies, particularly when handling heterogeneous data such as medical text and endoscopic images. To address these gaps, this study proposes a recurrent multimodal principal gradient K-proximal sparse transformer (RMP-GKPS-transformer) framework for comprehensive GI disease classification. The approach integrates clinical text and wireless capsule endoscopy (WCE) images through a multimodal fusion strategy that uses Bio-RoBERTa for textual feature extraction, a graph vision spatial channel attention transformer network for image feature learning, and cross-attention mechanisms for modality alignment. The model further applies principal component analysis (PCA) for dimensionality reduction and gradient boosting machines (GBMs) for semantic conflict resolution. Classification is performed by an ensemble of random forest, K-nearest neighbors (KNN), proximal policy optimization (PPO), and a sparse radial basis function (RBF) kernel to support both accuracy and interpretability. On publicly available datasets, the framework achieved 99.82% accuracy, a Dice coefficient of 98.7%, and significantly lower execution time than state-of-the-art methods. The results confirm the framework's effectiveness in aligning and leveraging multimodal data for precise classification of six GI diseases, offering a scalable and interpretable solution for clinical decision-making in gastroenterology.
ISSN:2045-2322
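The fusion stage summarized above (text features aligned with image features through cross-attention, followed by PCA) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The feature sizes, the random (untrained) projection matrices, the mean-pooling and concatenation step, and the helper names `make_projections`, `cross_attention`, and `pca_reduce` are all hypothetical. Random arrays stand in for Bio-RoBERTa token embeddings and image-branch patch embeddings, and PCA is computed here via SVD.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def make_projections(d_text: int, d_image: int, d_k: int, rng: np.random.Generator):
    """Hypothetical query/key/value projections, drawn once and shared by all samples.

    In a trained model these would be learned; here they are random.
    """
    w_q = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)
    w_k = rng.standard_normal((d_image, d_k)) / np.sqrt(d_image)
    w_v = rng.standard_normal((d_image, d_k)) / np.sqrt(d_image)
    return w_q, w_k, w_v


def cross_attention(text, image, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    Queries come from text tokens; keys and values come from image patches,
    so each text token becomes a weighted mix of projected image features.
    """
    q, k, v = text @ w_q, image @ w_k, image @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)  # (n_tokens, n_patches)
    return weights @ v                                          # (n_tokens, d_k)


def pca_reduce(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto their top principal components via SVD."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
n_samples, n_tokens, n_patches = 40, 16, 49
d_text, d_image, d_k = 768, 256, 64
w_q, w_k, w_v = make_projections(d_text, d_image, d_k, rng)

fused = []
for _ in range(n_samples):
    text_tokens = rng.standard_normal((n_tokens, d_text))     # stand-in for text embeddings
    image_patches = rng.standard_normal((n_patches, d_image))  # stand-in for image embeddings
    attended = cross_attention(text_tokens, image_patches, w_q, w_k, w_v)
    # Mean-pool each modality and concatenate: one fused vector per sample.
    fused.append(np.concatenate([attended.mean(axis=0), image_patches.mean(axis=0)]))

fused = np.stack(fused)          # (40, 64 + 256)
reduced = pca_reduce(fused, 16)  # (40, 16)
print(fused.shape, reduced.shape)  # (40, 320) (40, 16)
```

The sketch stops at dimensionality reduction. The GBM-based semantic conflict resolution and the random forest, KNN, PPO, and sparse RBF ensemble described in the summary are not reproduced here.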