A Hybrid Vision Transformer and Graph Neural Network Model with Attention Mechanisms for Diabetic Retinopathy Detection
DOI:
https://doi.org/10.37965/jait.2025.0645
Keywords:
diabetic retinopathy detection, Vision Transformer, Graph Neural Network, attention mechanisms, retinal fundus images, hybrid structure
Abstract
Diabetic retinopathy (DR), a complication of diabetes, is a leading cause of blindness worldwide, highlighting the need for early and accurate detection to prevent severe vision impairment. However, current DR detection methods often fail to capture the intricate relationships between retinal structures and struggle to use both local and global features within retinal images effectively. To address these challenges, this study introduces a novel hybrid structure that combines Vision Transformers (ViTs) with Graph Neural Networks (GNNs), augmented by attention mechanisms, for the identification and classification of DR from retinal fundus images.
The main objective is to build a robust structure that accurately captures the complex spatial and temporal relationships within retinal images, thereby improving the precision and reliability of DR detection. The proposed approach begins with bilateral filtering during image pre-processing, which reduces noise while preserving essential structural details such as blood vessels. ViTs then extract higher-level features by splitting each image into a sequence of non-overlapping patches. These features are used to construct spatial and temporal graphs, enabling the model to capture both detailed local information and broader sequential relationships within the retinal images. Integrating attention mechanisms within the GNNs allows the structure to concentrate on the most informative features, further enhancing its detection capabilities.
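The feature-to-graph steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The patch size, the 4-neighbour grid graph, the single-head GAT-style attention layer, and all function names (`image_to_patches`, `grid_adjacency`, `graph_attention`) are assumptions made for illustration. Raw pixel patches stand in for ViT features, and the bilateral filtering and temporal graph are omitted.

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c), (gh, gw)

def grid_adjacency(gh, gw):
    """Assumed spatial graph: 4-neighbour grid over patches, with self-loops."""
    n = gh * gw
    adj = np.eye(n, dtype=bool)
    for r in range(gh):
        for col in range(gw):
            i = r * gw + col
            if col + 1 < gw:   # right neighbour
                adj[i, i + 1] = adj[i + 1, i] = True
            if r + 1 < gh:     # lower neighbour
                adj[i, i + gw] = adj[i + gw, i] = True
    return adj

def graph_attention(x, adj, w, a, slope=0.2):
    """Single-head GAT-style layer; returns (updated features, attention)."""
    h = x @ w                                  # project node features
    d = h.shape[1]
    e = (h @ a[:d])[:, None] + (h @ a[d:])[None, :]  # pairwise scores
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    e = np.where(adj, e, -np.inf)              # mask non-neighbours
    e = e - e.max(axis=1, keepdims=True)       # stabilise the softmax
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)  # normalise over neighbours
    return alpha @ h, alpha

# Random data stands in for a pre-processed fundus image.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patches, (gh, gw) = image_to_patches(image, 16)
adj = grid_adjacency(gh, gw)
w = rng.normal(scale=0.1, size=(patches.shape[1], 32))
a = rng.normal(scale=0.1, size=64)
out, alpha = graph_attention(patches, adj, w, a)
print(patches.shape, out.shape)  # (196, 768) (196, 32)
```

Each row of `alpha` shows how strongly one patch attends to its graph neighbours, which is the sense in which attention lets a graph model weight some regions over others.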
The results show that the hybrid structure outperforms several state-of-the-art approaches, achieving an accuracy of 93.2% and an AUC-ROC of 0.961 on the APTOS 2019 Blindness Detection dataset. Ablation studies underscore the importance of the attention mechanisms and of using spatial and temporal graphs together. Despite the structure's strong performance, its complexity and computational demands may limit its feasibility in resource-constrained settings. Future research aims to optimize the structure for such environments and extend its application to other retinal diseases.
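For readers unfamiliar with the two reported metrics, the sketch below shows one way accuracy and a binary AUC-ROC can be computed. The abstract does not say whether the AUC was computed on a binary or multi-class formulation, so the binary setting is an assumption. The function names are hypothetical, and the labels and scores are toy values, not results from the study.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def roc_auc_binary(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (rank-based definition)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]   # every positive/negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)

# Toy example (made-up values): 1 = DR present, 0 = no DR.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(accuracy(y_true, y_pred))        # 0.75
print(roc_auc_binary(y_true, scores))  # 0.75
```

Accuracy depends on the chosen decision threshold, while AUC-ROC summarises ranking quality across all thresholds, which is why the two are usually reported together.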
License
Copyright (c) 2025 Authors

This work is licensed under a Creative Commons Attribution 4.0 International License.