A deep semi-supervised learning approach to the detection of glaucoma on out-of-distribution retinal fundus image datasets
| Main Authors: | , , , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-05-01 |
| Series: | BMC Ophthalmology |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s12886-025-04153-1 |
| Summary: | Abstract. Background: Accurate detection of glaucoma plays a critical role in treating the disease and can be performed with limited labeled retinal fundus images and large-scale unlabeled ones by leveraging deep semi-supervised learning (SSL). This study investigates how reliably glaucoma depicted on fundus images can be detected by SSL and how the quantity and quality of unlabeled images affect the outcome. Methods: We retrospectively collected a dataset of 7,503 fundus images and classified them into four categories: none, mild, moderate, or severe glaucoma. We used the collected dataset and a public out-of-distribution (OOD) dataset (EyeQ) to train an available SSL method (SRC-MT) to grade glaucoma. Results: SRC-MT achieved an average area under the receiver operating characteristic curve (AUC) of 0.8944 on global field-of-view (FOV) regions and 0.8969 on local disc regions when trained on 600 labeled images and 5,401 unlabeled ones from the collected dataset. When 16,817, 6,435, and 5,540 unlabeled OOD images of ‘good’, ‘usable’, and ‘reject’ quality from the EyeQ dataset were separately added to the 5,401 unlabeled images, its performance for global FOV regions became 0.8972, 0.8908, and 0.8922, respectively, on the testing subset of the collected dataset, and 0.7342, 0.5090, and 0.5072 on three public datasets (i.e., AIROGS, EDDFS, and FIVES). Conclusions: SRC-MT achieved promising performance for glaucoma grading, especially in global FOV regions. Its performance improved with more labeled images but degraded with more unlabeled OOD images of lower quality. |
|---|---|
| ISSN: | 1471-2415 |