Comparison of Deep Learning and Clinician Performance for Detecting Referable Glaucoma from Fundus Photographs in a Safety Net Population
Purpose: Develop and test a deep learning (DL) algorithm for detecting referable glaucoma. Design: Retrospective cohort study. Participants: A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included. Methods: Fundus photographs and patient-level lab...
Saved in:
| Main Authors: | , , , , , , , , , , , , , , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Elsevier
2025-07-01
|
| Series: | Ophthalmology Science |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2666914525000491 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Purpose: Develop and test a deep learning (DL) algorithm for detecting referable glaucoma. Design: Retrospective cohort study. Participants: A total of 6116 patients from the Los Angeles County (LAC) Department of Health Services (DHS) were included. Methods: Fundus photographs and patient-level labels of referable glaucoma (cup-to-disc ratio ≥0.6) provided by 21 certified optometrists. A DL algorithm based on the Visual Geometry Group-19 architecture was trained using patient-level labels generalized to images from both eyes. Area under the receiver operating curve (AUROC), sensitivity, and specificity were calculated to assess algorithm performance using an independent test set that was also graded by 13 clinicians with 0 to 10 years of experience. Algorithm performance was tested using reference labels provided by either LAC DHS optometrists or an expert panel of 3 glaucoma specialists. Main Outcome Measures: Area under the receiver operating curve, sensitivity, and specificity. Results: The DL algorithm was trained using 12 998 images from 5616 patients (2086 referable glaucoma, 3530 nonglaucoma). In this data set, the mean age was 56.8 ± 10.5 years with 54.8% women, 68.2% Latinos, 8.9% Blacks, 6.0% Asians, and 2.7% Whites. One thousand images from 500 patients (250 referable glaucoma, 250 nonglaucoma) with similar demographics (P ≥ 0.57) were used to test the algorithm. Algorithm performance matched or exceeded that of all independent clinician graders in detecting patient-level referable glaucoma based on LAC DHS optometrist (AUROC = 0.92) or expert panel (AUROC = 0.93) reference labels. Clinician grader sensitivity (range, 0.33–0.99) and specificity (range, 0.68–0.98) ranged widely and did not correlate with years of experience (P≥ 0.49). Algorithm performance (AUROC = 0.93) also matched or exceeded the sensitivity (range, 0.78–1.00) and specificity (range, 0.32–0.87) of 6 certified LAC DHS optometrists in the subsets of the test data set they graded. Conclusions: A DL algorithm for detecting referable glaucoma trained using patient-level data provided by certified LAC DHS optometrists approximates or exceeds performance by ophthalmologists and optometrists, who exhibit variable sensitivity and specificity unrelated to experience level. Implementation of this algorithm in screening workflows could help reallocate resources and provide more reproducible and timely glaucoma care. Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article. |
|---|---|
| ISSN: | 2666-9145 |