Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation
Background: Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates ChatGPT-4o’s performance in ophthalmic image interpretation, exemplar-guided reasoning (in-context learning), and code-free diagnostic tool generation using publicly available datasets. Methods: We assessed ChatGPT-4o through three clinically relevant tasks: (1) image interpretation without prior examples, using fundus, external ocular, and facial photographs representing key ophthalmic conditions; (2) in-context learning with example-based prompts to improve classification accuracy; and (3) generation of an interactive HTML-based decision-support tool from a clinical diagnostic algorithm. All evaluations were performed using open-access datasets without model fine-tuning. Results: When interpreting images without reference examples, ChatGPT-4o achieved diagnostic accuracies of 90.3 % for diabetic retinopathy, 77.4 % for age-related macular degeneration, 100 % for conjunctival melanoma, 97.3 % for pterygium, and 85.7 % for strabismus subtypes. In-context learning consistently improved diagnostic performance across all modalities, with strabismus classification reaching 100 % accuracy. Compared to EfficientNetB2, ChatGPT-4o demonstrated comparable or superior performance in several diagnostic tasks. Additionally, the model successfully translated schematic clinical algorithms into functional, browser-based diagnostic tools using natural language prompts alone. Conclusions: ChatGPT-4o demonstrates promise in ophthalmic image interpretation and low-code clinical tool development, particularly when guided by in-context learning. However, these findings are based on a limited diagnostic spectrum and publicly available datasets. Broader clinical validation and head-to-head comparisons with domain-specific models are needed to establish its practical utility in ophthalmology.
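The abstract describes exemplar-guided prompting (in-context learning) for ophthalmic image classification, but the authors' code is not part of this record. As a rough illustration only, the sketch below shows how a labeled-example image prompt to a GPT-4o-class multimodal model might be assembled, assuming the OpenAI Python SDK's chat-completions interface; the model name, prompt wording, class labels, and file paths are hypothetical and are not taken from the study.

```python
# Minimal sketch of exemplar-guided (in-context learning) image classification.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY in the
# environment. Labels, prompts, and file paths are illustrative, not the authors' code.
import base64
from openai import OpenAI

client = OpenAI()

def to_data_url(path: str) -> str:
    """Encode a local image as a base64 data URL for the vision input."""
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

# One labeled exemplar per class (the in-context examples), then the query image.
exemplars = [
    ("esotropia", "exemplar_esotropia.jpg"),
    ("exotropia", "exemplar_exotropia.jpg"),
]
query_image = "query_face_photo.jpg"

content = [{"type": "text",
            "text": "You are shown labeled example photographs of strabismus "
                    "subtypes, followed by a new photograph. Answer with the "
                    "single most likely label."}]
for label, path in exemplars:
    content.append({"type": "text", "text": f"Example labeled '{label}':"})
    content.append({"type": "image_url", "image_url": {"url": to_data_url(path)}})
content.append({"type": "text", "text": "New photograph to classify:"})
content.append({"type": "image_url", "image_url": {"url": to_data_url(query_image)}})

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(response.choices[0].message.content)
```

Omitting the exemplar images from the prompt corresponds to the zero-shot setting evaluated in task (1) of the abstract.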
| Main Authors: | Joon Yul Choi, Tae Keun Yoo |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd., 2025-09-01 |
| Series: | Informatics and Health |
| Subjects: | Large language model; Fundus photography; Strabismus; In-context learning; Ophthalmic diagnosis; Decision-support tool |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2949953425000219 |
| author | Joon Yul Choi; Tae Keun Yoo |
|---|---|
| collection | DOAJ |
| format | Article |
| id | doaj-art-a0b6b21f8ddf4e17a9f37749e7f7b364 |
| institution | Kabale University |
| issn | 2949-9534 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | KeAi Communications Co., Ltd. |
| record_format | Article |
| series | Informatics and Health |
| doi | 10.1016/j.infoh.2025.07.002 |
| volume/issue/pages | Vol. 2, No. 2 (2025), pp. 158–169 |
| author affiliations | Joon Yul Choi: Department of Biomedical Engineering, Yonsei University, Wonju, South Korea. Tae Keun Yoo: Department of Ophthalmology, Hangil Eye Hospital, Incheon, South Korea (Correspondence to: Department of Ophthalmology, Hangil Eye Hospital, 35 Bupyeong-daero, Bupyeong-gu, Incheon 21388, South Korea). |
| title | Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation |
| topic | Large language model; Fundus photography; Strabismus; In-context learning; Ophthalmic diagnosis; Decision-support tool |
| url | http://www.sciencedirect.com/science/article/pii/S2949953425000219 |