Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation

Bibliographic Details
Main Authors: Joon Yul Choi, Tae Keun Yoo
Format: Article
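Language: English
Published: KeAi Communications Co., Ltd. 2025-09-01
Series: Informatics and Health
Subjects: Large language model; Fundus photography; Strabismus; In-context learning; Ophthalmic diagnosis; Decision-support tool
Online Access: http://www.sciencedirect.com/science/article/pii/S2949953425000219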
author Joon Yul Choi
Tae Keun Yoo
collection DOAJ
description Background: Large language models (LLMs) such as ChatGPT-4o have demonstrated emerging capabilities in medical reasoning and image interpretation. However, their diagnostic applicability in ophthalmology, particularly across diverse imaging modalities, remains insufficiently characterized. This study evaluates ChatGPT-4o’s performance in ophthalmic image interpretation, exemplar-guided reasoning (in-context learning), and code-free diagnostic tool generation using publicly available datasets. Methods: We assessed ChatGPT-4o through three clinically relevant tasks: (1) image interpretation without prior examples, using fundus, external ocular, and facial photographs representing key ophthalmic conditions; (2) in-context learning with example-based prompts to improve classification accuracy; and (3) generation of an interactive HTML-based decision-support tool from a clinical diagnostic algorithm. All evaluations were performed using open-access datasets without model fine-tuning. Results: When interpreting images without reference examples, ChatGPT-4o achieved diagnostic accuracies of 90.3% for diabetic retinopathy, 77.4% for age-related macular degeneration, 100% for conjunctival melanoma, 97.3% for pterygium, and 85.7% for strabismus subtypes. In-context learning consistently improved diagnostic performance across all modalities, with strabismus classification reaching 100% accuracy. Compared to EfficientNetB2, ChatGPT-4o demonstrated comparable or superior performance in several diagnostic tasks. Additionally, the model successfully translated schematic clinical algorithms into functional, browser-based diagnostic tools using natural language prompts alone. Conclusions: ChatGPT-4o demonstrates promise in ophthalmic image interpretation and low-code clinical tool development, particularly when guided by in-context learning. However, these findings are based on a limited diagnostic spectrum and publicly available datasets. Broader clinical validation and head-to-head comparisons with domain-specific models are needed to establish its practical utility in ophthalmology.
format Article
id doaj-art-a0b6b21f8ddf4e17a9f37749e7f7b364
institution Kabale University
issn 2949-9534
language English
publishDate 2025-09-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Informatics and Health
spelling doaj-art-a0b6b21f8ddf4e17a9f37749e7f7b364; 2025-08-20T03:42:22Z; eng; KeAi Communications Co., Ltd.; Informatics and Health; ISSN 2949-9534; 2025-09-01; vol. 2, no. 2, pp. 158-169; doi:10.1016/j.infoh.2025.07.002
Joon Yul Choi (Department of Biomedical Engineering, Yonsei University, Wonju, South Korea)
Tae Keun Yoo (Department of Ophthalmology, Hangil Eye Hospital, Incheon, South Korea; correspondence to: Department of Ophthalmology, Hangil Eye Hospital, 35 Bupyeong-daero, Bupyeong-gu, Incheon 21388, South Korea)
title Evaluating ChatGPT-4o for ophthalmic image interpretation: From in-context learning to code-free clinical tool generation
topic Large language model
Fundus photography
Strabismus
In-context learning
Ophthalmic diagnosis
Decision-support tool
url http://www.sciencedirect.com/science/article/pii/S2949953425000219