Enhancing Diagnostic Accuracy of Ophthalmological Conditions With Complex Prompts in GPT-4: Comparative Analysis of Global and Low- and Middle-Income Country (LMIC)–Specific Pathologies

Bibliographic Details
Main Authors: Shona Alex Tapiwa M'gadzah, Andrew O'Malley
Format: Article
Language: English
Published: JMIR Publications 2025-06-01
Series: JMIR Formative Research
Online Access: https://formative.jmir.org/2025/1/e64986
Description
Summary: Abstract
Background: The global incidence of blindness has continued to increase, despite the enactment of a Global Eye Health Action Plan by the World Health Assembly. This can be attributed in part to an aging population, but also to the limited diagnostic resources within low- and middle-income countries (LMICs). The advent of generative artificial intelligence (AI) within health care could offer a novel solution for reducing the prevalence of blindness globally.
Objective: The objectives of this study are to quantify the effect that adding a complex prompt has on the diagnostic accuracy of a commercially available large language model (LLM), and to assess whether such LLMs are better or worse at diagnosing conditions that are more prevalent in LMICs.
Methods: Ten clinical vignettes representing globally and LMIC-prevalent ophthalmological conditions were presented to GPT-4-0125-preview using simple and complex prompts. Diagnostic performance metrics, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated. Statistical comparison between prompts was conducted using a chi-square test of independence.
Results: The complex prompt achieved a higher diagnostic accuracy (90.1%) than the simple prompt (60.4%), with a statistically significant difference (χ², P […]).
Conclusions: The study established that, overall, the inclusion of a complex prompt positively affected the diagnostic accuracy of GPT-4-0125-preview, particularly for LMIC-prevalent conditions. This highlights the potential for LLMs, when appropriately tailored, to support clinicians in diverse health care settings. Future research should explore the generalizability of these findings across other models and specialties.
ISSN: 2561-326X
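
The Methods in the summary name four diagnostic performance metrics and a chi-square test of independence. The following is a minimal Python sketch of how such quantities are conventionally computed, not the authors' analysis code. The names `Confusion`, `diagnostic_metrics`, and `chi_square_2x2` are hypothetical, every count is invented for illustration and is not taken from the study, and the test is shown without a continuity correction, which may differ from what the authors used.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for one condition: true/false positives and negatives."""
    tp: int
    fp: int
    fn: int
    tn: int


def diagnostic_metrics(c: Confusion) -> dict:
    """Standard definitions; assumes no denominator is zero."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    No continuity correction is applied. With 1 degree of freedom the
    P value equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts for a single condition (not study data).
    print(diagnostic_metrics(Confusion(tp=8, fp=2, fn=2, tn=88)))

    # Hypothetical correct/incorrect counts per prompt type (not study data):
    # complex prompt 45 correct, 5 incorrect; simple prompt 30 correct, 20 incorrect.
    statistic, p_value = chi_square_2x2(45, 5, 30, 20)
    print(f"chi-square = {statistic:.2f}, P = {p_value:.4g}")
```

The rows of the 2x2 table are the two prompt types and the columns are correct versus incorrect diagnoses, so the test asks whether accuracy is independent of prompt type.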