Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images

Background: The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Proposing an affordable, accessible, interpretable, and automated AI system for non-clinical settings is crucial to e...

Bibliographic Details
Main Authors:	Zhongwen Li, Zhouqian Wang, Liheng Xiu, Pengyao Zhang, Wenfang Wang, Yangyang Wang, Gang Chen, Weihua Yang, Wei Chen
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-05-01
Series:	Frontiers in Cell and Developmental Biology
Subjects:	ocular surface disease; large language model; multimodal model; keratitis; conjunctivitis; pterygium
Online Access:	https://www.frontiersin.org/articles/10.3389/fcell.2025.1600202/full

_version_	1849309933167706112
author	Zhongwen Li, Zhouqian Wang, Liheng Xiu, Pengyao Zhang, Wenfang Wang, Yangyang Wang, Gang Chen, Weihua Yang, Wei Chen
author_sort	Zhongwen Li
collection	DOAJ
description	Background: The development of medical artificial intelligence (AI) models is primarily driven by the need to address healthcare resource scarcity, particularly in underserved regions. Proposing an affordable, accessible, interpretable, and automated AI system for non-clinical settings is crucial to expanding access to quality healthcare. Methods: This cross-sectional study developed the Multimodal Ocular Surface Assessment and Interpretation Copilot (MOSAIC) using three multimodal large language models: gpt-4-turbo, claude-3-opus, and gemini-1.5-pro-latest, for detecting three ocular surface diseases (OSDs) and grading keratitis and pterygium. A total of 375 smartphone-captured ocular surface images collected from 290 eyes were utilized to validate MOSAIC. The performance of MOSAIC was evaluated in both zero-shot and few-shot settings, with tasks including image quality control, OSD detection, analysis of the severity of keratitis, and pterygium grading. The interpretability of the system was also evaluated. Results: MOSAIC achieved 95.00% accuracy in image quality control, 86.96% in OSD detection, 88.33% in distinguishing mild from severe keratitis, and 66.67% in determining pterygium grades with five-shot settings. The performance significantly improved with the increasing learning shots (p < 0.01). The system attained high ROUGE-L F1 scores of 0.70–0.78, depicting its interpretable image comprehension capability. Conclusion: MOSAIC exhibited exceptional few-shot learning capabilities, achieving high accuracy in OSD management with minimal training examples. This system has significant potential for smartphone integration to enhance the accessibility and effectiveness of OSD detection and grading in resource-limited settings.
format	Article
id	doaj-art-09e2ec12ddfc46fdbdaa73990e7c2dea
institution	Kabale University
issn	2296-634X
language	English
publishDate	2025-05-01
publisher	Frontiers Media S.A.
record_format	Article
series	Frontiers in Cell and Developmental Biology
spelling	doaj-art-09e2ec12ddfc46fdbdaa73990e7c2dea | 2025-08-20T03:53:56Z | eng | Frontiers Media S.A. | Frontiers in Cell and Developmental Biology | 2296-634X | 2025-05-01 | 13 | 10.3389/fcell.2025.1600202 | 1600202 | Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images | [0] Zhongwen Li: Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China | [1] Zhongwen Li: National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China | [2] Zhouqian Wang: Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China | [3] Liheng Xiu: Department of Ophthalmology, West China Second University Hospital, Sichuan University, Chengdu, China | [4] Pengyao Zhang: Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China | [5] Wenfang Wang: Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China | [6] Yangyang Wang: Ningbo Key Laboratory of Medical Research on Blinding Eye Diseases, Ningbo Eye Institute, Ningbo Eye Hospital, Wenzhou Medical University, Ningbo, China | [7] Gang Chen: First People’s Hospital of Aksu, Aksu, China | [8] Weihua Yang: Shenzhen Eye Hospital, Shenzhen Eye Medical Center, Southern Medical University, Shenzhen, China | [9] Wei Chen: National Clinical Research Center for Ocular Diseases, Eye Hospital, Wenzhou Medical University, Wenzhou, China
title	Large language model-based multimodal system for detecting and grading ocular surface diseases from smartphone images
topic	ocular surface disease; large language model; multimodal model; keratitis; conjunctivitis; pterygium
url	https://www.frontiersin.org/articles/10.3389/fcell.2025.1600202/full
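The ROUGE-L F1 score cited in the description can be illustrated with a short sketch. The Python below is not the authors' evaluation code: it assumes lowercase whitespace tokenization, uses hypothetical example sentences, and computes the F1 from the longest common subsequence (LCS) shared by a model-generated description and a reference description.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between two strings (assumed tokenization: lowercase, whitespace)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the study's data.
    generated = "the cornea shows a white infiltrate"
    expert = "the cornea shows a dense white infiltrate"
    print(round(rouge_l_f1(generated, expert), 3))  # prints 0.923
```

In this toy case all six generated tokens appear in order in the seven-token reference, so precision is 1, recall is 6/7, and the F1 is 12/13. Published scores depend on the tokenizer and ROUGE implementation used, which this record does not specify.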
