Large Language Models: Pioneering New Educational Frontiers in Childhood Myopia
| Main Authors: | , , , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Adis, Springer Healthcare, 2025-04-01 |
| Series: | Ophthalmology and Therapy |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s40123-025-01142-x |
| Summary: | **Introduction** This study aimed to evaluate the performance of three large language models (LLMs), namely ChatGPT-3.5, ChatGPT-4o (o1 Preview), and Google Gemini, in producing patient education materials (PEMs) and improving the readability of online PEMs on childhood myopia. **Methods** LLM-generated responses were assessed using three prompts. Prompt A requested the models to “Write educational material on childhood myopia.” Prompt B added a modifier specifying “a sixth-grade reading level using the FKGL (Flesch-Kincaid Grade Level) readability formula.” Prompt C aimed to rewrite existing PEMs to a sixth-grade level using FKGL. Responses were assessed for quality (DISCERN tool), readability (FKGL and SMOG (Simple Measure of Gobbledygook)), understandability and actionability (Patient Education Materials Assessment Tool, PEMAT), and accuracy. **Results** ChatGPT-4o (o1 Preview) and ChatGPT-3.5 generated good-quality PEMs (DISCERN 52.8 and 52.7, respectively); however, quality declined from prompt A to prompt B (p = 0.001 and p = 0.013). Google Gemini produced fair-quality PEMs (DISCERN 43), but quality improved with prompt B (p = 0.02). All PEMs exceeded the 70% PEMAT understandability threshold but failed the 70% actionability threshold (40%). No misinformation was identified. Readability improved with prompt B; ChatGPT-4o (o1 Preview) and ChatGPT-3.5 achieved a sixth-grade level or below (FKGL 6 ± 0.6 and 6.2 ± 0.3), while Google Gemini did not (FKGL 7 ± 0.6). ChatGPT-4o (o1 Preview) outperformed Google Gemini in readability (p < 0.001) but was comparable to ChatGPT-3.5 (p = 0.846). Prompt C improved readability across all LLMs, with ChatGPT-4o (o1 Preview) showing the largest gains (FKGL 5.8 ± 1.5; p < 0.001). **Conclusions** ChatGPT-4o (o1 Preview) demonstrates potential in producing accurate, good-quality, understandable PEMs and in improving existing online PEMs on childhood myopia. |
| ISSN: | 2193-8245, 2193-6528 |
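For readers unfamiliar with the readability metrics named in the abstract, the standard published formulas for FKGL and SMOG are reproduced below as general background; they are not taken from the article itself.

```latex
% Flesch-Kincaid Grade Level (Kincaid et al., 1975)
\mathrm{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right)
              + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59

% SMOG grade (McLaughlin, 1969)
\mathrm{SMOG} = 1.0430\sqrt{\text{polysyllable count}\times\frac{30}{\text{sentence count}}} + 3.1291
```

Both formulas map simple text statistics to an approximate US school-grade reading level, which is why the study's sixth-grade target corresponds to FKGL scores of roughly 6.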