Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study
| Main Authors: | Bin Wei, Lili Yao, Xin Hu, Yuxiang Hu, Jie Rao, Yu Ji, Zhuoer Dong, Yichong Duan, Xiaorong Wu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-04-01 |
| Series: | Journal of Medical Internet Research |
| Online Access: | https://www.jmir.org/2025/1/e67883 |
| _version_ | 1850203270940721152 |
|---|---|
| author | Bin Wei, Lili Yao, Xin Hu, Yuxiang Hu, Jie Rao, Yu Ji, Zhuoer Dong, Yichong Duan, Xiaorong Wu |
| author_sort | Bin Wei |
| collection | DOAJ |
| description |
Background: Ocular myasthenia gravis (OMG) is a neuromuscular disorder primarily affecting the extraocular muscles, leading to ptosis and diplopia. Effective patient education is crucial for disease management; however, in China, limited health care resources often restrict patients’ access to personalized medical guidance. Large language models (LLMs) have emerged as potential tools to bridge this gap by providing instant, AI-driven health information. However, their accuracy and readability in educating patients with OMG remain uncertain.
Objective: The purpose of this study was to systematically evaluate the effectiveness of multiple LLMs in the education of Chinese patients with OMG. Specifically, the validity of these models’ answers to OMG-related patient questions was assessed across accuracy, completeness, readability, usefulness, and safety, and patients’ ratings of usability and readability were analyzed.
Methods: The study was conducted in two phases. In the first phase, 130 multiple-choice ophthalmology examination questions were input into 5 different LLMs, and their performance was compared with that of undergraduates, master’s students, and ophthalmology residents. In addition, 23 common OMG-related patient questions were posed to 4 LLMs, and their responses were evaluated by ophthalmologists across 5 domains. In the second phase, 20 patients with OMG interacted with the 2 LLMs selected from the first phase, each asking 3 questions. Patients rated the responses for satisfaction and readability, while ophthalmologists again evaluated the responses across the 5 domains.
Results: ChatGPT o1-preview achieved the highest accuracy rate, 73%, on the 130 ophthalmology examination questions, outperforming the other LLMs as well as the undergraduate and master’s student groups. For the 23 common OMG-related patient questions, ChatGPT o1-preview scored highest in correctness (4.44), completeness (4.44), helpfulness (4.47), and safety (4.6). Gemini (Google DeepMind) provided the easiest-to-understand responses in the readability assessment, whereas GPT-4o produced the most complex responses, suited to readers with higher education levels. In the second phase with 20 patients with OMG, ChatGPT o1-preview received higher satisfaction scores than Ernie 3.5 (Baidu; 4.40 vs 3.89, P=.002), although Ernie 3.5’s responses were slightly more readable (4.31 vs 4.03, P=.01).
Conclusions: LLMs such as ChatGPT o1-preview may have the potential to enhance patient education. Addressing challenges such as misinformation risk, readability issues, and ethical considerations is crucial for their effective and safe integration into clinical practice. |
| format | Article |
| id | doaj-art-d2ed70090e8844fb8dabac3b3feaf0d9 |
| institution | OA Journals |
| issn | 1438-8871 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | Journal of Medical Internet Research |
| title | Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study |
| url | https://www.jmir.org/2025/1/e67883 |
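The abstract reports a between-model comparison of patient satisfaction (ChatGPT o1-preview 4.40 vs Ernie 3.5 3.89, P=.002), but the record does not state which statistical test produced these P values or include the raw ratings. The sketch below is only illustrative: it assumes each of the 20 patients rated both models on a 1-5 scale and applies a paired Wilcoxon signed-rank test to hypothetical placeholder scores, not the study's data.

```python
# Illustrative sketch only: the study's actual test and raw ratings are not
# given in this record. Assumes paired ratings (each patient scored both
# models), so a Wilcoxon signed-rank test is used on hypothetical scores.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)

# Hypothetical 1-5 satisfaction ratings from 20 patients (placeholders).
chatgpt_o1 = rng.integers(4, 6, size=20)  # stand-in for ChatGPT o1-preview ratings
ernie_35 = rng.integers(3, 5, size=20)    # stand-in for Ernie 3.5 ratings

stat, p_value = wilcoxon(chatgpt_o1, ernie_35)
print(f"mean o1-preview = {chatgpt_o1.mean():.2f}, "
      f"mean Ernie 3.5 = {ernie_35.mean():.2f}, P = {p_value:.3f}")
```

With real per-patient ratings, an unpaired test (e.g., Mann-Whitney U) or a t-test could equally have been used; the choice here is an assumption, not the authors' reported method.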