Comparing Artificial Intelligence–Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study

Background: Knee osteoarthritis is a prevalent, chronic musculoskeletal disorder that impairs mobility and quality of life. Personalized patient education aims to improve self-management and adherence; yet, its delivery is often limited by time constraints, clinician workload,...

Bibliographic Details
Main Authors: Kai Du, Ao Li, Qi-Heng Zuo, Chen-Yu Zhang, Ren Guo, Ping Chen, Wei-Shuai Du, Shu-Ming Li
Format: Article
Language: English
Published: JMIR Publications 2025-05-01
Series: Journal of Medical Internet Research
Online Access: https://www.jmir.org/2025/1/e67830
_version_ 1850036403373604864
author Kai Du
Ao Li
Qi-Heng Zuo
Chen-Yu Zhang
Ren Guo
Ping Chen
Wei-Shuai Du
Shu-Ming Li
author_sort Kai Du
collection DOAJ
description Background: Knee osteoarthritis is a prevalent, chronic musculoskeletal disorder that impairs mobility and quality of life. Personalized patient education aims to improve self-management and adherence; yet, its delivery is often limited by time constraints, clinician workload, and the heterogeneity of patient needs. Recent advances in large language models offer potential solutions. GPT-4 (OpenAI), distinguished by its long-context reasoning and adoption in clinical artificial intelligence research, emerged as a leading candidate for personalized health communication. However, its application in generating condition-specific educational guidance remains underexplored, and concerns about misinformation, personalization limits, and ethical oversight remain.
Objective: We evaluated GPT-4’s ability to generate individualized self-management guidance for patients with knee osteoarthritis in comparison with clinician-created content.
Methods: This 2-phase, double-blind, observational study used data from 50 patients previously enrolled in a registered randomized trial. In phase 1, 2 orthopedic clinicians each generated personalized education materials for 25 patient profiles using anonymized clinical data, including history, symptoms, and lifestyle. In phase 2, the same datasets were processed by GPT-4 using standardized prompts. All content was anonymized and evaluated by 2 independent, blinded clinical experts using validated scoring systems. Evaluation criteria included efficiency, readability (Flesch-Kincaid, Gunning Fog, Coleman-Liau, and Simple Measure of Gobbledygook), accuracy, personalization, comprehensiveness, and safety. Disagreements between reviewers were resolved through consensus or third-party adjudication.
Results: GPT-4 outperformed clinicians in content generation speed (530.03 vs 37.29 words per min; P<.001). Readability was better for GPT-4 on the Flesch-Kincaid (mean 11.56, SD 1.08 vs mean 12.67, SD 0.95), Gunning Fog (mean 12.47, SD 1.36 vs mean 14.56, SD 0.93), and Simple Measure of Gobbledygook (mean 13.33, SD 1.00 vs mean 13.81, SD 0.69) indices (all P<.001), though GPT-4 scored slightly higher on the Coleman-Liau Index (mean 15.90, SD 1.03 vs mean 15.15, SD 0.91). GPT-4 also outperformed clinicians in accuracy (mean 5.31, SD 1.73 vs mean 4.76, SD 1.10; P=.05), personalization (mean 54.32, SD 6.21 vs mean 33.20, SD 5.40; P<.001), comprehensiveness (mean 51.74, SD 6.47 vs mean 35.26, SD 6.66; P<.001), and safety (median 61, IQR 58-66 vs median 50, IQR 47-55.25; P<.001).
Conclusions: GPT-4 could generate personalized self-management guidance for knee osteoarthritis with greater efficiency, accuracy, personalization, comprehensiveness, and safety than clinician-generated content, as assessed using standardized, guideline-aligned evaluation frameworks. These findings underscore the potential of large language models to support scalable, high-quality patient education in chronic disease management. The observed lexical complexity suggests the need to refine outputs for populations with limited health literacy. Because this was an exploratory, single-center study, these results warrant confirmation in larger, multicenter cohorts with diverse demographic profiles. Future implementation should be guided by ethical and operational safeguards, including data privacy, transparency, and the delineation of clinical responsibility. Hybrid models integrating artificial intelligence–generated content with clinician oversight may offer a pragmatic path forward.
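The readability and speed figures in the abstract rest on simple arithmetic. The following is a minimal sketch, not the study's own tooling: it applies the standard published Flesch-Kincaid Grade Level formula to raw counts and computes the ratio of the two reported generation speeds. The function names and the word, sentence, and syllable counts are hypothetical and chosen only for illustration; the record does not say how the authors counted syllables or which software they used.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard formula).

    Higher values indicate text that needs more years of schooling to read.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def speed_ratio(faster_wpm: float, slower_wpm: float) -> float:
    """How many times faster one words-per-minute rate is than another."""
    if slower_wpm <= 0:
        raise ValueError("slower rate must be positive")
    return faster_wpm / slower_wpm


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_kincaid_grade(100, 5, 150), 2))

    # Speeds reported in the abstract: 530.03 vs 37.29 words per minute.
    print(round(speed_ratio(530.03, 37.29), 2))
```

On this scale, the reported Flesch-Kincaid means of 11.56 and 12.67 both correspond to roughly a late high school reading level, which is consistent with the abstract's remark about refining outputs for readers with limited health literacy.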
format Article
id doaj-art-d57a47782ae448eb87eef862c0ccc5fc
institution DOAJ
issn 1438-8871
language English
publishDate 2025-05-01
publisher JMIR Publications
record_format Article
series Journal of Medical Internet Research
spelling doaj-art-d57a47782ae448eb87eef862c0ccc5fc 2025-08-20T02:57:08Z eng JMIR Publications Journal of Medical Internet Research 1438-8871 2025-05-01 27 e67830 10.2196/67830
Comparing Artificial Intelligence–Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study
Kai Du https://orcid.org/0009-0004-8375-869X
Ao Li https://orcid.org/0009-0009-7888-0478
Qi-Heng Zuo https://orcid.org/0009-0004-6947-3807
Chen-Yu Zhang https://orcid.org/0009-0009-2029-3860
Ren Guo https://orcid.org/0009-0003-0973-2523
Ping Chen https://orcid.org/0009-0005-8494-298X
Wei-Shuai Du https://orcid.org/0009-0003-0335-6715
Shu-Ming Li https://orcid.org/0000-0002-7460-1349
https://www.jmir.org/2025/1/e67830
title Comparing Artificial Intelligence–Generated and Clinician-Created Personalized Self-Management Guidance for Patients With Knee Osteoarthritis: Blinded Observational Study
title_sort comparing artificial intelligence generated and clinician created personalized self management guidance for patients with knee osteoarthritis blinded observational study
url https://www.jmir.org/2025/1/e67830