Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis
| Main Authors: | Darren Liu, Xiao Hu, Canhua Xiao, Jinbing Bai, Zahra A Barandouzi, Stephanie Lee, Caitlin Webster, La-Urshalar Brock, Lindsay Lee, Delgersuren Bold, Yufen Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-04-01 |
| Series: | JMIR Cancer |
| Online Access: | https://cancer.jmir.org/2025/1/e67914 |
| _version_ | 1849738084759896064 |
|---|---|
| author | Darren Liu; Xiao Hu; Canhua Xiao; Jinbing Bai; Zahra A Barandouzi; Stephanie Lee; Caitlin Webster; La-Urshalar Brock; Lindsay Lee; Delgersuren Bold; Yufen Lin |
| author_facet | Darren Liu; Xiao Hu; Canhua Xiao; Jinbing Bai; Zahra A Barandouzi; Stephanie Lee; Caitlin Webster; La-Urshalar Brock; Lindsay Lee; Delgersuren Bold; Yufen Lin |
| author_sort | Darren Liu |
| collection | DOAJ |
| description |
Abstract
Background: Cancer survivors and their caregivers, particularly those from disadvantaged backgrounds with limited health literacy or racial and ethnic minorities facing language barriers, are at a disproportionately higher risk of experiencing symptom burdens from cancer and its treatments. Large language models (LLMs) offer a promising avenue for generating concise, linguistically appropriate, and accessible educational materials tailored to these populations. However, there is limited research evaluating how effectively LLMs perform in creating targeted content for individuals with diverse literacy and language needs.
Objective: This study aimed to evaluate the overall performance of LLMs in generating tailored educational content for cancer survivors and their caregivers with limited health literacy or language barriers, to compare the performances of 3 Generative Pretrained Transformer (GPT) models (ie, GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo; OpenAI), and to examine how different prompting approaches influence the quality of the generated content.
Methods: We selected 30 topics from national guidelines on cancer care and education. GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo were used to generate tailored content of up to 250 words at a sixth-grade reading level, with translations into Spanish and Chinese for each topic. Two distinct prompting approaches (textual and bulleted) were applied and evaluated. Nine oncology experts evaluated 360 generated responses against predetermined criteria: word limit, reading level, and quality assessment (ie, clarity, accuracy, relevance, completeness, and comprehensibility). ANOVA (analysis of variance) or chi-square tests were used to compare performances across models and prompting approaches.
Results: Overall, the LLMs showed excellent performance in tailoring educational content: 74.2% (267/360) of responses adhered to the specified word limit, and the average quality assessment score was 8.933 out of 10. However, the LLMs showed only moderate performance on reading level, with 41.1% (148/360) of content failing to meet the sixth-grade target. The LLMs demonstrated strong translation capabilities, achieving an accuracy of 96.7% (87/90) for Spanish and 81.1% (73/90) for Chinese translations. Common errors included imprecise scopes, inaccuracies in definitions, and content that lacked actionable recommendations. The more advanced GPT-4 family models outperformed GPT-3.5 Turbo overall, and prompting the models to produce bulleted-format content tended to yield better educational content than textual-format prompts.
Conclusions: All 3 LLMs demonstrated high potential for delivering multilingual, concise, low-health-literacy educational content for cancer survivors and caregivers who face limited literacy or language barriers; the GPT-4 family models were notably more robust. While further refinement is required to ensure simpler reading levels and fully comprehensive information, these findings highlight LLMs as an emerging tool for bridging gaps in cancer education and advancing health equity. Future research should integrate expert feedback, additional prompt engineering strategies, and specialized training data to optimize content accuracy and accessibility. |
| format | Article |
| id | doaj-art-64802d6993b74439a7f4ecaa1952f9a2 |
| institution | DOAJ |
| issn | 2369-1999 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Cancer |
| spelling | doaj-art-64802d6993b74439a7f4ecaa1952f9a2 (updated 2025-08-20T03:06:43Z); eng; JMIR Publications; JMIR Cancer; ISSN 2369-1999; 2025-04-01; volume 11; article e67914; DOI 10.2196/67914. Authors (ORCID): Darren Liu (0009-0004-5019-4402); Xiao Hu (0000-0001-9478-5571); Canhua Xiao (0000-0003-1391-5325); Jinbing Bai (0000-0001-6726-5714); Zahra A Barandouzi (0000-0002-8537-4751); Stephanie Lee (0000-0002-6553-3161); Caitlin Webster (0000-0002-6464-6308); La-Urshalar Brock (0000-0002-7088-9373); Lindsay Lee (0009-0006-5918-853X); Delgersuren Bold (0000-0002-2983-571X); Yufen Lin (0000-0002-9182-2928). https://cancer.jmir.org/2025/1/e67914 |
| title | Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis |
| title_sort | evaluation of large language models in tailoring educational content for cancer survivors and their caregivers quality analysis |
| url | https://cancer.jmir.org/2025/1/e67914 |
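The Methods field above describes prompting three GPT models to produce content of up to 250 words at a sixth-grade reading level, in either textual or bulleted form, with Spanish and Chinese translations. As a minimal illustrative sketch only — the prompt wording, function names, and model identifiers below are assumptions, not the study's published materials — such a generation step could be wired up like this with the official OpenAI Python client:

```python
# Hypothetical sketch of the prompting setup summarized in the Methods.
# The prompt text is illustrative; the study's actual prompts are not
# reproduced here. Requires the `openai` package and an OPENAI_API_KEY.

MODELS = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]

def build_prompt(topic: str, style: str = "textual") -> str:
    """Compose one generation request for a cancer-education topic."""
    layout = ("as short bulleted points" if style == "bulleted"
              else "as flowing paragraphs")
    return (
        f"Write educational content about '{topic}' for cancer survivors "
        "and their caregivers. Use at most 250 words, written at a "
        f"sixth-grade reading level, formatted {layout}. "
        "Then translate the content into Spanish and Chinese."
    )

def generate(topic: str, model: str, style: str) -> str:
    """Send one prompt to the chat completions API and return the text."""
    from openai import OpenAI  # imported here so the helpers above are usable offline
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(topic, style)}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(generate("managing fatigue during chemotherapy", MODELS[1], "bulleted"))
```

In the study's design, each of the 30 topics would be run through all 3 models under both prompting approaches (hence 360 evaluated responses, including translations), with the outputs then scored by the expert panel rather than accepted as-is.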