Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis
| Main Authors: | Darren Liu, Xiao Hu, Canhua Xiao, Jinbing Bai, Zahra A Barandouzi, Stephanie Lee, Caitlin Webster, La-Urshalar Brock, Lindsay Lee, Delgersuren Bold, Yufen Lin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-04-01 |
| Series: | JMIR Cancer |
| Online Access: | https://cancer.jmir.org/2025/1/e67914 |
| _version_ | 1849738084759896064 |
|---|---|
| author | Darren Liu; Xiao Hu; Canhua Xiao; Jinbing Bai; Zahra A Barandouzi; Stephanie Lee; Caitlin Webster; La-Urshalar Brock; Lindsay Lee; Delgersuren Bold; Yufen Lin |
| author_facet | Darren Liu; Xiao Hu; Canhua Xiao; Jinbing Bai; Zahra A Barandouzi; Stephanie Lee; Caitlin Webster; La-Urshalar Brock; Lindsay Lee; Delgersuren Bold; Yufen Lin |
| author_sort | Darren Liu |
| collection | DOAJ |
| description |
Abstract
Background: Cancer survivors and their caregivers, particularly those from disadvantaged backgrounds with limited health literacy or racial and ethnic minorities facing language barriers, are at a disproportionately higher risk of experiencing symptom burdens from cancer and its treatments. Large language models (LLMs) offer a promising avenue for generating concise, linguistically appropriate, and accessible educational materials tailored to these populations. However, there is limited research evaluating how effectively LLMs perform in creating targeted content for individuals with diverse literacy and language needs.
Objective: This study aimed to evaluate the overall performance of LLMs in generating tailored educational content for cancer survivors and their caregivers with limited health literacy or language barriers, to compare the performances of 3 Generative Pretrained Transformer (GPT) models (ie, GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo; OpenAI), and to examine how different prompting approaches influence the quality of the generated content.
Methods: We selected 30 topics from national guidelines on cancer care and education. GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo were used to generate tailored content of up to 250 words at a sixth-grade reading level, with translations into Spanish and Chinese for each topic. Two distinct prompting approaches (textual and bulleted) were applied and evaluated. Nine oncology experts evaluated 360 generated responses against predetermined criteria: word limit, reading level, and quality assessment (ie, clarity, accuracy, relevance, completeness, and comprehensibility). ANOVA (analysis of variance) or chi-square tests were used to compare performances across models and prompting approaches.
Results: Overall, the LLMs showed excellent performance in tailoring educational content: 74.2% (267/360) of responses adhered to the specified word limit, and the average quality assessment score was 8.933 out of 10. However, the LLMs showed only moderate performance on reading level, with 41.1% (148/360) of content failing to meet the sixth-grade target. The LLMs demonstrated strong translation capabilities, achieving an accuracy of 96.7% (87/90) for Spanish and 81.1% (73/90) for Chinese translations. Common errors included imprecise scopes, inaccuracies in definitions, and content that lacked actionable recommendations. The more advanced GPT-4 family models outperformed GPT-3.5 Turbo overall, and prompting the models to produce bulleted-format content tended to yield better educational content than textual-format prompts.
Conclusions: All 3 LLMs demonstrated high potential for delivering multilingual, concise, low-health-literacy educational content for cancer survivors and caregivers who face limited literacy or language barriers; the GPT-4 family models were notably more robust. While further refinement is required to ensure simpler reading levels and fully comprehensive information, these findings highlight LLMs as an emerging tool for bridging gaps in cancer education and advancing health equity. Future research should integrate expert feedback, additional prompt engineering strategies, and specialized training data to optimize content accuracy and accessibility. |
| format | Article |
| id | doaj-art-64802d6993b74439a7f4ecaa1952f9a2 |
| institution | DOAJ |
| issn | 2369-1999 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Cancer |
| spelling | doaj-art-64802d6993b74439a7f4ecaa1952f9a2 (updated 2025-08-20T03:06:43Z); eng; JMIR Publications; JMIR Cancer; ISSN 2369-1999; 2025-04-01; volume 11; article e67914; DOI 10.2196/67914. Authors (ORCID): Darren Liu (0009-0004-5019-4402); Xiao Hu (0000-0001-9478-5571); Canhua Xiao (0000-0003-1391-5325); Jinbing Bai (0000-0001-6726-5714); Zahra A Barandouzi (0000-0002-8537-4751); Stephanie Lee (0000-0002-6553-3161); Caitlin Webster (0000-0002-6464-6308); La-Urshalar Brock (0000-0002-7088-9373); Lindsay Lee (0009-0006-5918-853X); Delgersuren Bold (0000-0002-2983-571X); Yufen Lin (0000-0002-9182-2928). https://cancer.jmir.org/2025/1/e67914 |
| title | Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis |
| title_sort | evaluation of large language models in tailoring educational content for cancer survivors and their caregivers quality analysis |
| url | https://cancer.jmir.org/2025/1/e67914 |
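The Methods field above describes prompting three GPT models to produce content of up to 250 words at a sixth-grade reading level, in either textual or bulleted form, with Spanish and Chinese translations. As a minimal illustrative sketch only — the prompt wording, function names, and model identifiers below are assumptions, not the study's published materials — such a generation step could be wired up like this with the official OpenAI Python client:

```python
# Hypothetical sketch of the prompting setup summarized in the Methods.
# The prompt text is illustrative; the study's actual prompts are not
# reproduced here. Requires the `openai` package and an OPENAI_API_KEY.

MODELS = ["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]

def build_prompt(topic: str, style: str = "textual") -> str:
    """Compose one generation request for a cancer-education topic."""
    layout = ("as short bulleted points" if style == "bulleted"
              else "as flowing paragraphs")
    return (
        f"Write educational content about '{topic}' for cancer survivors "
        "and their caregivers. Use at most 250 words, written at a "
        f"sixth-grade reading level, formatted {layout}. "
        "Then translate the content into Spanish and Chinese."
    )

def generate(topic: str, model: str, style: str) -> str:
    """Send one prompt to the chat completions API and return the text."""
    from openai import OpenAI  # imported here so the helpers above are usable offline
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_prompt(topic, style)}],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    print(generate("managing fatigue during chemotherapy", MODELS[1], "bulleted"))
```

In the study's design, each of the 30 topics would be run through all 3 models under both prompting approaches (hence 360 evaluated responses, including translations), with the outputs then scored by the expert panel rather than accepted as-is.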