Development and Validation of a Large Language Model–Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education

Background: Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materia...

Bibliographic Details
Main Authors:	Chung Man Ho, Shaowei Guan, Prudence Kwan-Lam Mok, Candice HW Lam, Wai Ying Ho, Calvin Hoi-Kwan Mak, Harry Qin, Arkers Kwan Ching Wong, Vivian Hui
Format:	Article
Language:	English
Published:	JMIR Publications 2025-07-01
Series:	Journal of Medical Internet Research
Online Access:	https://www.jmir.org/2025/1/e74299

author	Chung Man Ho, Shaowei Guan, Prudence Kwan-Lam Mok, Candice HW Lam, Wai Ying Ho, Calvin Hoi-Kwan Mak, Harry Qin, Arkers Kwan Ching Wong, Vivian Hui
author_sort	Chung Man Ho
collection	DOAJ
description	Background: Perioperative education is crucial for optimizing outcomes in neuroendovascular procedures, where inadequate understanding can heighten patient anxiety and hinder care plan adherence. Current education models, reliant on traditional consultations and printed materials, often lack scalability and personalization. Artificial intelligence (AI)–powered chatbots have demonstrated efficacy in various health care contexts; however, their role in neuroendovascular perioperative support remains underexplored. Given the complexity of neuroendovascular procedures and the need for continuous, tailored patient education, AI chatbots have the potential to offer tailored perioperative guidance to improve patient education in this specialty. Objective: We aimed to develop, validate, and assess NeuroBot, an AI-driven system that uses large language models (LLMs) with retrieval-augmented generation to deliver timely, accurate, and evidence-based responses to patient inquiries in neurosurgery, ultimately improving the effectiveness of patient education. Methods: A mixed methods approach was used, consisting of 3 phases. In the first phase, internal validation, we compared the performance of Assistants API, ChatGPT, and Qwen by evaluating their responses to 306 bilingual neuroendovascular-related questions. The accuracy, relevance, and completeness of the responses were evaluated using a Likert scale; statistical analyses included ANOVA and paired t tests. In the second phase, external validation, 10 neurosurgical experts rated the responses generated by NeuroBot using the same evaluation metrics applied in the internal validation phase. The consistency of their ratings was measured using the intraclass correlation coefficient. Finally, in the third phase, a qualitative study was conducted through interviews with 18 health care providers, which helped identify key themes related to NeuroBot’s usability and perceived benefits. Thematic analysis was performed using NVivo, and interrater reliability was confirmed through Cohen κ. Results: The Assistants API outperformed both ChatGPT and Qwen, achieving a mean accuracy score of 5.28 out of 6 (95% CI 5.21-5.35), with a statistically significant result (P<.001). External expert ratings for NeuroBot demonstrated significant improvements, with scores of 5.70 out of 6 (95% CI 5.46-5.94) for accuracy, 5.58 out of 6 (95% CI 5.45-5.94) for relevance, and 2.70 out of 3 (95% CI 2.73-2.97) for completeness. Qualitative insights highlighted NeuroBot’s potential to reduce staff workload, enhance patient education, and deliver evidence-based responses. Conclusions: NeuroBot, leveraging LLMs with the retrieval-augmented generation technique, demonstrates the potential of LLM-based chatbots in perioperative neuroendovascular care, offering scalable and continuous support. By integrating domain-specific knowledge, NeuroBot simplifies communication between professionals and patients while ensuring patients have 24/7 access to reliable, evidence-based information. Further refinement and research will enhance NeuroBot’s ability to foster patient-centered communication, optimize clinical outcomes, and advance AI-driven innovations in health care delivery.
format	Article
id	doaj-art-77b9ecd6bfd3459c9a7b6cf5448d93fd
institution	Kabale University
issn	1438-8871
language	English
publishDate	2025-07-01
publisher	JMIR Publications
record_format	Article
series	Journal of Medical Internet Research
spelling	doaj-art-77b9ecd6bfd3459c9a7b6cf5448d93fd; 2025-08-20T03:25:23Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2025-07-01; 27; e74299; 10.2196/74299; Development and Validation of a Large Language Model–Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education; Chung Man Ho (https://orcid.org/0009-0007-1057-1463); Shaowei Guan (https://orcid.org/0009-0009-4434-1337); Prudence Kwan-Lam Mok (https://orcid.org/0009-0000-3942-1054); Candice HW Lam (https://orcid.org/0009-0006-0670-8693); Wai Ying Ho (https://orcid.org/0009-0006-7272-8887); Calvin Hoi-Kwan Mak (https://orcid.org/0000-0002-1443-2109); Harry Qin (https://orcid.org/0000-0002-7059-0929); Arkers Kwan Ching Wong (https://orcid.org/0000-0001-6708-3099); Vivian Hui (https://orcid.org/0000-0003-1966-6139); https://www.jmir.org/2025/1/e74299
title	Development and Validation of a Large Language Model–Powered Chatbot for Neurosurgery: Mixed Methods Study on Enhancing Perioperative Patient Education
url	https://www.jmir.org/2025/1/e74299


Similar Items