Integrating AI into clinical education: evaluating general practice trainees’ proficiency in distinguishing AI-generated hallucinations and impacting factors

Bibliographic Details
Main Authors: Jiacheng Zhou, Jintao Zhang, Rongrong Wan, Xiaochuan Cui, Qiyu Liu, Hua Guo, Xiaofen Shi, Bingbing Fu, Jia Meng, Bo Yue, Yunyun Zhang, Zhiyong Zhang
Format: Article
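Language: English
Published: BMC 2025-03-01
Series: BMC Medical Education
Subjects: ChatGPT-4o generated hallucinations; General practice (GP) trainees; General practice specialist training; Response bias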
Online Access:https://doi.org/10.1186/s12909-025-06916-2
Collection: DOAJ
ISSN: 1472-6920

Abstract
Objective: To assess the ability of General Practice (GP) trainees to detect AI-generated hallucinations in simulated clinical practice, using ChatGPT-4o. Hallucinations were classified into three types according to the accuracy of the answer and its explanation: (1) correct answers with incorrect or flawed explanations, (2) incorrect answers with explanations that contradict factual evidence, and (3) incorrect answers with correct explanations.
Methods: This multi-center, cross-sectional survey study enrolled 142 GP trainees, all of whom were undergoing General Practice Specialist Training and volunteered to participate. The study evaluated the accuracy and consistency of ChatGPT-4o, as well as the trainees’ response time, accuracy, sensitivity (d′), and response tendency (β). Binary regression analysis was used to explore factors affecting the trainees’ ability to identify errors generated by ChatGPT-4o.
Results: A total of 137 participants were included, with a mean age of 25.93 years. Half of the participants were unfamiliar with AI, and 35.0% had never used it. ChatGPT-4o’s overall accuracy was 80.8%, which decreased slightly to 80.1% after human verification. However, its accuracy on professional practice (Subject 4) was only 57.0%, falling further to 44.2% after human verification. A total of 87 AI-generated hallucinations were identified, occurring mainly at the application and evaluation levels. The trainees’ mean accuracy in detecting these hallucinations was 55.0%, and their mean sensitivity (d′) was 0.39. Regression analysis showed that shorter response times (OR = 0.92, P = 0.02), higher self-assessed understanding of AI (OR = 0.16, P = 0.04), and more frequent AI use (OR = 10.43, P = 0.01) were associated with stricter error-detection criteria.
Conclusions: GP trainees had difficulty identifying ChatGPT-4o’s errors, particularly in clinical scenarios. These findings highlight the importance of improving AI literacy and critical thinking skills to ensure the effective integration of AI into medical education.
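The abstract reports the trainees’ sensitivity (d′) and response tendency (β), which are standard signal-detection measures. As a rough illustration only, the sketch below computes both for a single trainee from a 2×2 table of outcomes. It assumes that a "hit" means flagging a ChatGPT-4o response that actually contains a hallucination and a "false alarm" means flagging a correct response; the function names, the log-linear correction for rates of 0 or 1, and the example counts are illustrative assumptions, not the authors’ published procedure.

```python
from math import exp
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) used for z-transforms.
_STD_NORMAL = NormalDist()


def corrected_rate(count: int, total: int) -> float:
    """Proportion with a log-linear correction (assumed here): adding 0.5
    to the count and 1 to the total keeps rates strictly between 0 and 1,
    so the z-transform never becomes infinite."""
    return (count + 0.5) / (total + 1)


def sdt_measures(hits: int, misses: int,
                 false_alarms: int, correct_rejections: int) -> tuple[float, float]:
    """Return (d_prime, beta) for one trainee (hypothetical helper).

    hits:               hallucinated responses the trainee flagged as wrong
    misses:             hallucinated responses the trainee accepted
    false_alarms:       correct responses the trainee flagged as wrong
    correct_rejections: correct responses the trainee accepted
    """
    hit_rate = corrected_rate(hits, hits + misses)
    fa_rate = corrected_rate(false_alarms, false_alarms + correct_rejections)

    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)

    # Sensitivity: separation between the signal and noise distributions.
    d_prime = z_hit - z_fa
    # Bias: ratio of the normal densities at the decision criterion,
    # phi(z_hit) / phi(z_fa) = exp((z_fa^2 - z_hit^2) / 2).
    beta = exp((z_fa ** 2 - z_hit ** 2) / 2)
    return d_prime, beta


if __name__ == "__main__":
    # Chance-level responding with no bias.
    print(sdt_measures(10, 10, 10, 10))  # → (0.0, 1.0)
```

Under this convention, d′ = 0 corresponds to chance-level discrimination, β > 1 indicates a conservative bias toward accepting the AI’s answer, and β < 1 a liberal bias toward flagging errors.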

Affiliations
Jiacheng Zhou, Jintao Zhang, Rongrong Wan, Xiaochuan Cui, Qiyu Liu, Hua Guo, Xiaofen Shi, Yunyun Zhang, Zhiyong Zhang: Department of General Practice, The Affiliated Wuxi People’s Hospital of Nanjing Medical University
Bingbing Fu: Department of Postgraduate Education, The First Affiliated Hospital of Jiamusi University
Jia Meng: Department of General Practice, The Second Affiliated Hospital of Harbin Medical University
Bo Yue: Residency Training Center, The Second Affiliated Hospital of Qiqihar Medical University