Integrating AI into clinical education: evaluating general practice trainees’ proficiency in distinguishing AI-generated hallucinations and impacting factors
Abstract

Objective: ChatGPT-4o was used to assess the ability of General Practice (GP) trainees to detect AI-generated hallucinations in simulated clinical practice. The hallucinations were categorized into three types based on the accuracy of the answers and their explanations: (1) correct answers with incorrect or flawed explanations, (2) incorrect answers with explanations that contradict factual evidence, and (3) incorrect answers with correct explanations.

Methods: This multi-center, cross-sectional survey involved 142 GP trainees, all undergoing General Practice Specialist Training, who volunteered to participate. The study evaluated the accuracy and consistency of ChatGPT-4o, as well as the trainees' response time, accuracy, sensitivity (d′), and response bias (β). Binary logistic regression was used to explore factors affecting the trainees' ability to identify errors generated by ChatGPT-4o.

Results: A total of 137 participants were included, with a mean age of 25.93 years. Half of the participants were unfamiliar with AI, and 35.0% had never used it. ChatGPT-4o's overall accuracy was 80.8%, which decreased slightly to 80.1% after human verification. For professional practice (Subject 4), however, accuracy was only 57.0% and dropped to 44.2% after human verification. A total of 87 AI-generated hallucinations were identified, occurring primarily at the application and evaluation levels. The mean accuracy of detecting these hallucinations was 55.0%, and the mean sensitivity (d′) was 0.39. Regression analysis revealed that shorter response times (OR = 0.92, P = 0.02), higher self-assessed AI understanding (OR = 0.16, P = 0.04), and more frequent AI use (OR = 10.43, P = 0.01) were associated with stricter error-detection criteria.

Conclusions: GP trainees struggled to identify ChatGPT-4o's errors, particularly in clinical scenarios. This highlights the importance of improving AI literacy and critical-thinking skills to ensure the effective integration of AI into medical education.
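For readers unfamiliar with the signal-detection measures the abstract reports, the sketch below shows the conventional computation of sensitivity (d′) and response bias (β) from hit and false-alarm counts. The study's exact scoring procedure is not described in this record, so the function name and the example counts are illustrative assumptions only.

```python
from scipy.stats import norm

def d_prime_and_beta(hits, misses, false_alarms, correct_rejections):
    """Conventional signal-detection d' and beta from a 2x2 outcome table."""
    # Log-linear correction keeps the z-scores finite when a rate is 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa                    # sensitivity
    beta = norm.pdf(z_hit) / norm.pdf(z_fa)   # likelihood-ratio criterion
    return d_prime, beta

# Illustrative counts only: a trainee who flags 22 of 40 hallucinated items
# and wrongly flags 12 of 40 accurate items.
print(d_prime_and_beta(hits=22, misses=18, false_alarms=12, correct_rejections=28))
```

Under this convention, β > 1 corresponds to a stricter (more conservative) criterion for flagging an item as a hallucination. Similarly, the odds ratios in the Results are the exponentiated coefficients of a binary logistic regression. A minimal sketch follows, assuming fabricated data and hypothetical variable names (response_time_s, ai_understanding, ai_use_frequency) that are not the study's actual coding:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 137  # number of participants included in the analysis
X = np.column_stack([
    rng.normal(60, 15, n),   # response_time_s (hypothetical)
    rng.integers(1, 6, n),   # ai_understanding, 1-5 self-rating (hypothetical)
    rng.integers(0, 4, n),   # ai_use_frequency (hypothetical)
])
y = rng.integers(0, 2, n)    # fabricated outcome: 1 = strict criterion, 0 = lenient

model = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(np.exp(model.params))  # odds ratios = exp(coefficients)
```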
| Main Authors: | Jiacheng Zhou, Jintao Zhang, Rongrong Wan, Xiaochuan Cui, Qiyu Liu, Hua Guo, Xiaofen Shi, Bingbing Fu, Jia Meng, Bo Yue, Yunyun Zhang, Zhiyong Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-03-01 |
| Series: | BMC Medical Education |
| ISSN: | 1472-6920 |
| Subjects: | ChatGPT-4o generated hallucinations; General practice (GP) trainees; General practice specialist training; Response bias |
| Online Access: | https://doi.org/10.1186/s12909-025-06916-2 |
Author affiliations:

- Department of General Practice, The Affiliated Wuxi People's Hospital of Nanjing Medical University: Jiacheng Zhou, Jintao Zhang, Rongrong Wan, Xiaochuan Cui, Qiyu Liu, Hua Guo, Xiaofen Shi, Yunyun Zhang, Zhiyong Zhang
- Department of Postgraduate Education, The First Affiliated Hospital of Jiamusi University: Bingbing Fu
- Department of General Practice, The Second Affiliated Hospital of Harbin Medical University: Jia Meng
- Residency Training Center, The Second Affiliated Hospital of Qiqihar Medical University: Bo Yue