Large language models underperform in European general surgery board examinations: a comparative study with experts and surgical residents

Abstract

Background: Artificial intelligence (AI) has become a transformative tool in medical education and assessment. Despite advancements, AI models such as GPT-4o demonstrate variable performance on high-stakes examinations. This study compared the performance of four AI models (Llama-3, Gemini,...

Bibliographic Details
Main Author:	Melih Can Gül
Format:	Article
Language:	English
Published:	BMC 2025-08-01
Series:	BMC Medical Education
Subjects:	Artificial intelligence Board examinations Human-AI comparison Medical education Surgical training
Online Access:	https://doi.org/10.1186/s12909-025-07856-7
