Large language models underperform in European general surgery board examinations: a comparative study with experts and surgical residents

Abstract. Background: Artificial intelligence (AI) has become a transformative tool in medical education and assessment. Despite advancements, AI models such as GPT-4o demonstrate variable performance on high-stakes examinations. This study compared the performance of four AI models (Llama-3, Gemini,...

Bibliographic Details
Main Author: Melih Can Gül
Format: Article
Language: English
Published: BMC 2025-08-01
Series: BMC Medical Education
Online Access: https://doi.org/10.1186/s12909-025-07856-7

Similar Items