Chat GPT 4o vs residents: French language evaluation in ophthalmology

Bibliographic Details
Main Authors:	Leah Attal, Elad Shvartz, Nakhoul Nakhoul, Daniel Bahir
Format:	Article
Language:	English
Published:	Elsevier 2025-04-01
Series:	AJO International
Subjects:	Artificial Intelligence (AI); ChatGPT; Chatbots; French; Ophthalmology
Online Access:	http://www.sciencedirect.com/science/article/pii/S2950253525000073

collection	DOAJ
description	Purpose: Chatbots capable of answering multiple-choice questions (MCQs) at a level comparable to residents could serve as affordable educational tools that are available 24/7 and provide comprehensive explanations. Their non-judgmental nature could enable residents to ask questions freely, without hesitation. This study therefore aims to evaluate ChatGPT 4o's accuracy on MCQs from the national ophthalmology residency examination in French, compared with residents and other leading AI chatbots. Methods: A set of 600 questions from the national ophthalmology examination was translated into French and submitted to ChatGPT 4o, ChatGPT 4, and Gemini Advanced. The generated responses were compared to the official correction grids to evaluate their accuracy. Additionally, variations over time and across specialties, as well as accuracy on text-based and image-based questions, were analysed and compared to residents' results. Results: ChatGPT 4o achieved an accuracy rate of 67.5%, outperforming both ChatGPT 4 and Gemini Advanced. However, Gemini Advanced exhibited greater sensitivity to the ethical considerations involved in generating medical advice. ChatGPT 4o demonstrated consistent accuracy over time, with particular strength in the fundamentals of ophthalmology, ocular pathologies, and refractive surgery. Its image-processing performance was significantly better than that of the other chatbots, though still inferior to its text-based performance. Conclusion: ChatGPT 4o demonstrates sufficient accuracy to pass the national ophthalmology examination, though its performance falls short of that of residents. These findings suggest that using ChatGPT 4o as an educational tool in ophthalmology residency is promising, even in a non-English language. However, further improvements are needed to enhance its performance.
id	doaj-art-9950c0efcef8441b82b17b7868e2f560
institution	OA Journals
issn	2950-2535
spelling	doaj-art-9950c0efcef8441b82b17b7868e2f560; 2025-08-20T02:33:51Z; eng; Elsevier; AJO International; ISSN 2950-2535; 2025-04-01; vol. 2, no. 1, art. 100104; DOI 10.1016/j.ajoint.2025.100104
affiliations	Leah Attal: Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel. Elad Shvartz: Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel. Nakhoul Nakhoul: Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel; Ophthalmology Department, Tzafon Medical Center, Poriya, Israel. Daniel Bahir: Azrieli Faculty of Medicine, Bar Ilan University, Safed, Israel; Ophthalmology Department, Tzafon Medical Center, Poriya, Israel; corresponding author at: The Tzafon Medical Center, General government hospital 768, The Baruch Padeh Medical Center, Poriya M.P. The lower Galilee 15208, Poriya, Israel.

Similar Items