Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery

Bibliographic Details
Main Authors: Maciej Laskowski, Marcin Ciekalski, Marcin Laskowski, Bartłomiej Błaszczyk, Marcin Setlak, Piotr Paździora, Adam Rudnik
Format: Article
Language: English
Published: Śląski Uniwersytet Medyczny w Katowicach, 2024-10-01
Series: Annales Academiae Medicae Silesiensis, vol. 78, pp. 253–258
ISSN: 1734-025X
DOI: 10.18794/aams/186827
Subjects: ChatGPT; neurosurgery; artificial intelligence (AI)
Online Access:https://annales.sum.edu.pl/Performance-of-ChatGPT-3-5-and-ChatGPT-4-in-the-field-of-specialist-medical-knowledge,186827,0,2.html
Description:
Introduction: Recent years have seen a growing number of publications on artificial intelligence (AI) in medicine and, specifically, in neurosurgery. Studies integrating AI into neurosurgical practice point to an ongoing shift towards greater reliance on AI-assisted tools for diagnostics, image analysis, and decision-making.

Material and methods: The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on the Autumn 2017 neurosurgery exam, the most recent exam with officially published answers on the website of the Medical Examinations Center in Łódź, Poland (Centrum Egzaminów Medycznych – CEM). The passing score for the National Specialization Exam (Państwowy Egzamin Specjalizacyjny – PES) in Poland, as administered by CEM, is 56% of the valid questions. The exam comprised 116 single-choice questions after four outdated questions were eliminated; these were categorized into ten thematic groups by subject. For data collection, both ChatGPT versions were briefed on the exam rules and asked to rate their confidence in each answer on a scale from 1 (definitely not sure) to 5 (definitely sure). All interactions were conducted in Polish and recorded.

Results: ChatGPT-4 significantly outperformed ChatGPT-3.5, by a 29.4% margin (p < 0.001), and, unlike ChatGPT-3.5, it reached the passing threshold for the PES. The two models gave the same answer on 61 of the 116 questions (52.58%); of these, both were correct on 28 (24.14%) and both incorrect on 33 (28.45%).

Conclusions: ChatGPT-4 shows improved accuracy over ChatGPT-3.5, likely owing to more advanced algorithms and a broader training dataset, reflecting a better grasp of complex neurosurgical concepts.
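For readers checking the figures in the Results above, the following minimal Python sketch (not part of the article; it uses only the counts quoted in the description, and the variable names are illustrative) reproduces the agreement percentages and the passing cut-off. Note that 61/116 rounds to 52.59%, so the 52.58% quoted in the abstract appears to be truncated rather than rounded.

from math import ceil

TOTAL_QUESTIONS = 116   # valid single-choice questions (4 outdated ones removed)
PASS_FRACTION = 0.56    # PES passing threshold: 56% of valid questions

# Agreement counts quoted in the description
same_answer = 61        # both models chose the same option
both_correct = 28       # same option, and it was the correct one
both_incorrect = 33     # same option, but the wrong one

for label, count in [("same answer", same_answer),
                     ("both correct", both_correct),
                     ("both incorrect", both_incorrect)]:
    print(f"{label}: {count}/{TOTAL_QUESTIONS} = {100 * count / TOTAL_QUESTIONS:.2f}%")

# Smallest number of correct answers that clears the 56% threshold
print("minimum passing score:", ceil(PASS_FRACTION * TOTAL_QUESTIONS), "of", TOTAL_QUESTIONS)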
Author affiliations:
Maciej Laskowski (ORCID: 0009-0005-5809-0875) – Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Marcin Ciekalski (ORCID: 0000-0003-1392-2007) – Students’ Scientific Club, Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Marcin Laskowski – Unhyped, AI Growth Partner, Kraków, Poland
Bartłomiej Błaszczyk – Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Marcin Setlak – Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Piotr Paździora – Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland
Adam Rudnik – Department of Neurosurgery, Faculty of Medical Sciences in Katowice, Medical University of Silesia, Katowice, Poland