ChatGPT performance in answering medical residency questions in nephrology: a pilot study in Brazil
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Sociedade Brasileira de Nefrologia, 2025-07-01 |
| Series: | Brazilian Journal of Nephrology |
| Subjects: | |
| Online Access: | http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0101-28002025000400302&lng=en&tlng=en |
| Summary: | Abstract Objective: This study evaluated the performance of the ChatGPT-4 and ChatGPT-3.5 versions in answering nephrology questions from medical residency exams in Brazil. Methods: A total of 411 multiple-choice questions, with and without images, were analyzed, organized into four main themes: chronic kidney disease (CKD), hydroelectrolytic and acid-base disorders (HABD), tubulointerstitial diseases (TID), and glomerular diseases (GD). Questions with images were answered only by ChatGPT-4. Statistical analysis was performed using the chi-square test. Results: ChatGPT-4 achieved an overall accuracy of 79.80%, while ChatGPT-3.5 achieved 56.29%, a statistically significant difference (p < 0.001). Across the main themes, ChatGPT-4 outperformed ChatGPT-3.5 in HABD (79.11% vs. 55.17%), TID (88.23% vs. 52.23%), CKD (75.51% vs. 61.95%), and GD (79.31% vs. 55.29%), all with p < 0.001. ChatGPT-4 reached an accuracy of 81.49% on questions without images and 54.54% on questions with images, including 60% on electrocardiogram analysis. The study is limited by the small number of image-based questions and the use of outdated examination items, which reduces its ability to assess visual diagnostic skills and current clinical relevance. Furthermore, covering only four areas of nephrology may not fully represent the breadth of nephrology practice. Conclusion: ChatGPT-3.5 showed limitations in nephrology reasoning compared to ChatGPT-4, revealing gaps in knowledge. The study suggests that further exploration of other nephrology themes is needed to improve the use of these AI tools. |
| ISSN: | 2175-8239 |
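
The abstract states that the accuracy comparison between the two models was made with the chi-square test. As a rough illustration of that comparison, the Python sketch below builds a 2×2 contingency table (model × correct/incorrect) and runs `scipy.stats.chi2_contingency`. The per-model question counts are assumptions back-calculated from the reported percentages (the abstract does not give each model's exact denominator), so this is illustrative only, not the authors' analysis.

```python
# Minimal sketch (not the study's code): chi-square comparison of the
# two models' overall accuracies, as described in the abstract.
from scipy.stats import chi2_contingency

# Assumed denominators: ChatGPT-4 answered all 411 questions; ChatGPT-3.5
# answered only the text-only items (count assumed to be 400 here).
total_gpt4 = 411
correct_gpt4 = round(0.7980 * total_gpt4)    # ~328 correct (79.80%)

total_gpt35 = 400
correct_gpt35 = round(0.5629 * total_gpt35)  # ~225 correct (56.29%)

# 2x2 contingency table: rows = model, columns = correct / incorrect
observed = [
    [correct_gpt4, total_gpt4 - correct_gpt4],
    [correct_gpt35, total_gpt35 - correct_gpt35],
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
# With these illustrative counts the difference is highly significant,
# consistent with the abstract's reported p < 0.001.
```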