Large language model evaluation in autoimmune disease clinical questions comparing ChatGPT 4o, Claude 3.5 Sonnet and Gemini 1.5 pro

Abstract: Large language models (LLMs) have established a presence in providing medical services to patients and supporting clinical practice for doctors. To explore the ability of LLMs in answering clinical questions related to autoimmune diseases, this study was designed with 65 questions related t...

Bibliographic Details
Main Authors:	Juntao Ma, Jie Yu, Anran Xie, Taihong Huang, Wenjing Liu, Mengyin Ma, Yue Tao, Fuyu Zang, Qisi Zheng, Wenbo Zhu, Yuxin Chen, Mingzhe Ning, Yijia Zhu
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-05-01
Series:	Scientific Reports
Subjects:	Large language models; Autoimmune diseases; Performance evaluation
Online Access:	https://doi.org/10.1038/s41598-025-02601-y

Similar Items

Accuracy of ChatGPT-3.5, ChatGPT-4o, Copilot, Gemini, Claude, and Perplexity in advising on lumbosacral radicular pain against clinical practice guidelines: cross-sectional study
by: Giacomo Rossettini, et al.
Published: (2025-06-01)

Evaluating LLMs for Code Generation in HRI: A Comparative Study of ChatGPT, Gemini, and Claude
by: Andrei Sobo, et al.
Published: (2025-12-01)

Performance of Large Language Models in Recognizing Brain MRI Sequences: A Comparative Analysis of ChatGPT-4o, Claude 4 Opus, and Gemini 2.5 Pro
by: Ali Salbas, et al.
Published: (2025-07-01)

Comparative analysis of ChatGPT 3.5 and ChatGPT 4 obstetric and gynecological knowledge
by: Franciszek Ługowski, et al.
Published: (2025-07-01)

Capabilities of ChatGPT-3.5 as a Urological Triage System
by: Christopher Hirtsiefer, et al.
Published: (2024-12-01)

Utilizing ChatGPT-3.5 to Assist Ophthalmologists in Clinical Decision-making
by: Samir Cayenne, et al.
Published: (2025-05-01)

Performance of the Large Language Models in African rheumatology: a diagnostic test accuracy study of ChatGPT-4, Gemini, Copilot, and Claude artificial intelligence
by: Yannick Laurent Tchenadoyo Bayala, et al.
Published: (2025-05-01)

Evaluating the perspectives of ChatGPT and Gemini on glenohumeral osteoarthritis management
by: Michael Megafu, DO, MPH, et al.
Published: (2025-07-01)

Performance of ChatGPT-4 Omni and Gemini 1.5 Pro on Ophthalmology-Related Questions in the Turkish Medical Specialty Exam
by: Mehmet Cem Sabaner, et al.
Published: (2025-08-01)

Performance of ChatGPT-3.5 and ChatGPT-4 in the Taiwan National Pharmacist Licensing Examination: Comparative Evaluation Study
by: Ying-Mei Wang, et al.
Published: (2025-01-01)

Performance of ChatGPT-3.5 and ChatGPT-4 in the field of specialist medical knowledge on National Specialization Exam in neurosurgery
by: Maciej Laskowski, et al.
Published: (2024-10-01)

Is Google Gemini better than ChatGPT at evaluating research quality?
by: Mike Thelwall
Published: (2025-05-01)

ARTIFICIAL INTELLIGENCE IN EDUCATION: CASES OF USING CHATGPT 3.5
by: Дмитро Покришень
Published: (2024-02-01)

Accuracy, appropriateness, and readability of ChatGPT-4 and ChatGPT-3.5 in answering pediatric emergency medicine post-discharge questions
by: Mitul Gupta, et al.
Published: (2025-04-01)

Comparing ChatGPT-3.5 and ChatGPT-4’s alignments with the German evidence-based S3 guideline for adult soft tissue sarcoma
by: Cheng-Peng Li, et al.
Published: (2024-12-01)

ChatGPT vs. Gemini: Which Provides Better Information on Bladder Cancer?
by: Ahmed Alasker, et al.
Published: (2025-04-01)

Comparing ChatGPT3.5 and Bard recommendations for colonoscopy intervals: Bridging the gap in healthcare settings
by: Maziar Amini, et al.
Published: (2025-01-01)

Traducir los 35 Sonnets de Pessoa: corpus grammaticum y sentimiento [Translating Pessoa's 35 Sonnets: corpus grammaticum and sentiment]
by: Madeleine Jordà-Billinghurst
Published: (2024-12-01)

Comparative analysis of ChatGPT and Gemini (Bard) in medical inquiry: a scoping review
by: Fattah H. Fattah, et al.
Published: (2025-02-01)

Performance of Large Language Models ChatGPT and Gemini on Workplace Management Questions in Radiology
by: Patricia Leutz-Schmidt, et al.
Published: (2025-02-01)

Analysis of ChatGPT-3.5’s Potential in Generating NBME-Standard Pharmacology Questions: What Can Be Improved?
by: Marwa Saad, et al.
Published: (2024-10-01)

A pilot evaluation of the diagnostic accuracy of ChatGPT-3.5 for multiple sclerosis from case reports
by: Joseph Anika, et al.
Published: (2024-12-01)

Evaluation of deepseek, gemini, ChatGPT-4o, and perplexity in responding to salivary gland cancer
by: Ahmed Bashah, et al.
Published: (2025-08-01)

Eye spy with my little AI: An introductory look at ChatGPT and Google Gemini for ophthalmologists
by: Aishwarya Naik, et al.
Published: (2024-12-01)

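Comportamiento argumentativo del ChatGPT 3.5: similitudes y diferencias con la práctica argumentativa humana [Argumentative behavior of ChatGPT 3.5: similarities and differences with human argumentative practice]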
by: Cristián Noemi Padilla, et al.
Published: (2024-07-01)

Comparative analysis of the performance of the large language models ChatGPT-3.5, ChatGPT-4 and Open AI-o1 in the field of Programmed Cell Death in myeloma
by: Wu Kun, et al.
Published: (2025-05-01)

Assessing the accuracy and readability of ChatGPT-4 and Gemini in answering oral cancer queries—an exploratory study
by: Márcio Diniz-Freitas, et al.
Published: (2024-11-01)

Evaluating the Effectiveness of ChatGPT and Google Gemini in Providing Lung Cancer Screening Recommendations for Vulnerable Communities
by: Caretia J. Washington, BS, et al.
Published: (2025-06-01)

Do Chatbots Exhibit Personality Traits? A Comparison of ChatGPT and Gemini Through Self-Assessment
by: W. Wiktor Jedrzejczak, et al.
Published: (2025-06-01)

Large language models’ capabilities in responding to tuberculosis medical questions: testing ChatGPT, Gemini, and Copilot
by: Meisam Dastani, et al.
Published: (2025-05-01)

Analysis Of User Experience Of ChatGPT And Gemini Users Using The User Experience Questionnaire (UEQ) For Education
by: Ilham Nasrul, et al.
Published: (2024-11-01)

29. The Free Basic Version of ChatGPT-3.5 Produces More Readable Cosmetic Surgery Patient Educational Materials than Subscription-based Advanced ChatGPT-4.0
by: Pooja Deshpande, et al.
Published: (2025-06-01)

Comparative Performance of Medical Students, ChatGPT-3.5 and ChatGPT-4.0 in Answering Questions From a Brazilian National Medical Exam: Cross-Sectional Questionnaire Study
by: Mateus Rodrigues Alessi, et al.
Published: (2025-05-01)

Translating classical Arabic verse: human translation vs. AI large language models (Gemini and ChatGPT)
by: Mohammed Farghal, et al.
Published: (2024-12-01)

Speculative futures of education: utopian and dystopian scenarios envisioned by Chatgpt, Gemini, and Deepseek
by: Jessie Ming Sin Wong
Published: (2025-08-01)

INVESTIGATION AND COMPARISON OF CHATGPT AND GOOGLE GEMINI EFFICIENCY IN EDUCATION, DESIGN AND ENGINEERING ANALYSIS
by: H. Ghashochi Bargh, et al.
Published: (2025-03-01)

Evaluation of the accuracy of ChatGPT-4 and Gemini’s responses to the World Dental Federation’s frequently asked questions on oral health
by: Aysenur Arpaci, et al.
Published: (2025-08-01)

Subjective Assessment of a Built Environment by ChatGPT, Gemini and Grok: Comparison with Architecture, Engineering and Construction Expert Perception
by: Rachid Belaroussi
Published: (2025-04-01)

Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini
by: Yukang Liu, et al.
Published: (2025-06-01)

3,5-Dinitrobenzoate and 3,5-Dinitrobenzamide Derivatives: Mechanistic, Antifungal, and In Silico Studies
by: Allana B. S. Duarte, et al.
Published: (2022-01-01)