Evaluation of four chatbots in autoimmune liver disease: A comparative analysis

Main Authors: Jimmy Daza, Lucas Soares Bezerra, Laura Santamaría, Roberto Rueda-Esteban, Heike Bantel, Marcos Girala, Matthias Ebert, Florian Van Bömmel, Andreas Geier, Andres Gomez Aldana, Kevin Yau, Mario Alvares-da-Silva, Markus Peck-Radosavljevic, Ezequiel Ridruejo, Arndt Weinmann, Andreas Teufel
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Annals of Hepatology
Online Access: http://www.sciencedirect.com/science/article/pii/S1665268124003314
collection DOAJ
description Introduction and Objectives: Autoimmune liver diseases (AILDs) are rare and require precise evaluation, which is often challenging for medical providers. Chatbots are innovative solutions to assist healthcare professionals in clinical management. In our study, ten liver specialists systematically evaluated four chatbots to determine their utility as clinical decision support tools in the field of AILDs.
Materials and Methods: We constructed a 56-question questionnaire focusing on AILD evaluation, diagnosis, and management of Autoimmune Hepatitis (AIH), Primary Biliary Cholangitis (PBC), and Primary Sclerosing Cholangitis (PSC). Four chatbots (ChatGPT 3.5, Claude, Microsoft Copilot, and Google Bard) were presented with the questions in their free tiers in December 2023. Responses underwent critical evaluation by ten liver specialists using a standardized 1-to-10 Likert scale. The analysis included mean scores, the number of highest-rated replies, and the identification of common shortcomings in chatbot performance.
Results: Among the assessed chatbots, specialists rated Claude highest with a mean score of 7.37 (SD = 1.91), followed by ChatGPT (7.17, SD = 1.89), Microsoft Copilot (6.63, SD = 2.10), and Google Bard (6.52, SD = 2.27). Claude also excelled with 27 best-rated replies, outperforming ChatGPT (20), while Microsoft Copilot and Google Bard lagged with only 6 and 9, respectively. Common deficiencies included listing details over specific advice, limited dosing options, inaccuracies for pregnant patients, insufficient recent data, over-reliance on CT and MRI imaging, and inadequate discussion regarding off-label use and fibrates in PBC treatment. Notably, internet access for Microsoft Copilot and Google Bard did not enhance precision compared to pre-trained models.
Conclusions: Chatbots hold promise in AILD support, but our study underscores key areas for improvement. Refinement is needed in providing specific advice, accuracy, and focused up-to-date information. Addressing these shortcomings is essential for enhancing the utility of chatbots in AILD management, guiding future development, and ensuring their effectiveness as clinical decision-support tools.
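The summary statistics named in the Materials and Methods (mean score, standard deviation, and number of highest-rated replies per chatbot) can be illustrated with a minimal sketch. This is not the authors' code: the ratings below are invented toy values, the chatbot labels `bot_a` and `bot_b` are hypothetical, and two choices are assumptions because the abstract does not specify them: the sample standard deviation is used, and a tie for the best per-question mean is credited to every tied chatbot.

```python
from statistics import mean, stdev

# Hypothetical toy data: ratings[chatbot][question] -> specialist scores (1-10).
# These values are invented for illustration and are NOT from the study.
ratings = {
    "bot_a": [[8, 7, 9], [6, 7, 6], [5, 5, 5]],
    "bot_b": [[7, 7, 8], [8, 9, 7], [5, 5, 5]],
}


def summarize(ratings):
    """Mean and sample SD over all individual scores of each chatbot."""
    summary = {}
    for bot, per_question in ratings.items():
        flat = [score for question in per_question for score in question]
        summary[bot] = {"mean": mean(flat), "sd": stdev(flat)}
    return summary


def best_rated_counts(ratings):
    """Count, per chatbot, the questions on which it has the top mean score.

    Assumption: ties are credited to every tied chatbot.
    """
    bots = list(ratings)
    n_questions = len(ratings[bots[0]])
    counts = {bot: 0 for bot in bots}
    for q in range(n_questions):
        q_means = {bot: mean(ratings[bot][q]) for bot in bots}
        top = max(q_means.values())
        for bot, q_mean in q_means.items():
            if q_mean == top:
                counts[bot] += 1
    return counts


if __name__ == "__main__":
    for bot, stats in summarize(ratings).items():
        print(f"{bot}: mean={stats['mean']:.2f}, sd={stats['sd']:.2f}")
    print(best_rated_counts(ratings))
```

With tie crediting of this kind, the per-chatbot counts can sum to more than the number of questions; whether the study handled ties this way is not stated in the abstract.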
id doaj-art-fd13aa3532a542b2bc55722030cfe4fb
institution Kabale University
issn 1665-2681
Citation: Annals of Hepatology, volume 30, issue 1, article 101537 (Elsevier, 2025-01-01)
DOI: 10.1016/j.aohep.2024.101537
Author affiliations:
Jimmy Daza: Division of Hepatology, Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
Lucas Soares Bezerra: Division of Hepatology, Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
Laura Santamaría: Universidad de Los Andes School of Medicine, Bogotá, Colombia
Roberto Rueda-Esteban: Universidad de Los Andes School of Medicine, Bogotá, Colombia
Heike Bantel: Department of Gastroenterology, Hepatology, Infectious Diseases and Endocrinology, Hannover Medical School, Hannover, Germany
Marcos Girala: Department of Gastroenterology, Hospital de Clínicas, Universidad Nacional de Asunción, Asunción, Paraguay
Matthias Ebert: Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
Florian Van Bömmel: Department of Medicine II, Clinic of Gastroenterology, Hepatology, Infectious Diseases and Pneumology, Leipzig University Medical Center, Leipzig, Germany
Andreas Geier: Department of Internal Medicine II, Division of Hepatology, University Hospital Würzburg, Würzburg, Germany
Andres Gomez Aldana: Texas Liver Institute, University of Texas Health Science Center, San Antonio, United States
Kevin Yau: Division of Nephrology, Department of Medicine, University of Toronto, Toronto, Ontario, Canada
Mario Alvares-da-Silva: Department of Gastroenterology, Hospital de Clinicas de Porto Alegre, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Markus Peck-Radosavljevic: Internal Medicine and Gastroenterology (IMuG), Clinic Klagenfurt am Woerthersee, Klagenfurt, Austria
Ezequiel Ridruejo: Department of Medicine, Section of Hepatology, Centro de Educación Médica e Investigaciones Clínicas Norberto Quirno “CEMIC”, Buenos Aires, Argentina
Arndt Weinmann: Department of Internal Medicine I, University Medical Center of the Johannes Gutenberg-University Mainz, Mainz, Germany
Andreas Teufel (corresponding author): Division of Hepatology, Division of Clinical Bioinformatics, Department of Medicine II, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
topic Artificial intelligence
Chatbots
Clinical decision support tools
Autoimmune liver disease