Protocol for human evaluation of generative artificial intelligence chatbots in clinical consultations.

Bibliographic Details
Main Authors:	Edwin Kwan-Yeung Chiu, Tom Wai-Hin Chung
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0300487

Description
Summary:	Background: Generative artificial intelligence (GenAI) has the potential to revolutionise healthcare delivery. The nuances of real-life clinical practice and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective deployment of AI. Methods: We present a protocol for the systematic evaluation of large language models (LLMs) as GenAI chatbots within the context of clinical microbiology and infectious diseases clinical consultations. We aim to critically assess recommendations produced by four leading GenAI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot. Discussion: A standardised, healthcare-specific, universal prompt template is developed to elicit clinically impactful AI responses. Generated responses will be graded by two panels of practicing clinicians, encompassing a wide spectrum of domain expertise in clinical microbiology and virology, as well as infectious diseases. Evaluations will be performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of GenAI in clinical consultations, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
ISSN:	1932-6203

Similar Items