AI-generated draft replies to patient messages: exploring effects of implementation

Bibliographic Details
Main Authors:	Charlotte M. H. H. T. Bootsma-Robroeks, Jessica D. Workum, Stephanie C. E. Schuit, Anne Hoekman, Tarannom Mehri, Job N. Doornberg, Tom P. van der Laan, Rosanne C. Schoonbeek
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-06-01
Series:	Frontiers in Digital Health
Subjects:	large language model (LLM); inbasket messages; adoption; time saving; LLM generated draft responses; electronic health records
Online Access:	https://www.frontiersin.org/articles/10.3389/fdgth.2025.1588143/full

Description
Summary:	Introduction: The integration of Large Language Models (LLMs) in Electronic Health Records (EHRs) has the potential to reduce administrative burden. Validating these tools in real-world clinical settings is essential for responsible implementation. In this study, the effect of implementing LLM-generated draft responses to patient questions in our EHR is evaluated with regard to adoption, use and potential time savings. Material and methods: Physicians across 14 medical specialties in a non-English large academic hospital were invited to use LLM-generated draft replies during this prospective observational clinical cohort study of 16 weeks, choosing either the drafted or a blank reply. The adoption rate, the level of adjustments to the initial drafted responses compared to the final sent messages (using ROUGE-1 and BLEU-1 natural language processing scores), and the time spent on these adjustments were analyzed. Results: A total of 919 messages by 100 physicians were evaluated. Clinicians used the LLM draft in 58% of replies. Of these, 43% used a large part of the suggested text for the final answer (≥10% match drafted responses: ROUGE-1: 86% similarity, vs. blank replies: ROUGE-1: 16%). Total response time did not differ significantly between using a blank reply and using a drafted reply with ≥10% match (157 vs. 153 s, p = 0.69). Discussion: General adoption of LLM-generated draft responses to patient messages was 58%, although the level of adjustments on the drafted message varied widely between medical specialties. This implies safe use in a non-English, tertiary setting. The current implementation has not yet resulted in time savings, but a learning curve can be expected. Registration number: 19035.
ISSN:	2673-253X
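
Note on the similarity scores: the summary compares drafted and sent messages using ROUGE-1 and BLEU-1, both of which measure unigram (single-word) overlap. The sketch below is only an illustration of how such scores are commonly computed. The tokenizer, the choice of the sent message as candidate and the draft as reference, and the function names `tokenize`, `rouge1_f1` and `bleu1` are assumptions made for this example, not details taken from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase word characters only (the study's is not stated).
    return re.findall(r"\w+", text.lower())


def unigram_overlap(candidate, reference):
    # Clipped overlap: each word counts at most as often as it appears in both texts.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    return sum((cand & ref).values()), sum(cand.values()), sum(ref.values())


def rouge1_f1(candidate, reference):
    # ROUGE-1 F1: harmonic mean of unigram precision and recall.
    overlap, n_cand, n_ref = unigram_overlap(candidate, reference)
    if overlap == 0:
        return 0.0
    precision = overlap / n_cand
    recall = overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def bleu1(candidate, reference):
    # BLEU-1: clipped unigram precision times the standard brevity penalty.
    overlap, n_cand, n_ref = unigram_overlap(candidate, reference)
    if n_cand == 0:
        return 0.0
    precision = overlap / n_cand
    brevity_penalty = 1.0 if n_cand >= n_ref else math.exp(1 - n_ref / n_cand)
    return brevity_penalty * precision


# Hypothetical example texts, not data from the study.
draft = "You can take the medication with food"
sent = "You can take the medication with food in the morning"
print(round(rouge1_f1(sent, draft), 3), round(bleu1(sent, draft), 3))
```

A sent message that keeps most of the draft scores close to 1 on both measures, while a reply written from a blank template would share few words with the draft and score close to 0.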

Similar Items