An evaluation framework for ambient digital scribing tools in clinical applications

Abstract Ambient digital scribing (ADS) tools alleviate clinician documentation burden, reducing burnout and enhancing efficiency. As AI-driven ADS tools integrate into clinical workflows, robust governance is essential for ethical and secure deployment. This study proposes a comprehensive ADS evalu...


Bibliographic Details
Main Authors: Haoyuan Wang, Rui Yang, Mahmoud Alwakeel, Ankit Kayastha, Anand Chowdhury, Joshua M. Biro, Anthony D. Sorrentino, Jessica L. Handley, Sarah Hantzmon, Sophia Bessias, Nicoleta J. Economou-Zavlanos, Armando Bedoya, Monica Agrawal, Raj M. Ratwani, Eric G. Poon, Michael J. Pencina, Kathryn I. Pollak, Chuan Hong
Format: Article
Language: English
Published: Nature Portfolio 2025-06-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01622-1
collection DOAJ
description Abstract Ambient digital scribing (ADS) tools alleviate clinician documentation burden, reducing burnout and enhancing efficiency. As AI-driven ADS tools integrate into clinical workflows, robust governance is essential for ethical and secure deployment. This study proposes a comprehensive ADS evaluation framework incorporating human evaluation, automated metrics, simulation testing, and large language models (LLMs) as evaluators. Our framework assesses transcription, diarization, and medical note generation across criteria such as fluency, completeness, and factuality. To demonstrate its effectiveness, we developed an ADS tool and applied our framework to evaluate the tool’s performance on 40 real clinical visit recordings. Our evaluation revealed strengths, such as fluency and clarity, but also highlighted weaknesses in factual accuracy and the ability to capture new medications. These findings underscore the value of structured ADS evaluation in improving healthcare delivery while emphasizing the need for strong governance to ensure safe, ethical integration.
id doaj-art-ad995c1b87864b238da3a2bb5d80d06d
institution DOAJ
issn 2398-6352
spelling doaj-art-ad995c1b87864b238da3a2bb5d80d06d; 2025-08-20T03:21:02Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-06-01; vol. 8, no. 1, pp. 1-13; 10.1038/s41746-025-01622-1
An evaluation framework for ambient digital scribing tools in clinical applications
Haoyuan Wang (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Rui Yang (Centre for Quantitative Medicine, Duke-NUS Medical School)
Mahmoud Alwakeel (Department of Medicine, Duke University School of Medicine)
Ankit Kayastha (Department of Medicine, Duke University School of Medicine)
Anand Chowdhury (Department of Medicine, Duke University School of Medicine)
Joshua M. Biro (Medstar Health National Center for Human Factors in Healthcare)
Anthony D. Sorrentino (Department of Medicine, Duke University School of Medicine)
Jessica L. Handley (Medstar Health National Center for Human Factors in Healthcare)
Sarah Hantzmon (Cancer Prevention and Control Research Program, Duke Cancer Institute)
Sophia Bessias (Duke Clinical and Translational Science Institute, Duke University School of Medicine)
Nicoleta J. Economou-Zavlanos (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Armando Bedoya (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Monica Agrawal (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Raj M. Ratwani (Medstar Health National Center for Human Factors in Healthcare)
Eric G. Poon (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Michael J. Pencina (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
Kathryn I. Pollak (Cancer Prevention and Control Research Program, Duke Cancer Institute)
Chuan Hong (Department of Biostatistics and Bioinformatics, Duke University School of Medicine)
https://doi.org/10.1038/s41746-025-01622-1