Evaluating the Usability, Technical Performance, and Accuracy of Artificial Intelligence Scribes for Primary Care: Competitive Analysis


Bibliographic Details
Main Authors: Emily Ha, Isabelle Choon-Kon-Yune, LaShawn Murray, Siying Luan, Enid Montague, Onil Bhattacharyya, Payal Agarwal
Format: Article
Language: English
Published: JMIR Publications, 2025-07-01
Series: JMIR Human Factors
Online Access: https://humanfactors.jmir.org/2025/1/e71434
Collection: DOAJ
Description:
Abstract
Background: Primary care providers (PCPs) face significant burnout due to increasing administrative and documentation demands, contributing to job dissatisfaction and impacting care quality. Artificial intelligence (AI) scribes have emerged as potential solutions to reduce administrative burden by automating clinical documentation of patient encounters. Although AI scribes are gaining popularity in primary care, there is limited information on their usability, effectiveness, and accuracy.
Objective: This study aimed to develop and apply an evaluation framework to systematically assess the usability, technical performance, and accuracy of various AI scribes used in primary care settings across Canada and the United States.
Methods: We conducted a systematic comparison of a suite of AI scribes using competitive analysis methods. An evaluation framework was developed using expert usability approaches and human factors engineering principles; it comprises 3 domains: usability, effectiveness and technical performance, and accuracy and quality. Audio files from 4 standardized patient encounters were used to generate transcripts and SOAP (Subjective, Objective, Assessment, and Plan)–format medical notes from each AI scribe. A verbatim transcript, detailed case notes, and physician-written medical notes for each audio file served as benchmarks for comparison against the AI-generated outputs. Applicable items were rated on a 3-point Likert scale (1=poor, 2=good, 3=excellent). Additional insights were gathered from clinical experts, vendor questionnaires, and public resources to support the usability, effectiveness, and quality findings.
Results: In total, 6 AI scribes were evaluated, with notable performance differences. Most AI scribes could be accessed via various platforms (n=4) and launched within common electronic medical records, though data exchange capabilities were limited. Nearly all AI scribes generated SOAP-format notes in approximately 1 minute for a 15-minute standardized encounter (n=5), though documentation time increased with encounter length and topic complexity. While all AI scribes produced good- to excellent-quality medical notes, none were consistently error-free. Common errors included deletion, omission, and SOAP-structure errors. Factors such as extraneous conversations and multiple speakers affected the accuracy of both the transcript and the medical note, with some AI scribes producing excellent notes despite minor transcript issues and vice versa. Limitations in usability, technical performance, and accuracy suggest areas for improvement to fully realize AI scribes' potential in reducing administrative burden for PCPs.
Conclusions: This study offers one of the first systematic evaluations of the usability, effectiveness, and accuracy of a suite of AI scribes currently used in primary care, providing benchmark data for further research, policy, and practice. While AI scribes show promise in reducing documentation burdens, improvements and ongoing evaluations are essential to ensure safe and effective use. Future studies should assess AI scribe performance in real-world settings across diverse populations to support equitable and reliable applications.
ISSN: 2292-9495
DOI: 10.2196/71434