Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems

Objective The practice of evidence-based medicine can be challenging when relevant data are lacking or difficult to contextualize for a specific patient. Large language models (LLMs) could potentially address both challenges by summarizing published literature or generating new studies using real-wo...

Bibliographic Details
Main Authors: Yen Sia Low, Michael L Jackson, Rebecca J Hyde, Robert E Brown, Neil M Sanghavi, Julian D Baldwin, C William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V Nene, Morgan Pike, Courtney J Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R Zipursky, Christina Dinh, Philip Ballentine, Dan C Derieg, Vladimir Polony, Rehan N Chawdry, Jordan Davies, Brigham B Hyde, Nigam H Shah, Saurabh Gombar
Format: Article
Language: English
Published: SAGE Publishing 2025-06-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251348850
collection DOAJ
description
Objective: The practice of evidence-based medicine can be challenging when relevant data are lacking or difficult to contextualize for a specific patient. Large language models (LLMs) could potentially address both challenges by summarizing published literature or generating new studies using real-world data.
Materials and Methods: We submitted 50 clinical questions to five LLM-based systems: OpenEvidence, which uses an LLM for retrieval-augmented generation (RAG); ChatRWD, which uses an LLM as an interface to a data extraction and analysis pipeline; and three general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini 1.5 Pro). Nine independent physicians evaluated the answers for relevance, quality of supporting evidence, and actionability (i.e., sufficient to justify or change clinical practice).
Results: General-purpose LLMs rarely produced relevant, evidence-based answers (2–10% of questions). In contrast, RAG-based and agentic LLM systems, respectively, produced relevant, evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. OpenEvidence produced actionable results for 48% of questions with existing evidence, compared to 37% for ChatRWD and <5% for the general-purpose LLMs. ChatRWD provided actionable results for 52% of questions that lacked existing literature compared to <10% for other LLMs.
Discussion: Special-purpose LLM systems greatly outperformed general-purpose LLMs in producing answers to clinical questions. Retrieval-augmented generation-based LLM (OpenEvidence) performed well when existing data were available, while only the agentic ChatRWD was able to provide actionable answers when preexisting studies were lacking.
Conclusion: Synergistic systems combining RAG-based evidence summarization and agentic generation of novel evidence could improve the availability of pertinent evidence for patient care.
format Article
id doaj-art-e32cc1f79b9645f3899c1190f2646bf0
institution OA Journals
issn 2055-2076
spelling doaj-art-e32cc1f79b9645f3899c1190f2646bf0 2025-08-20T02:09:22Z eng SAGE Publishing Digital Health 2055-2076 2025-06-01 11 10.1177/20552076251348850
Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems
0 Yen Sia Low, Atropos Health, New York, NY, USA
1 Michael L Jackson, Atropos Health, New York, NY, USA
2 Rebecca J Hyde, Atropos Health, New York, NY, USA
3 Robert E Brown, Atropos Health, New York, NY, USA
4 Neil M Sanghavi, Atropos Health, New York, NY, USA
5 Julian D Baldwin, Atropos Health, New York, NY, USA
6 C William Pike, Atropos Health, New York, NY, USA
7 Jananee Muralidharan, Atropos Health, New York, NY, USA
8 Gavin Hui, Department of Medicine, University of California, Los Angeles, CA, USA
9 Natasha Alexander, Department of Pediatrics, , Toronto, Ontario, Canada
10 Hadeel Hassan, Program in Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, , Toronto, Ontario, Canada
11 Rahul V Nene, Department of Emergency Medicine, University of California, San Diego, CA, USA
12 Morgan Pike, Department of Emergency Medicine, , Ann Arbor, MI, USA
13 Courtney J Pokrzywa, Department of Surgery, , New York, NY, USA
14 Shivam Vedak, Division of Clinical Informatics, , Stanford, CA, USA
15 Adam Paul Yan, Department of Pediatrics, , Toronto, Ontario, Canada
16 Dong-han Yao, Department of Emergency Medicine, , Stanford, CA, USA
17 Amy R Zipursky, Department of Pediatrics, , Toronto, Ontario, Canada
18 Christina Dinh, Atropos Health, New York, NY, USA
19 Philip Ballentine, Atropos Health, New York, NY, USA
20 Dan C Derieg, Atropos Health, New York, NY, USA
21 Vladimir Polony, Atropos Health, New York, NY, USA
22 Rehan N Chawdry, Atropos Health, New York, NY, USA
23 Jordan Davies, Atropos Health, New York, NY, USA
24 Brigham B Hyde, Atropos Health, New York, NY, USA
25 Nigam H Shah, Division of Clinical Informatics, , Stanford, CA, USA
26 Saurabh Gombar, Department of Pathology, , Stanford, CA, USA
https://doi.org/10.1177/20552076251348850