Ability of ChatGPT, DeepSeek, and Gemini to predict major potential drug interactions in adults admitted to the Intensive Care Unit

Bibliographic Details
Main Author: Tácio Mendonça LIMA
Format: Article
Language: English
Published: Sociedade Brasileira de Farmácia Hospitalar e Serviços de Saúde 2025-03-01
Series: Revista Brasileira de Farmácia Hospitalar e Serviços de Saúde
Online Access: https://jhphs.org/sbrafh/article/view/1262
Description
Summary: Objective: To evaluate the ability of ChatGPT v.3.5, DeepSeek v-3, and Gemini 2.0 Flash to accurately predict major potential drug-drug interactions (DDIs) in critically ill patients. Methods: A list of 20 DDIs was compiled from previously published literature. The Micromedex and Drugs.com databases were used as references. A specific prompt was designed to interact with the tools. The generated responses were stored for subsequent analysis by a pharmacist. Specificity, sensitivity, negative predictive value (NPV), positive predictive value (PPV), accuracy, and agreement were calculated for each tool based on its responses regarding DDI severity, which were categorized into five levels: contraindicated, major, moderate, minor, and no interaction. Additionally, the responses on the mechanism of action and recommended management of each DDI were categorized as “adequate and accurate,” “adequate but inaccurate,” or “inadequate.” Results: With Micromedex as the reference, ChatGPT performed best, with an accuracy of 75%, while DeepSeek and Gemini scored 70% and 65%, respectively. Overall, the performance of all tools improved when Drugs.com was used as the reference, with accuracy rates of 80% for DeepSeek and 75% for both ChatGPT and Gemini. However, the agreement on DDI severity between the tools and the references was 0.354 (weak) for Drugs.com and 0.410 (moderate) for Micromedex. In general, two “inadequate” responses and 10 “adequate but inaccurate” responses regarding the mechanism of action and recommended management were observed when compared with Micromedex (14 DDIs analyzed), while eight “inadequate” responses and 21 “adequate but inaccurate” responses were found when compared with Drugs.com (17 DDIs analyzed). Conclusion: The tools analyzed show promise for assisting healthcare professionals in predicting DDIs in adults hospitalized in the intensive care unit (ICU). However, they should be used with caution, as they may generate incorrect or inaccurate information. Further advances are required to ensure their reliable application in clinical practice.
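The performance measures named in the summary can be illustrated with a short sketch. This is not the authors' analysis code. It assumes that each severity label is reduced to a one-vs-rest comparison against the "major" category, and that "agreement" refers to Cohen's kappa; the summary states neither. The function names `binary_metrics` and `cohen_kappa` are hypothetical, and any labels passed to them would be the reader's own data, not the study's.

```python
from collections import Counter


def binary_metrics(reference, predicted, positive="major"):
    """One-vs-rest diagnostic metrics for a single severity level.

    `reference` holds the database labels and `predicted` the tool labels.
    Treating `positive` as the target class is an assumption of this sketch.
    """
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        ref_pos, pred_pos = ref == positive, pred == positive
        if ref_pos and pred_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif pred_pos:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # None marks an undefined metric (empty denominator).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa over all severity levels (assumed agreement statistic)."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    # Chance agreement from the marginal label frequencies of each rater.
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Calling `binary_metrics(reference_labels, tool_labels)` returns the five proportions for one tool against one reference database, and `cohen_kappa(reference_labels, tool_labels)` gives a chance-corrected agreement across all five severity levels.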
ISSN: 2179-5924
2316-7750