Generative Artificial Intelligence and Risk Appetite in Medical Decisions in Rheumatoid Arthritis

Bibliographic Details
Main Authors: Florian Berghea, Dan Andras, Elena Camelia Berghea
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/10/5700
Description
Summary: With Generative AI (GenAI) entering medicine, understanding its decision-making under uncertainty is important. It is well known that human subjective risk appetite influences medical decisions. This study investigated whether the risk appetite of GenAI can be evaluated and whether established human risk assessment tools are applicable for this purpose in a medical context. Five GenAI systems (ChatGPT 4.5, Gemini 2.0, Qwen 2.5 MAX, DeepSeek-V3, and Perplexity) were evaluated using Rheumatoid Arthritis (RA) clinical scenarios. We employed two methods adapted from human risk assessment: the General Risk Propensity Scale (GRiPS) and the Time Trade-Off (TTO) technique. Queries involving RA cases with varying prognoses and hypothetical treatment choices were posed repeatedly to assess risk profiles and response consistency. All GenAIs consistently identified the same RA cases for the best and worst prognoses. However, the two risk assessment methodologies yielded varied results. The adapted GRiPS showed significant differences in general risk propensity among GenAIs (ChatGPT being the least risk-averse and Qwen/DeepSeek the most), though these differences diminished in specific prognostic contexts. Conversely, the TTO method indicated a strong general risk aversion (unwillingness to trade lifespan for pain relief) across systems, yet revealed Perplexity as significantly more risk-tolerant than Gemini. The variability in risk profiles obtained using the GRiPS versus the TTO for the same AI systems raises questions about tool applicability. This discrepancy suggests that these human-centric instruments may not adequately or consistently capture the nuances of risk processing in Artificial Intelligence. The findings imply that current tools might be insufficient, highlighting the need for methodologies specifically tailored for evaluating AI decision-making under medical uncertainty.
ISSN: 2076-3417
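
Illustrative note: the summary names two scoring approaches, GRiPS and TTO, and repeated querying to check consistency. The sketch below shows one way such responses might be scored. It is not the authors' code. It assumes GRiPS items are rated on a 1 to 5 Likert scale and averaged, and it uses the conventional TTO utility (years in full health accepted, divided by years offered in the described health state). All function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def grips_score(item_ratings):
    """Average the Likert ratings for one questionnaire run.

    Assumption: each item is rated 1-5, and a higher mean indicates
    a greater general propensity to take risks.
    """
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    if any(not 1 <= r <= 5 for r in item_ratings):
        raise ValueError("ratings must lie on the assumed 1-5 scale")
    return mean(item_ratings)


def tto_utility(years_full_health, years_in_state):
    """Conventional time trade-off utility: x / t.

    A value of 1.0 means no lifespan is traded away for symptom relief.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= years_full_health <= years_in_state:
        raise ValueError("years_full_health must be between 0 and years_in_state")
    return years_full_health / years_in_state


def summarize_runs(values):
    """Mean and sample standard deviation across repeated queries."""
    spread = stdev(values) if len(values) > 1 else 0.0
    return {"mean": mean(values), "sd": spread}


if __name__ == "__main__":
    # Hypothetical ratings from one run, not data from the study.
    print(grips_score([1, 2, 3, 4, 5, 1, 2, 3]))
    # Hypothetical TTO answers from three repeated queries (t = 10 years).
    utilities = [tto_utility(x, 10) for x in (10, 9, 10)]
    print(summarize_runs(utilities))
```

A small standard deviation across repeated runs would indicate consistent responses from a given system; a TTO utility near 1.0 corresponds to the unwillingness to trade lifespan for pain relief that the summary describes.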