Generative Artificial Intelligence and Risk Appetite in Medical Decisions in Rheumatoid Arthritis
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5700 |
| Summary: | With Generative AI (GenAI) entering medicine, understanding its decision-making under uncertainty is important. It is well known that human subjective risk appetite influences medical decisions. This study investigated whether the risk appetite of GenAI can be evaluated, and whether established human risk assessment tools are applicable for this purpose in a medical context. Five GenAI systems (ChatGPT 4.5, Gemini 2.0, Qwen 2.5 MAX, DeepSeek-V3, and Perplexity) were evaluated using rheumatoid arthritis (RA) clinical scenarios. We employed two methods adapted from human risk assessment: the General Risk Propensity Scale (GRiPS) and the Time Trade-Off (TTO) technique. Queries involving RA cases with varying prognoses and hypothetical treatment choices were posed repeatedly to assess risk profiles and response consistency. All GenAIs consistently identified the same RA cases as having the best and worst prognoses. However, the two risk assessment methodologies yielded divergent results. The adapted GRiPS showed significant differences in general risk propensity among the GenAIs (ChatGPT was the least risk-averse; Qwen and DeepSeek were the most), though these differences diminished in specific prognostic contexts. Conversely, the TTO method indicated strong general risk aversion (unwillingness to trade lifespan for pain relief) across all systems, yet revealed Perplexity to be significantly more risk-tolerant than Gemini. The variability in risk profiles obtained with the GRiPS versus the TTO for the same AI systems raises questions about tool applicability: these human-centric instruments may not adequately or consistently capture the nuances of risk processing in artificial intelligence. The findings imply that current tools may be insufficient and highlight the need for methodologies specifically tailored to evaluating AI decision-making under medical uncertainty. |
| ISSN: | 2076-3417 |
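
The summary describes a repeated-query protocol: each GenAI is administered the adapted GRiPS many times and its Likert-scale answers are aggregated into a risk-propensity score. The record does not give the authors' actual prompts or scoring procedure, so the Python sketch below is purely illustrative; `query_model`, the placeholder item texts, and the prompt wording are assumptions, not the study's implementation.

```python
import re
import statistics

# Hypothetical wrapper around whichever GenAI chat API is under test.
# The record does not say how queries were issued, so this function and
# its signature are assumptions for illustration only.
def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in the chat API of the system under test")

# Placeholder stand-ins for the eight GRiPS statements; the published
# items (Zhang, Highhouse, and Nye, 2019) would be substituted here.
GRIPS_ITEMS = [
    "Taking risks is something I am comfortable with.",
    "I would take a risk even if it meant I might lose something.",
    # ... remaining placeholder items ...
]

LIKERT_PROMPT = (
    "Rate your agreement with the following statement on a scale from "
    "1 (strongly disagree) to 5 (strongly agree). "
    "Answer with a single number.\nStatement: {item}"
)

def grips_score(n_repeats: int = 10) -> tuple[float, float]:
    """Administer the adapted GRiPS n_repeats times and return the mean
    risk-propensity score and its standard deviation across repeats
    (higher mean = more risk-seeking; higher stdev = less consistent)."""
    per_repeat_means = []
    for _ in range(n_repeats):
        ratings = []
        for item in GRIPS_ITEMS:
            reply = query_model(LIKERT_PROMPT.format(item=item))
            found = re.search(r"[1-5]", reply)  # first Likert digit in the reply
            if found:
                ratings.append(int(found.group()))
        if ratings:
            per_repeat_means.append(statistics.mean(ratings))
    return statistics.mean(per_repeat_means), statistics.stdev(per_repeat_means)
```

For the TTO arm, the analogous summary statistic would be the elicited utility u = x/t, where the model judges x years in full health equivalent to t years in the painful RA health state; values of u near 1 correspond to the unwillingness to trade lifespan for pain relief that the summary reports across all five systems.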