Medical triage as an AI ethics benchmark

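Abstract

We present the TRIAGE benchmark, a novel machine ethics benchmark designed to evaluate the ethical decision-making abilities of large language models (LLMs) in mass casualty scenarios. TRIAGE uses medical dilemmas created by healthcare professionals to evaluate the ethical decision-making of AI systems in real-world, high-stakes scenarios. We evaluated six major LLMs on TRIAGE, examining how different ethical and adversarial prompts influence model behavior. Our results show that most models consistently outperformed random guessing, with open-source models making more serious ethical errors than proprietary models. Providing guiding ethical principles to LLMs degraded performance on TRIAGE, which stands in contrast to results from other machine ethics benchmarks where explicating ethical principles improved results. Adversarial prompts significantly decreased accuracy. By demonstrating the influence of context and ethical framing on the performance of LLMs, we provide critical insights into the current capabilities and limitations of AI in high-stakes ethical decision-making in medicine.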
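
The abstract reports that most models outperformed random guessing. One common way to check a claim of that kind is a one-sided exact binomial test of the number of correct answers against the chance rate. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes every question has exactly one correct option among k equally likely choices (so chance accuracy is 1/k) and that questions are independent. The function names, option labels, and example numbers are hypothetical.

```python
from math import comb


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions where the predicted option matches the reference answer."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def p_value_above_chance(correct: int, total: int, chance: float) -> float:
    """One-sided exact binomial test.

    Returns P(X >= correct) where X ~ Binomial(total, chance), i.e. the probability
    that a model guessing uniformly at random would score at least this well.
    Assumes independent questions with a fixed chance rate (hypothetical setup).
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must lie strictly between 0 and 1")
    return sum(
        comb(total, k) * chance**k * (1.0 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical run: 60 questions with 4 options each (chance = 0.25),
    # 30 answered correctly. Illustrative numbers only, not results from the paper.
    p = p_value_above_chance(correct=30, total=60, chance=0.25)
    print(f"accuracy = {30 / 60:.2f}, one-sided p = {p:.2e}")
```

A small p-value means the observed score would be unlikely under pure guessing. The exact sum is used instead of a normal approximation because benchmark question counts can be small; for comparisons between prompt conditions (for example, with and without adversarial prompts), a paired test on per-question outcomes would be the more appropriate next step.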

Bibliographic Details
Main Authors: Nathalie Maria Kirch, Konstantin Hebenstreit, Matthias Samwald
Format: Article
Language: English
Published: Nature Portfolio, 2025-08-01
Series: Scientific Reports
Online Access:https://doi.org/10.1038/s41598-025-16716-9
ISSN: 2045-2322
Affiliation: Institute of Artificial Intelligence, Medical University of Vienna (all authors)