Medical triage as an AI ethics benchmark

Abstract: We present the TRIAGE benchmark, a novel machine ethics benchmark designed to evaluate the ethical decision-making abilities of large language models (LLMs) in mass casualty scenarios. TRIAGE uses medical dilemmas created by healthcare professionals to evaluate the ethical decision-making o...

Bibliographic Details
Main Authors:	Nathalie Maria Kirch, Konstantin Hebenstreit, Matthias Samwald
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-08-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-025-16716-9

collection	DOAJ
description	Abstract: We present the TRIAGE benchmark, a novel machine ethics benchmark designed to evaluate the ethical decision-making abilities of large language models (LLMs) in mass casualty scenarios. TRIAGE uses medical dilemmas created by healthcare professionals to evaluate the ethical decision-making of AI systems in real-world, high-stakes scenarios. We evaluated six major LLMs on TRIAGE, examining how different ethical and adversarial prompts influence model behavior. Our results show that most models consistently outperformed random guessing, with open-source models making more serious ethical errors than proprietary models. Providing guiding ethical principles to LLMs degraded performance on TRIAGE, which stands in contrast to results from other machine ethics benchmarks, where explicating ethical principles improved results. Adversarial prompts significantly decreased accuracy. By demonstrating the influence of context and ethical framing on the performance of LLMs, we provide critical insights into the current capabilities and limitations of AI in high-stakes ethical decision-making in medicine.
format	Article
id	doaj-art-8741fff6fb634849b36dd36ca5d12e23
institution	Kabale University
issn	2045-2322
language	English
publishDate	2025-08-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj-art-8741fff6fb634849b36dd36ca5d12e23; 2025-08-24T11:19:17Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-08-01; 15118; 10.1038/s41598-025-16716-9; Medical triage as an AI ethics benchmark; Nathalie Maria Kirch (0), Konstantin Hebenstreit (1), Matthias Samwald (2); Institute of Artificial Intelligence, Medical University of Vienna (all three authors); https://doi.org/10.1038/s41598-025-16716-9
url	https://doi.org/10.1038/s41598-025-16716-9

Similar Items