Robustness of large language models in moral judgements


Bibliographic Details
Main Authors: Soyoung Oh, Vera Demberg
Format: Article
Language: English
Published: The Royal Society, 2025-04-01
Series: Royal Society Open Science
Subjects: large language model; moral reasoning; robustness
Online Access: https://royalsocietypublishing.org/doi/10.1098/rsos.241229
Collection: DOAJ
ISSN: 2054-5703
Volume/Issue: 12(4)
DOI: 10.1098/rsos.241229
Author affiliation: Department of Computer Science, Language Science and Technology, Saarland University, Saarbrücken, Germany (both authors)

Description
With the advent of large language models (LLMs), there has been a growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the method and results employed in previous work for eliciting moral judgements from LLMs. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks and that LLM responses are highly sensitive to prompt formulation variants as simple as changing ‘Case 1’ and ‘Case 2’ to ‘(A)’ and ‘(B)’. Our results hence indicate that previous conclusions on moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies.