Robustness of large language models in moral judgements


Bibliographic Details
Main Authors: Soyoung Oh, Vera Demberg
Format: Article
Language: English
Published: The Royal Society, 2025-04-01
Series: Royal Society Open Science
Subjects: large language model; moral reasoning; robustness
Online Access: https://royalsocietypublishing.org/doi/10.1098/rsos.241229
Collection: DOAJ
ISSN: 2054-5703
Volume/Issue: 12(4)
DOI: 10.1098/rsos.241229
Author affiliation: Department of Computer Science, Language Science and Technology, Saarland University, Saarbrücken, Germany (both authors)

Description
With the advent of large language models (LLMs), there has been a growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the method and results employed in previous work for eliciting moral judgements from LLMs. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks and that LLM responses are highly sensitive to prompt formulation variants as simple as changing ‘Case 1’ and ‘Case 2’ to ‘(A)’ and ‘(B)’. Our results hence indicate that previous conclusions on moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies.