Robustness of large language models in moral judgements
With the advent of large language models (LLMs), there has been growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. …
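The robustness issue the paper raises — that an LLM's choice can flip when only the option labels (e.g. 'Case 1'/'Case 2' vs. '(A)'/'(B)') or the option order change — can be sketched as a prompt-variant generator. This is a minimal illustration, not the authors' actual setup; the dilemma text and helper names below are assumptions for the example:

```python
from itertools import product

# Hypothetical two-option moral-judgement item (placeholder text, not from the paper).
OPTION_X = "Pull the lever, diverting the trolley to the side track."
OPTION_Y = "Do nothing and let the trolley continue on the main track."

# Surface label styles whose swap should, ideally, not change a model's judgement.
LABEL_STYLES = {
    "case": ("Case 1", "Case 2"),
    "letter": ("(A)", "(B)"),
}

def prompt_variants(option_x: str, option_y: str):
    """Return every combination of label style and option order.

    Querying a model with all four variants and mapping each answer back to
    the underlying option (not the surface label) is one way to detect the
    presentation biases the paper describes.
    """
    variants = []
    orders = [(option_x, option_y), (option_y, option_x)]
    for style, order in product(LABEL_STYLES, orders):
        first_label, second_label = LABEL_STYLES[style]
        prompt = (
            "Which action is morally preferable?\n"
            f"{first_label}: {order[0]}\n"
            f"{second_label}: {order[1]}\n"
        )
        # Record which underlying option each label denotes in this variant,
        # so responses can be compared across variants.
        mapping = {first_label: order[0], second_label: order[1]}
        variants.append((prompt, mapping))
    return variants

variants = prompt_variants(OPTION_X, OPTION_Y)
print(len(variants))  # 2 label styles x 2 orders = 4 variants
```

A judgement would count as robust only if the chosen underlying option agrees across all four variants; systematic disagreement indicates the label- or order-sensitivity the paper reports.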
| Main Authors: | Soyoung Oh, Vera Demberg |
|---|---|
| Author affiliation: | Department of Computer Science, Language Science and Technology, Saarland University, Saarbrücken, Germany (both authors) |
| Format: | Article |
| Language: | English |
| Published: | The Royal Society, 2025-04-01 |
| Series: | Royal Society Open Science |
| ISSN: | 2054-5703 |
| Subjects: | large language model; moral reasoning; robustness |
| Online Access: | https://royalsocietypublishing.org/doi/10.1098/rsos.241229 |
| Abstract: | With the advent of large language models (LLMs), there has been growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the method and results employed in previous work for eliciting moral judgements from LLMs. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks and that LLM responses are highly sensitive to prompt formulation variants as simple as changing 'Case 1' and 'Case 2' to '(A)' and '(B)'. Our results hence indicate that previous conclusions on moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies. |