The Effects of (Dis)similarities Between the Creator and the Assessor on Assessing Creativity: A Comparison of Humans and LLMs

Current research predominantly involves human subjects to evaluate AI creativity. In this explorative study, we questioned the validity of this practice and examined how creator–assessor (dis)similarity—namely to what extent the creator and the assessor were alike—along two dimensions of culture (We...

Bibliographic Details
Main Authors:	Martin op ‘t Hof, Ke Hu, Song Tong, Honghong Bai
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Journal of Intelligence
Subjects:	creativity assessment; large language models; cross-cultural comparison
Online Access:	https://www.mdpi.com/2079-3200/13/7/80

_version_	1849732835778232320
author	Martin op ‘t Hof, Ke Hu, Song Tong, Honghong Bai
author_facet	Martin op ‘t Hof, Ke Hu, Song Tong, Honghong Bai
author_sort	Martin op ‘t Hof
collection	DOAJ
description	Current research predominantly involves human subjects to evaluate AI creativity. In this explorative study, we questioned the validity of this practice and examined how creator–assessor (dis)similarity—namely to what extent the creator and the assessor were alike—along two dimensions of culture (Western and English-speaking vs. Eastern and Chinese-speaking) and agency (human vs. AI) influences the assessment of creativity. We first asked four types of subjects to create stories, including Eastern participants (university students from China), Eastern AI (Kimi from China), Western participants (university students from The Netherlands), and Western AI (ChatGPT 3.5 from the US). Both Eastern participants and AI created stories in Chinese, which were then translated into English, while both Western participants and AI created stories in English, which were then translated into Chinese. A subset of these stories (2 creative and 2 uncreative per creator type, in total 16 stories) was then randomly selected as assessment materials. Adopting a within-subject design, we then asked new subjects from the same four types (n = 120, 30 per type) to assess these stories on creativity, originality, and appropriateness. The results confirmed that similarities in both dimensions of culture and agency influence the assessment of originality and appropriateness. As for the agency dimension, human assessors preferred human-created stories for originality, while AI assessors showed no preference. Conversely, AI assessors rated AI-generated stories higher in appropriateness, whereas human assessors showed no preference. Culturally, both Eastern and Western assessors favored Eastern-created stories in originality. And as for appropriateness, the assessors always preferred stories from the creators with the same cultural backgrounds. The present study is significant in attempting to ask an often-overlooked question and provides the first empirical evidence to underscore the need for more discussion on using humans to judge AI agents’ creativity or the other way around.
format	Article
id	doaj-art-bc9338f8ae8b4f3f915e5c865bdec7cd
institution	DOAJ
issn	2079-3200
language	English
publishDate	2025-07-01
publisher	MDPI AG
record_format	Article
series	Journal of Intelligence
spelling	doaj-art-bc9338f8ae8b4f3f915e5c865bdec7cd; 2025-08-20T03:08:12Z; eng; MDPI AG; Journal of Intelligence; ISSN 2079-3200; 2025-07-01; vol. 13, no. 7, art. 80; DOI 10.3390/jintelligence13070080; The Effects of (Dis)similarities Between the Creator and the Assessor on Assessing Creativity: A Comparison of Humans and LLMs; Martin op ‘t Hof (School of Artificial Intelligence, Radboud University, 6500 HE Nijmegen, The Netherlands); Ke Hu (Department of Psychological and Cognitive Sciences, Tsinghua University, Beijing 100084, China); Song Tong (Department of Psychology, Faculty of Arts and Sciences, Beijing Normal University at Zhuhai, Zhuhai 519087, China); Honghong Bai (Behavioural Science Institute & Orthopedagogics: Learning and Development, Radboud University, 6500 HE Nijmegen, The Netherlands); https://www.mdpi.com/2079-3200/13/7/80; creativity assessment; large language models; cross-cultural comparison
title	The Effects of (Dis)similarities Between the Creator and the Assessor on Assessing Creativity: A Comparison of Humans and LLMs
title_sort	effects of dis similarities between the creator and the assessor on assessing creativity a comparison of humans and llms
topic	creativity assessment; large language models; cross-cultural comparison
url	https://www.mdpi.com/2079-3200/13/7/80

Similar Items