The “LLM World of Words” English free association norms generated by large language models

Bibliographic Details
Main Authors: Katherine Abramski, Riccardo Improta, Giulio Rossetti, Massimo Stella
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-05156-9
collection DOAJ
description Abstract: Free associations have been extensively used in psychology and linguistics for studying how conceptual knowledge is organized. Recently, the potential of applying a similar approach for investigating the knowledge encoded in LLMs has emerged, specifically as a method for investigating LLM biases. However, the absence of large-scale LLM-generated free association norms that are comparable with human-generated norms is an obstacle to this research direction. To address this, we create a new dataset of LLM-generated free association norms modeled after the “Small World of Words” (SWOW) human-generated norms with nearly 12,000 cue words. We prompt three LLMs (Mistral, Llama3, and Haiku) with the same cues as those in SWOW to generate three novel comparable datasets, the “LLM World of Words” (LWOW). From the datasets, we construct network models of semantic memory that represent the conceptual knowledge possessed by humans and LLMs. We validate the datasets by simulating semantic priming within the network models, and we briefly discuss how the datasets can be used for investigating implicit biases in humans and LLMs.
id doaj-art-4e5f26d0c2554d10aa5d1208650d1ded
institution Kabale University
issn 2052-4463
spelling doaj-art-4e5f26d0c2554d10aa5d1208650d1ded 2025-08-20T03:25:12Z eng Nature Portfolio Scientific Data 2052-4463 2025-05-01 12119 10.1038/s41597-025-05156-9
Katherine Abramski (University of Pisa, Department of Computer Science)
Riccardo Improta (University of Trento, Department of Psychology and Cognitive Science)
Giulio Rossetti (National Research Council of Italy, Institute of Information Science and Technologies)
Massimo Stella (University of Trento, Department of Psychology and Cognitive Science)