The “LLM World of Words” English free association norms generated by large language models
Abstract: Free associations have been extensively used in psychology and linguistics for studying how conceptual knowledge is organized. Recently, the potential of applying a similar approach for investigating the knowledge encoded in LLMs has emerged, specifically as a method for investigating LLM b...
| Main Authors: | Katherine Abramski, Riccardo Improta, Giulio Rossetti, Massimo Stella |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Scientific Data |
| Online Access: | https://doi.org/10.1038/s41597-025-05156-9 |
| author | Katherine Abramski, Riccardo Improta, Giulio Rossetti, Massimo Stella |
|---|---|
| collection | DOAJ |
| description | Abstract: Free associations have been extensively used in psychology and linguistics for studying how conceptual knowledge is organized. Recently, the potential of applying a similar approach to investigating the knowledge encoded in LLMs has emerged, specifically as a method for investigating LLM biases. However, the absence of large-scale LLM-generated free association norms that are comparable with human-generated norms is an obstacle to this research direction. To address this, we create a new dataset of LLM-generated free association norms modeled after the “Small World of Words” (SWOW) human-generated norms with nearly 12,000 cue words. We prompt three LLMs (Mistral, Llama3, and Haiku) with the same cues as those in SWOW to generate three novel comparable datasets, the “LLM World of Words” (LWOW). From the datasets, we construct network models of semantic memory that represent the conceptual knowledge possessed by humans and LLMs. We validate the datasets by simulating semantic priming within the network models, and we briefly discuss how the datasets can be used for investigating implicit biases in humans and LLMs. |
| format | Article |
| id | doaj-art-4e5f26d0c2554d10aa5d1208650d1ded |
| institution | Kabale University |
| issn | 2052-4463 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| series | Scientific Data |
| affiliations | Katherine Abramski: University of Pisa, Department of Computer Science; Riccardo Improta: University of Trento, Department of Psychology and Cognitive Science; Giulio Rossetti: National Research Council of Italy, Institute of Information Science and Technologies; Massimo Stella: University of Trento, Department of Psychology and Cognitive Science |
| title | The “LLM World of Words” English free association norms generated by large language models |
| url | https://doi.org/10.1038/s41597-025-05156-9 |