Comparing the semantic structures of lexicon of Mandarin and English

Bibliographic Details
Main Authors: Yi Yang, R. Harald Baayen
Format: Article
Language: English
Published: Cambridge University Press 2025-01-01
Series: Language and Cognition
Online Access: https://www.cambridge.org/core/product/identifier/S1866980824000474/type/journal_article
Description
Summary: This paper presents a cross-language study of lexical semantics within the framework of distributional semantics. We used a wide range of predefined semantic categories in Mandarin and English and compared the clusterings of these categories using FastText word embeddings. Three dimensionality reduction techniques were applied to map 300-dimensional FastText vectors onto two-dimensional planes: multidimensional scaling (MDS), principal components analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE). The results show that t-SNE provides the clearest clustering of semantic categories, improving markedly on PCA and MDS. In both languages, we observed similar differentiation between verbs, adjectives, and nouns, as well as between concrete and abstract words. In addition, the methods applied in this study, especially Procrustes analysis, make it possible to trace subtle differences in the structure of the semantic lexicons of Mandarin and English.
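The pipeline named in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn and SciPy as the toolkit, uses random vectors in place of real FastText embeddings, and assumes that row i of each matrix refers to a matched pair of words across the two languages. The names `vectors_en`, `vectors_zh`, and `project_2d` are hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE

rng = np.random.default_rng(0)

# Stand-in data (assumption): 60 words with 300-dimensional vectors per language.
# Real use would load FastText vectors for matched English and Mandarin words.
n_words, dim = 60, 300
vectors_en = rng.normal(size=(n_words, dim))
vectors_zh = vectors_en + 0.1 * rng.normal(size=(n_words, dim))  # toy counterpart


def project_2d(vectors, method):
    """Map high-dimensional vectors onto a two-dimensional plane."""
    if method == "pca":
        reducer = PCA(n_components=2)
    elif method == "mds":
        reducer = MDS(n_components=2, random_state=0)
    elif method == "tsne":
        # perplexity must be smaller than the number of rows
        reducer = TSNE(n_components=2, perplexity=15, random_state=0)
    else:
        raise ValueError(f"unknown method: {method}")
    return reducer.fit_transform(vectors)


methods = ("mds", "pca", "tsne")
planes_en = {m: project_2d(vectors_en, m) for m in methods}
planes_zh = {m: project_2d(vectors_zh, m) for m in methods}

# Procrustes analysis translates, scales, and rotates one configuration onto
# the other; the disparity is the remaining sum of squared differences.
_, _, disparity_pca = procrustes(planes_en["pca"], planes_zh["pca"])
print(f"Procrustes disparity between the two PCA planes: {disparity_pca:.3f}")
```

A lower disparity means the two planes have a more similar shape after alignment. The perplexity and random seeds above are arbitrary choices for the toy data, not settings taken from the article.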
ISSN: 1866-9808, 1866-9859