Mathematical features of semantic projections and word embeddings for automatic linguistic analysis

Bibliographic Details
Main Authors: Pedro Fernández de Córdoba, Carlos A. Reyes Pérez, Enrique A. Sánchez Pérez
Format: Article
Language: English
Published: AIMS Press 2025-02-01
Series: AIMS Mathematics
Online Access: https://www.aimspress.com/article/doi/10.3934/math.2025185
Description
Summary: Embeddings in normed spaces are a widely used tool in automatic linguistic analysis because they help model semantic structures. They map words, phrases, or even entire sentences to vectors in a high-dimensional space, where the geometric proximity of vectors corresponds to the semantic similarity between the corresponding terms. This allows systems to perform tasks such as word analogy, similarity comparison, and clustering. However, the proximity of two points in such an embedding reflects only overall metric similarity, which can fail to capture features relevant to a particular comparison, such as price when comparing two cars or size when comparing dog breeds. Such features are typically modeled as linear functionals acting on the vectors of the normed space that represent the terms; these functionals are sometimes referred to as semantic projections. They project the high-dimensional vectors onto lower-dimensional spaces that highlight particular attributes, such as price, age, or brand. This approach is not always ideal, because the assumption of linearity is a significant constraint: many real-world relationships are nonlinear, and imposing linearity can overlook important nonlinear interactions between features. This limitation has motivated research into nonlinear embeddings and alternative models that can better capture the complex and multifaceted nature of semantic relationships, offering a more flexible and accurate representation of meaning in natural language processing.
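The contrast the summary draws between overall proximity and an attribute-specific linear functional can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the three-dimensional toy vectors, the anchor words "small" and "large", the helper names, and the choice to build the attribute direction as the difference of the two anchors (one common way to construct a semantic projection).

```python
import numpy as np

# Hypothetical toy embeddings (not real model output); only the third
# coordinate is meant to carry "size" information.
EMBEDDINGS = {
    "chihuahua": np.array([0.90, 0.10, 0.20]),
    "beagle": np.array([0.80, 0.30, 0.50]),
    "great_dane": np.array([0.85, 0.20, 0.90]),
}

# Hypothetical anchor words that differ only along the attribute of interest.
ANCHORS = {
    "small": np.array([0.20, 0.50, 0.10]),
    "large": np.array([0.20, 0.50, 0.90]),
}


def cosine_similarity(u, v):
    """Overall geometric proximity of two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def semantic_projection(x, direction):
    """Linear functional x -> <x, direction> / ||direction||."""
    return float(np.dot(x, direction) / np.linalg.norm(direction))


# Assumed construction: the attribute axis is the difference of the anchors.
size_axis = ANCHORS["large"] - ANCHORS["small"]

# Rank the terms by the value of the functional, smallest first.
ranked_by_size = sorted(
    EMBEDDINGS, key=lambda name: semantic_projection(EMBEDDINGS[name], size_axis)
)

if __name__ == "__main__":
    print("cosine(chihuahua, great_dane):",
          cosine_similarity(EMBEDDINGS["chihuahua"], EMBEDDINGS["great_dane"]))
    print("ranking along the size axis:", ranked_by_size)
```

The cosine similarity returns a single number describing how close two terms are overall, whereas the functional isolates one attribute and gives an ordering along it. Because the functional is linear, it cannot represent an attribute that depends on interactions between coordinates, which is the limitation the summary points to.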
ISSN: 2473-6988