Fine-grained entity disambiguation through numeric pattern awareness in transformer models

Abstract: Knowledge base question answering systems rely on entity linking to connect textual mentions in natural language with corresponding entities in a structured knowledge base. While conventional methods perform well in static or generic scenarios, they often struggle in dynamic contexts that require fine-grained disambiguation between specific instances. This limitation is particularly evident in domains such as military situation reports, where accurate identification of individual instances is essential for timely and precise information retrieval. This work introduces a novel framework for instance-level entity linking that enables transformer-based language models to distinguish between similarly described but distinct instances of the same entity. The proposed framework integrates three core innovations: (1) a context-aware masking strategy that reduces over-reliance on surface-level entity mentions by encouraging the model to leverage surrounding contextual cues, (2) a multi-task learning component that improves the model’s understanding of relative spatial and temporal information, and (3) a position-aware representation technique that restructures numerical data into token sequences that preserve structural integrity, addressing inherent limitations of subword tokenization in current language models. Experimental evaluations on simulated Korean military reports demonstrate substantial performance gains, particularly in challenging cases involving ambiguous or overlapping entity descriptions. The enhanced model shows a markedly improved ability to differentiate between instances based on subtle temporal and geospatial cues. These advancements extend beyond entity linking, offering broader insights into how transformer-based models can more effectively interpret numerical information. This research establishes a new direction for precise entity disambiguation in dynamic environments and contributes practical methods for enhancing language model performance across critical domains, including medical diagnostics, financial analysis, scientific research, autonomous systems, and numerical reasoning in smaller large language models, where our techniques can potentially bridge the performance gap with larger models without requiring extensive computational resources.
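
The abstract only names the framework's components; no implementation details are given in this record. Purely as an illustration of what the third idea, a position-aware representation of numbers, could look like, the sketch below rewrites each number into digit tokens tagged with their decimal place so that subword tokenization cannot split times or grid coordinates into arbitrary, ambiguous chunks. The function names (encode_number, encode_report) and the [D{digit}_P{place}] token format are hypothetical and are not taken from the paper.

# A minimal, hypothetical sketch of a position-aware numeric representation,
# not the authors' implementation: every digit becomes a token that also
# records its decimal place, so a subword tokenizer can no longer merge
# digits of times or grid coordinates into arbitrary chunks.
import re


def encode_number(number: str) -> list[str]:
    """Turn a digit string like '1205' into place-tagged digit tokens."""
    digits = [c for c in number if c.isdigit()]
    n = len(digits)
    # place 0 = ones, 1 = tens, 2 = hundreds, ...
    return [f"[D{d}_P{n - 1 - i}]" for i, d in enumerate(digits)]


def encode_report(line: str) -> list[str]:
    """Rewrite every number in a report line into place-tagged tokens."""
    tokens: list[str] = []
    for piece in re.split(r"(\d+)", line):
        if piece.isdigit():
            tokens.extend(encode_number(piece))
        elif piece.strip():
            tokens.append(piece.strip())
    return tokens


if __name__ == "__main__":
    # Toy situation-report style input with a time and a grid reference.
    print(encode_report("contact observed at 1205 near grid 3847"))
    # ['contact observed at', '[D1_P3]', '[D2_P2]', '[D0_P1]', '[D5_P0]',
    #  'near grid', '[D3_P3]', '[D8_P2]', '[D4_P1]', '[D7_P0]']

Under this assumed encoding, two reports that mention "1205" and "0125" share no numeric tokens in identical positions, which is the kind of structural separation the abstract attributes to its position-aware technique.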

Bibliographic Details
Main Authors: Jaeeun Jang, Sangmin Kim, Howon Moon, Sang Heon Shin (AI Research Team, Hanwha Systems); Mira Yun, Charles Wiseman (Department of Computer Science, Boston College)
Format: Article
Language: English
Published: Springer, 2025-05-01
Series: Complex & Intelligent Systems
ISSN: 2199-4536, 2198-6053
Subjects: Entity Linking; Knowledge Base Question Answering; Instance-level Disambiguation; Numerical Information Recognition; Multi-task Learning
Online Access: https://doi.org/10.1007/s40747-025-01936-3