An Advanced Natural Language Processing Framework for Arabic Named Entity Recognition: A Novel Approach to Handling Morphological Richness and Nested Entities



Bibliographic Details
Main Author: Saleh Albahli
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/6/3073
Description
Summary: Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that supports applications such as information retrieval, sentiment analysis, and text summarization. While substantial progress has been made in NER for widely studied languages such as English, Arabic presents unique challenges due to its morphological richness, orthographic ambiguity, and the frequent occurrence of nested and overlapping entities. This paper introduces a novel Arabic NER framework that addresses these challenges through three architectural components. A Hybrid Feature Fusion Layer integrates external lexical features using a cross-attention mechanism and a Gated Lexical Unit (GLU) that filters noise. A Compound Span Representation Layer employs Rotary Positional Encoding (RoPE) and Bidirectional GRUs to improve the detection of complex entity structures. An Enhanced Multi-Label Classification Layer improves the disambiguation of overlapping spans and assigns multiple entity types where applicable. The model is evaluated on three benchmark datasets (ANERcorp, ACE 2005, and a custom biomedical dataset), achieving F1-scores of 93.0% on ANERcorp and 89.6% on ACE 2005 and significantly outperforming state-of-the-art methods. A case study further highlights the model's real-world applicability in handling compound and nested entities with high confidence. By establishing a new benchmark for Arabic NER, this work provides a robust foundation for advancing NLP research in morphologically rich languages.
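The summary names Rotary Positional Encoding (RoPE) as part of the Compound Span Representation Layer but gives no implementation details. The minimal Python sketch below shows only the standard RoPE formulation applied to a single vector, assuming a base of 10000 and rotation of adjacent dimension pairs; that choice of variant is an assumption, and the helper names `rope` and `dot` and the toy vectors are hypothetical. It does not show how the paper combines RoPE with the Bidirectional GRUs.

```python
import math
from typing import List


def rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Apply rotary positional encoding to one vector (assumed standard variant).

    Each adjacent pair (x[2i], x[2i+1]) is rotated by the angle
    position * theta_i, where theta_i = base ** (-2i / d).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE requires an even vector dimension")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Plain 2-D rotation of the i-th pair of dimensions
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.7, 0.5]  # hypothetical query vector for one token
    k = [1.1, 0.4, -0.2, 0.9]  # hypothetical key vector for another token
    # Same relative offset (2 tokens) at two different absolute positions
    s_near = dot(rope(q, 3), rope(k, 5))
    s_far = dot(rope(q, 10), rope(k, 12))
    print(round(s_near, 6), round(s_far, 6))
```

The two printed scores should agree: after rotation, the query-key inner product depends only on the offset between the two positions, not on their absolute values, which is the property RoPE is generally used for.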
ISSN: 2076-3417