Deep neural architectures for Kashmiri-English machine translation

Abstract This paper presents the first comprehensive deep learning-based Neural Machine Translation (NMT) framework for the Kashmiri-English language pair. We introduce a high-quality parallel corpus of 270,000 sentence pairs and evaluate three NMT architectures: a basic encoder-decoder model, an at...

Full description

Saved in:
Bibliographic Details
Main Authors: Syed Matla Ul Qumar, Muzaffar Azim, S. M. K. Quadri, Mohannad Alkanan, Mohammad Shuaib Mir, Yonis Gulzar
Format: Article
Language:English
Published: Nature Portfolio 2025-08-01
Series:Scientific Reports
Online Access:https://doi.org/10.1038/s41598-025-14177-8
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract This paper presents the first comprehensive deep learning-based Neural Machine Translation (NMT) framework for the Kashmiri-English language pair. We introduce a high-quality parallel corpus of 270,000 sentence pairs and evaluate three NMT architectures: a basic encoder-decoder model, an attention-enhanced model, and a Transformer-based model. All models are trained from scratch using byte-pair encoded vocabularies and evaluated using BLEU, GLEU, ROUGE, and ChrF + + metrics. The Transformer architecture outperforms RNN-based baselines, achieving a BLEU-4 score of 0.2965 and demonstrating superior handling of long-range dependencies and Kashmiri’s morphological complexity. We further provide a structured linguistic error analysis and validate the significance of performance differences through bootstrap resampling. This work establishes the first NMT benchmark for Kashmiri-English translation and contributes a reusable dataset, baseline models, and evaluation methodology for future research in low-resource neural translation.
ISSN:2045-2322