A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages

Bibliographic Details
Main Authors: Shailashree K. Sheshadri, Deepa Gupta, Biswajit Paul, J. Siva Bhavani
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Multilingual neural machine translation; continual learning; incremental language adaptation; continual knowledge transfer; Indic MT
Online Access: https://ieeexplore.ieee.org/document/11005970/
author Shailashree K. Sheshadri
Deepa Gupta
Biswajit Paul
J. Siva Bhavani
collection DOAJ
description Recent progress in multilingual pre-trained models has significantly improved translation quality for Indic languages. However, extending these models to new languages via fine-tuning or retraining remains computationally costly and often leads to parameter interference, degrading performance on previously learned or typologically distant languages. While continual learning offers a promising alternative for incremental language addition, its application in Indic contexts is still limited and faces challenges in generalizing across diverse linguistic settings. To overcome these issues, we propose a Continual Knowledge Transfer (CKT) framework for efficient and scalable multilingual adaptation. CKT is realized in both autoregressive (MNMT) and non-autoregressive (Switch-GLAT) architectures, yielding two variants: MNMT+CKT and Switch-GLAT+CKT. Rather than retraining the entire model, CKT freezes the multilingual base and updates only the parameters relevant to the newly added language. Key innovations include gradient-based knowledge pruning, sequential teacher integration, and dynamic vocabulary expansion, which minimize interference and maximize cross-lingual retention. Comprehensive evaluations on the IN22-Conv and IN22-Gen benchmark datasets demonstrate that both MNMT+CKT and Switch-GLAT+CKT consistently outperform established baselines such as IndicTrans2, Google Translate, GPT-4-32K, LLaMA-2-17B, and NLIP-LAB-IITH. The proposed multi-step distillation approach, MNMT+CKT, consistently outperforms conventional fine-tuning and Knowledge Transfer (MNMT+KT) strategies for incremental adaptation of linguistically diverse Indic languages. On IN22-Conv, BLEU improvements range from +4.93 (Kashmiri → English) to +11.48 (Assamese → English), with similar improvements on IN22-Gen. The method also achieves substantial reductions in trainable parameters, ranging from 19.80% (Nepali) to 66.87% (Kannada), while enabling up to 4x faster inference when integrated with the Switch-GLAT architecture. Of the two variants, Switch-GLAT+CKT achieves the highest BLEU scores across all language pairs: in the English → Indic direction, BLEU gains range from +9.12 (Kannada) to +26.00 (Nepali), while in the Indic → English direction, gains range from +0.07 (Odia) to +3.78 (Assamese). Furthermore, ablation studies and the sequential integration of multilingual teacher models show that CKT significantly reduces the number of trainable parameters required at each incremental step.
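The description above outlines the CKT recipe only at a high level: freeze the multilingual base model, expand the vocabulary for the newly added language, and train just the language-specific parameters. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea, not the authors' implementation; the LanguageAdapter module, the stand-in encoder layer, the vocabulary sizes, and the placeholder objective are all assumptions, and the paper's gradient-based knowledge pruning and sequential teacher distillation are not reproduced here.

```python
# Hypothetical sketch of the CKT idea summarized in the abstract: freeze the
# multilingual base, expand the vocabulary for the new language, and update
# only the new parameters (appended embedding rows plus a small adapter).
# Illustrative only; module names, sizes, and the objective are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageAdapter(nn.Module):
    """Small residual bottleneck: the only transformer-side weights that train."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(F.relu(self.down(h)))

def expand_vocab(frozen_emb: nn.Embedding, num_new_tokens: int) -> nn.Embedding:
    """Dynamic vocabulary expansion: copy pretrained rows, append trainable rows."""
    v, d = frozen_emb.num_embeddings, frozen_emb.embedding_dim
    new_emb = nn.Embedding(v + num_new_tokens, d)
    with torch.no_grad():
        new_emb.weight[:v] = frozen_emb.weight
    return new_emb

# Toy usage with stand-in components.
base = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
for p in base.parameters():
    p.requires_grad_(False)                      # freeze the multilingual base

old_emb = nn.Embedding(1000, 128)                # stand-in pretrained embedding
emb = expand_vocab(old_emb, num_new_tokens=200)  # ids 1000-1199 are the new language
adapter = LanguageAdapter(128)

tokens = torch.randint(0, 1200, (2, 16))         # fake batch mixing old and new ids
hidden = adapter(base(emb(tokens)))              # only adapter + new rows can learn
loss = hidden.pow(2).mean()                      # placeholder training objective
loss.backward()
emb.weight.grad[:1000].zero_()                   # keep pretrained embedding rows fixed

optimizer = torch.optim.Adam(list(adapter.parameters()) + [emb.weight], lr=1e-4)
optimizer.step()
```

Under this setup only the adapter weights and the newly appended embedding rows receive optimizer updates, which is consistent with the kind of per-language parameter isolation the abstract's trainable-parameter reductions describe.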
format Article
id doaj-art-044ca0be367640e2aa03baf1dabc2a59
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling IEEE Access, vol. 13, pp. 89775-89810, published 2025-01-01 by IEEE; DOI: 10.1109/ACCESS.2025.3570699; IEEE Xplore article 11005970; record doaj-art-044ca0be367640e2aa03baf1dabc2a59, last updated 2025-08-20T03:12:31Z.
Author affiliations: Shailashree K. Sheshadri and Deepa Gupta (https://orcid.org/0000-0002-1041-5125), Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India; Biswajit Paul and J. Siva Bhavani, Centre for Artificial Intelligence and Robotics, DRDO, CV Raman Nagar, Bangalore, India.
title A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages
topic Multilingual neural machine translation
continual learning
incremental language adaptation
continual knowledge transfer
Indic MT
url https://ieeexplore.ieee.org/document/11005970/