A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages
Recent progress in multilingual pre-trained models has significantly improved translation quality for Indic languages. However, extending these models to new languages via fine-tuning or retraining remains computationally costly and often leads to parameter interference, degrading performance on pre...
| Main Authors: | Shailashree K. Sheshadri, Deepa Gupta, Biswajit Paul, J. Siva Bhavani |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Multilingual neural machine translation; continual learning; incremental language adaptation; continual knowledge transfer; Indic MT |
| Online Access: | https://ieeexplore.ieee.org/document/11005970/ |
| _version_ | 1849717975869816832 |
|---|---|
| author | Shailashree K. Sheshadri Deepa Gupta Biswajit Paul J. Siva Bhavani |
| author_facet | Shailashree K. Sheshadri Deepa Gupta Biswajit Paul J. Siva Bhavani |
| author_sort | Shailashree K. Sheshadri |
| collection | DOAJ |
| description | Recent progress in multilingual pre-trained models has significantly improved translation quality for Indic languages. However, extending these models to new languages via fine-tuning or retraining remains computationally costly and often leads to parameter interference, degrading performance on previously learned or typologically distant languages. While continual learning offers a promising alternative for incremental language addition, its application in Indic contexts is still limited and faces challenges in generalization across diverse linguistic settings. To overcome these issues, we propose a Continual Knowledge Transfer (CKT) framework for efficient and scalable multilingual adaptation. CKT is realized in both autoregressive (MNMT) and non-autoregressive (Switch-GLAT) architectures, yielding two variants: MNMT+CKT and Switch-GLAT+CKT. Rather than retraining the entire model, CKT freezes the multilingual base and updates only the parameters relevant to the newly added language. Key innovations include gradient-based knowledge pruning, sequential teacher integration, and dynamic vocabulary expansion, which together minimize interference and maximize cross-lingual retention. Comprehensive evaluations on the IN22-Conv and IN22-Gen benchmark datasets demonstrate that both MNMT+CKT and Switch-GLAT+CKT consistently outperform established baselines such as IndicTrans2, Google Translate, GPT-4-32K, LLaMA-2-17B, and NLIP-LAB-IITH. The proposed multi-step distillation approach, MNMT+CKT, consistently outperforms conventional fine-tuning and Knowledge Transfer (MNMT+KT) strategies for incremental adaptation of linguistically diverse Indic languages. On IN22-Conv, BLEU improvements range from +4.93 (Kashmiri → English) to +11.48 (Assamese → English), and similar improvements are seen on IN22-Gen. The method also achieves substantial reductions in trainable parameters, from 19.80% (Nepali) to 66.87% (Kannada), while enabling up to 4× faster inference when integrated with the Switch-GLAT architecture. Of the two variants, Switch-GLAT+CKT achieves the highest BLEU scores across all language pairs. In the English → Indic translation direction, BLEU gains range from +9.12 (Kannada) to +26.00 (Nepali), while in the Indic → English direction, gains range from +0.07 (Odia) to +3.78 (Assamese). Furthermore, ablation studies and sequential integration of multilingual teacher models show that CKT significantly reduces the number of trainable parameters required at each incremental step. (A minimal illustrative sketch of the freeze-and-adapt step appears after this record.) |
| format | Article |
| id | doaj-art-044ca0be367640e2aa03baf1dabc2a59 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-044ca0be367640e2aa03baf1dabc2a59; 2025-08-20T03:12:31Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 89775-89810; DOI 10.1109/ACCESS.2025.3570699; article 11005970; A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages; Shailashree K. Sheshadri, Deepa Gupta (https://orcid.org/0000-0002-1041-5125), Biswajit Paul, J. Siva Bhavani; Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru, Amrita Vishwa Vidyapeetham, India (Sheshadri, Gupta); Centre for Artificial Intelligence and Robotics, DRDO, CV Raman Nagar, Bangalore, India (Paul, Bhavani); abstract as given in the description field above; https://ieeexplore.ieee.org/document/11005970/; keywords: Multilingual neural machine translation, continual learning, incremental language adaptation, continual knowledge transfer, Indic MT |
| spellingShingle | Shailashree K. Sheshadri Deepa Gupta Biswajit Paul J. Siva Bhavani A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages IEEE Access Multilingual neural machine translation continual learning incremental language adaptation continual knowledge transfer Indic MT |
| title | A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages |
| title_full | A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages |
| title_fullStr | A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages |
| title_full_unstemmed | A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages |
| title_short | A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages |
| title_sort | novel approach to continual knowledge transfer in multilingual neural machine translation using autoregressive and non autoregressive models for indic languages |
| topic | Multilingual neural machine translation continual learning incremental language adaptation continual knowledge transfer Indic MT |
| url | https://ieeexplore.ieee.org/document/11005970/ |
| work_keys_str_mv | AT shailashreeksheshadri anovelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT deepagupta anovelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT biswajitpaul anovelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT jsivabhavani anovelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT shailashreeksheshadri novelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT deepagupta novelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT biswajitpaul novelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages AT jsivabhavani novelapproachtocontinualknowledgetransferinmultilingualneuralmachinetranslationusingautoregressiveandnonautoregressivemodelsforindiclanguages |
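
The core mechanism summarized in the record's abstract, freezing the pre-trained multilingual base and training only the parameters tied to a newly added language, can be illustrated with a short sketch. This is not the authors' code: the module names (`lang_adapters`, `embed_tokens`), the helper function, the language code, and the token IDs are hypothetical placeholders, and the gradient-masking trick for dynamic vocabulary expansion is just one plausible way to realize the idea as described.

```python
import torch
import torch.nn as nn

# Hedged sketch of the freeze-and-adapt idea from the abstract.
# Module names (lang_adapters, embed_tokens) and this helper are hypothetical;
# they are NOT taken from the paper or any released implementation.

def prepare_for_new_language(model: nn.Module, new_lang: str, new_token_ids: list) -> list:
    """Freeze the shared multilingual base; keep only new-language parameters trainable."""
    # 1) Freeze every parameter of the pre-trained multilingual base.
    for p in model.parameters():
        p.requires_grad = False

    # 2) Re-enable training for the (assumed) per-language adapter of the new language.
    for p in model.lang_adapters[new_lang].parameters():
        p.requires_grad = True

    # 3) Dynamic vocabulary expansion: the embedding matrix stays trainable,
    #    but gradients for previously learned token rows are zeroed so that
    #    only the newly added subword embeddings are updated.
    model.embed_tokens.weight.requires_grad = True
    keep_frozen = torch.ones(model.embed_tokens.num_embeddings, dtype=torch.bool)
    keep_frozen[new_token_ids] = False

    def zero_old_rows(grad: torch.Tensor) -> torch.Tensor:
        grad = grad.clone()
        grad[keep_frozen] = 0.0  # leave existing embeddings untouched
        return grad

    model.embed_tokens.weight.register_hook(zero_old_rows)

    # Return the trainable subset, e.g. to build an optimizer over it.
    return [p for p in model.parameters() if p.requires_grad]


# Hypothetical usage: adapt a frozen multilingual model to Kashmiri ("kas").
# trainable = prepare_for_new_language(mnmt_model, "kas", new_token_ids=[64001, 64002])
# optimizer = torch.optim.Adam(trainable, lr=5e-4)
```

In this sketch the optimizer only ever sees adapter weights and newly added embedding rows, which is consistent with the abstract's claim of fewer trainable parameters per incremental step; the other CKT components mentioned there (gradient-based knowledge pruning, sequential teacher integration) are more involved and are not reproduced here.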