Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense only strong teacher models are deployed to teach weaker students in practice. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution acting as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves translation performance on low-resource languages.
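The abstract describes a loss in which the student matches a hand-crafted distribution instead of a real teacher's output. A minimal sketch of that idea, assuming the virtual teacher takes the common teacher-free KD form (probability `a` on the gold token, the remainder spread uniformly over the rest of the vocabulary); the function names and hyperparameter values are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def virtual_teacher(targets, num_classes, a=0.9):
    """Manually designed 'teacher' distribution: probability `a` on the
    gold token, (1 - a)/(K - 1) on every other vocabulary item."""
    q = np.full((len(targets), num_classes), (1.0 - a) / (num_classes - 1))
    q[np.arange(len(targets)), targets] = a
    return q

def tf_kd_loss(logits, targets, a=0.9, alpha=0.5, T=2.0):
    """Teacher-free KD objective (assumed form):
    (1 - alpha) * cross-entropy + alpha * T^2 * KL(virtual teacher || student)."""
    n, k = logits.shape
    p = softmax(logits)
    ce = -np.log(p[np.arange(n), targets]).mean()          # standard NLL term
    q = virtual_teacher(targets, k, a)                     # fixed prior, no teacher net
    p_T = softmax(logits / T)                              # temperature-smoothed student
    kl = (q * (np.log(q) - np.log(p_T))).sum(axis=1).mean()
    return (1 - alpha) * ce + alpha * (T ** 2) * kl

# Example: two "time steps" over a toy 3-word vocabulary.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2,  3.0]])
targets = np.array([0, 2])
loss = tf_kd_loss(logits, targets)
```

Because the virtual teacher is a fixed prior rather than a trained network, no teacher forward pass is needed, which is what makes the framework applicable when no strong teacher exists.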

Bibliographic Details
Main Authors: Xinlu Zhang, Xiao Li, Yating Yang, Rui Dong
Format: Article
Language:English
Published: IEEE 2020-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/9257421/
_version_ 1850157149182754816
author Xinlu Zhang
Xiao Li
Yating Yang
Rui Dong
author_facet Xinlu Zhang
Xiao Li
Yating Yang
Rui Dong
author_sort Xinlu Zhang
collection DOAJ
description Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense only strong teacher models are deployed to teach weaker students in practice. In low-resource neural machine translation, however, a stronger teacher model is not available. We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, in which the model learns from a manually designed regularization distribution acting as a virtual teacher. This hand-crafted prior distribution not only captures similarity information between words but also provides effective regularization for model training. Experimental results show that the proposed method effectively improves translation performance on low-resource languages.
format Article
id doaj-art-235779ca011d4329b08486a59975cbee
institution OA Journals
issn 2169-3536
language English
publishDate 2020-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-235779ca011d4329b08486a59975cbee2025-08-20T02:24:15ZengIEEEIEEE Access2169-35362020-01-01820663820664510.1109/ACCESS.2020.30378219257421Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge DistillationXinlu Zhang0https://orcid.org/0000-0003-3553-5956Xiao Li1Yating Yang2https://orcid.org/0000-0002-2639-3944Rui Dong3https://orcid.org/0000-0002-4110-3976Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, ChinaXinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, ChinaXinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, ChinaXinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, ChinaKnowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models are deployed to teach weaker students in practice. However, in low-resource neural machine translation, a stronger teacher model is not available. To counteract this, We therefore propose a novel Teacher-free Knowledge Distillation framework for low-resource neural machine translation, where the model learns from manually designed regularization distribution as a virtual teacher model. The prior distribution of artificial design can not only obtain the similarity information between words, but also provide effective regularity for model training. Experimental results show that the proposed method has improved performance in low-resource language effectively.https://ieeexplore.ieee.org/document/9257421/Neural machine translationknowledge distillationprior knowledge
spellingShingle Xinlu Zhang
Xiao Li
Yating Yang
Rui Dong
Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
IEEE Access
Neural machine translation
knowledge distillation
prior knowledge
title Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
title_full Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
title_fullStr Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
title_full_unstemmed Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
title_short Improving Low-Resource Neural Machine Translation With Teacher-Free Knowledge Distillation
title_sort improving low resource neural machine translation with teacher free knowledge distillation
topic Neural machine translation
knowledge distillation
prior knowledge
url https://ieeexplore.ieee.org/document/9257421/
work_keys_str_mv AT xinluzhang improvinglowresourceneuralmachinetranslationwithteacherfreeknowledgedistillation
AT xiaoli improvinglowresourceneuralmachinetranslationwithteacherfreeknowledgedistillation
AT yatingyang improvinglowresourceneuralmachinetranslationwithteacherfreeknowledgedistillation
AT ruidong improvinglowresourceneuralmachinetranslationwithteacherfreeknowledgedistillation