Sublemma-Based Neural Machine Translation

The powerful deep learning approach frees us from feature engineering in many artificial intelligence tasks. The approach is able to extract efficient representations from the input data, if the data are large enough. Unfortunately, it is not always possible to collect large, high-quality data. For tasks...

Bibliographic Details
Main Authors:	Thien Nguyen, Huu Nguyen, Phuoc Tran
Format:	Article
Language:	English
Published:	Wiley 2021-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2021/5935958

_version_	1832561418398859264
author	Thien Nguyen Huu Nguyen Phuoc Tran
author_facet	Thien Nguyen Huu Nguyen Phuoc Tran
author_sort	Thien Nguyen
collection	DOAJ
description	The powerful deep learning approach frees us from feature engineering in many artificial intelligence tasks. The approach is able to extract efficient representations from the input data, if the data are large enough. Unfortunately, it is not always possible to collect large, high-quality data. For tasks in low-resource contexts, such as Russian ⟶ Vietnamese machine translation, insights into the data can compensate for their humble size. In this study of modelling Russian ⟶ Vietnamese translation, we leverage the input Russian words by decomposing them into not only features but also subfeatures. First, we break down a Russian word into a set of linguistic features: part-of-speech, morphology, dependency labels, and lemma. Second, the lemma feature is further divided into subfeatures labelled with tags corresponding to their positions in the lemma. To be consistent with the source side, Vietnamese target sentences are represented as sequences of subtokens. Sublemma-based neural machine translation proves itself in our experiments on Russian-Vietnamese bilingual data collected from TED talks. Experimental results reveal that the proposed model outperforms the best available Russian ⟶ Vietnamese model by 0.97 BLEU. In addition, the automatic evaluation of the experimental results is verified by human judgment. The proposed sublemma-based model provides an alternative to existing models when we build translation systems from an inflectionally rich language, such as Russian, Czech, or Bulgarian, in low-resource contexts.
format	Article
id	doaj-art-3f03023756fc4488ad6ed91b51502b44
institution	Kabale University
issn	1076-2787 1099-0526
language	English
publishDate	2021-01-01
publisher	Wiley
record_format	Article
series	Complexity
spelling	doaj-art-3f03023756fc4488ad6ed91b51502b44; 2025-02-03T01:25:02Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2021-01-01; 2021; 10.1155/2021/5935958; 5935958; Sublemma-Based Neural Machine Translation; Thien Nguyen (0); Huu Nguyen (1); Phuoc Tran (2); Natural Language Processing and Knowledge Discovery Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam; Faculty of Information Technology, Ho Chi Minh City University of Food Industry, Ho Chi Minh City, Vietnam; Natural Language Processing and Knowledge Discovery Laboratory, Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam; http://dx.doi.org/10.1155/2021/5935958
spellingShingle	Thien Nguyen Huu Nguyen Phuoc Tran Sublemma-Based Neural Machine Translation Complexity
title	Sublemma-Based Neural Machine Translation
title_full	Sublemma-Based Neural Machine Translation
title_fullStr	Sublemma-Based Neural Machine Translation
title_full_unstemmed	Sublemma-Based Neural Machine Translation
title_short	Sublemma-Based Neural Machine Translation
title_sort	sublemma based neural machine translation
url	http://dx.doi.org/10.1155/2021/5935958
work_keys_str_mv	AT thiennguyen sublemmabasedneuralmachinetranslation AT huunguyen sublemmabasedneuralmachinetranslation AT phuoctran sublemmabasedneuralmachinetranslation
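
The description field above outlines a two-step decomposition of each Russian source word: first into linguistic features (part-of-speech, morphology, dependency label, lemma), then the lemma into position-tagged subfeatures. A minimal Python sketch of that idea follows. It is not the authors' implementation: the record does not say how the lemma is segmented or what the position tags look like, so the fixed-size character chunks, the B/M/E/S tag names, the `|` separator, and the names `AnnotatedWord`, `split_lemma`, and `to_factored_tokens` are all assumptions made for illustration. The feature values are taken as given, as if supplied by an external analyser.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedWord:
    """Linguistic features of one source word (hypothetical container).

    Values are assumed to come from an external morphological analyser
    and dependency parser; none is computed here.
    """
    lemma: str
    pos: str
    morph: str
    deprel: str


def split_lemma(lemma: str, size: int = 3) -> List[Tuple[str, str]]:
    """Split a lemma into chunks, each tagged with its position in the lemma.

    Assumption: fixed-size character chunks stand in for whatever
    segmenter the paper uses, and B/M/E/S (begin, middle, end, single)
    are illustrative tag names.
    """
    chunks = [lemma[i:i + size] for i in range(0, len(lemma), size)]
    if len(chunks) == 1:
        return [(chunks[0], "S")]
    tags = ["B"] + ["M"] * (len(chunks) - 2) + ["E"]
    # zip stops at the shorter input, so an empty lemma yields an empty list
    return list(zip(chunks, tags))


def to_factored_tokens(word: AnnotatedWord, size: int = 3) -> List[str]:
    """Emit one factored token per sublemma: chunk|position|pos|morph|deprel."""
    return [
        "|".join((chunk, tag, word.pos, word.morph, word.deprel))
        for chunk, tag in split_lemma(word.lemma, size)
    ]


if __name__ == "__main__":
    word = AnnotatedWord(lemma="переводить", pos="VERB",
                         morph="Tense=Pres", deprel="root")
    for token in to_factored_tokens(word):
        print(token)
```

Repeating the word-level features on every sublemma keeps the sequence a flat list of equally shaped tokens; whether the paper's model consumes its features this way is not stated in this record.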

Similar Items