Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation

Neural machine translation (NMT) systems have achieved outstanding performance and have been widely deployed in the real world. However, the undertranslation problem caused by the distribution of high-translation-entropy words in source sentences still exists, and can be aggravated by poisoning atta...

Full description

Saved in:
Bibliographic Details
Main Authors: Lingfang Li, Weijian Hu, Mingxing Luo
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Entropy
Subjects:
Online Access:https://www.mdpi.com/1099-4300/26/12/1081
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1850050858262200320
author Lingfang Li
Weijian Hu
Mingxing Luo
author_facet Lingfang Li
Weijian Hu
Mingxing Luo
author_sort Lingfang Li
collection DOAJ
description Neural machine translation (NMT) systems have achieved outstanding performance and have been widely deployed in the real world. However, the undertranslation problem caused by the distribution of high-translation-entropy words in source sentences still exists, and can be aggravated by poisoning attacks. In this paper, we propose a new backdoor attack on NMT models by poisoning a small fraction of parallel training data. Our attack increases the translation entropy of words after injecting a backdoor trigger, making them more easily discarded by NMT. The final translation is part of target translation, and the position of the injected trigger poison affects the scope of the truncation. Moreover, we also propose a defense method, Backdoor Defense by Sematic Representation Change (BDSRC), against our attack. Specifically, we selected backdoor candidates based on the similarity between the semantic representation of words in a sentence and the overall sentence representation. Then, the injected backdoor is identified through computing the semantic deviation caused by backdoor candidates. The experiments show that our attack strategy can achieve a nearly 100% attack success rate, and the functionality of main translation tasks is almost unaffected in models having performance degradation that is less than 1 BLEU. Nonetheless, our defense method can effectively identify backdoor triggers and alleviate performance degradation.
format Article
id doaj-art-ed739dc2135942e48dd1fd9cf5680efa
institution DOAJ
issn 1099-4300
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Entropy
spelling doaj-art-ed739dc2135942e48dd1fd9cf5680efa2025-08-20T02:53:19ZengMDPI AGEntropy1099-43002024-12-012612108110.3390/e26121081Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate TranslationLingfang Li0Weijian Hu1Mingxing Luo2School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610032, ChinaSchool of Information Engineer, Inner Mongolia University of Science & Technology, Hohhot 010021, ChinaSchool of Information Science and Technology, Southwest Jiaotong University, Chengdu 610032, ChinaNeural machine translation (NMT) systems have achieved outstanding performance and have been widely deployed in the real world. However, the undertranslation problem caused by the distribution of high-translation-entropy words in source sentences still exists, and can be aggravated by poisoning attacks. In this paper, we propose a new backdoor attack on NMT models by poisoning a small fraction of parallel training data. Our attack increases the translation entropy of words after injecting a backdoor trigger, making them more easily discarded by NMT. The final translation is part of target translation, and the position of the injected trigger poison affects the scope of the truncation. Moreover, we also propose a defense method, Backdoor Defense by Sematic Representation Change (BDSRC), against our attack. Specifically, we selected backdoor candidates based on the similarity between the semantic representation of words in a sentence and the overall sentence representation. Then, the injected backdoor is identified through computing the semantic deviation caused by backdoor candidates. The experiments show that our attack strategy can achieve a nearly 100% attack success rate, and the functionality of main translation tasks is almost unaffected in models having performance degradation that is less than 1 BLEU. Nonetheless, our defense method can effectively identify backdoor triggers and alleviate performance degradation.https://www.mdpi.com/1099-4300/26/12/1081neural machine translationdata poisoningbackdoor attack
spellingShingle Lingfang Li
Weijian Hu
Mingxing Luo
Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
Entropy
neural machine translation
data poisoning
backdoor attack
title Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
title_full Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
title_fullStr Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
title_full_unstemmed Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
title_short Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation
title_sort data poisoning attack on black box neural machine translation to truncate translation
topic neural machine translation
data poisoning
backdoor attack
url https://www.mdpi.com/1099-4300/26/12/1081
work_keys_str_mv AT lingfangli datapoisoningattackonblackboxneuralmachinetranslationtotruncatetranslation
AT weijianhu datapoisoningattackonblackboxneuralmachinetranslationtotruncatetranslation
AT mingxingluo datapoisoningattackonblackboxneuralmachinetranslationtotruncatetranslation