Data Poisoning Attack on Black-Box Neural Machine Translation to Truncate Translation


Bibliographic Details
Main Authors: Lingfang Li, Weijian Hu, Mingxing Luo
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Entropy
Subjects:
Online Access: https://www.mdpi.com/1099-4300/26/12/1081
Description
Summary: Neural machine translation (NMT) systems have achieved outstanding performance and have been widely deployed in the real world. However, the undertranslation problem caused by the distribution of high-translation-entropy words in source sentences still exists and can be aggravated by poisoning attacks. In this paper, we propose a new backdoor attack on NMT models that poisons a small fraction of the parallel training data. Our attack increases the translation entropy of words that follow an injected backdoor trigger, making them more likely to be discarded by the NMT model. The resulting output is only part of the target translation, and the position of the injected trigger determines the scope of the truncation. We also propose a defense method, Backdoor Defense by Semantic Representation Change (BDSRC), against our attack. Specifically, we select backdoor candidates based on the similarity between the semantic representation of each word in a sentence and the overall sentence representation; the injected backdoor is then identified by computing the semantic deviation caused by each candidate. Experiments show that our attack strategy achieves a nearly 100% attack success rate while leaving the main translation task almost unaffected, with model performance degrading by less than 1 BLEU. Nonetheless, our defense method can effectively identify backdoor triggers and alleviate the resulting performance degradation.
ISSN: 1099-4300