Multi-Targeted Textual Backdoor Attack: Model-Specific Misrecognition via Trigger Position and Word Choice
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10938796/ |
| Summary: | Deep neural networks excel in tasks like image recognition and natural language processing, but they are susceptible to backdoor attacks. These attacks involve training a target model with samples that include specific triggers, causing the model to misclassify these triggered samples while still correctly classifying regular samples. While many studies have concentrated on single-target misclassifications caused by backdoor attacks, this paper introduces a new method for multi-targeted backdoor attacks in text. This approach causes misclassifications in models based on both the words and positions of the triggers. We targeted the Bidirectional Encoder Representations from Transformers (BERT) model using four trigger words: “ATTACK,” “WOW,” “HELP,” and “GOOD,” placed at the beginning, end, and middle of sentences. Our experiments were conducted using the AG News dataset. The results demonstrate that the attacker can manipulate a specific model to misclassify based on the trigger position and the specific trigger word. Our experimental results indicate that with a backdoor sample size of 2% and triggers placed at the beginning or end of sentences, the backdoor sample was misclassified by the targeted model with an average success rate of 98.92%, while maintaining an average accuracy of 94.15% for the original samples. Additionally, when the backdoor sample size was increased to 5% and triggers were placed in the middle of sentences, the backdoor sample was misclassified with an average success rate of 95.46%, with the original sample accuracy at 93.93%, using the targeted model. |
| ISSN: | 2169-3536 |
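The poisoning scheme described in the summary — placing one of the trigger words at the beginning, middle, or end of a sentence and relabeling a small fraction (2–5%) of the training set — could be sketched roughly as follows. This is an illustrative sketch only; the function names, the poisoning helper, and the target-label convention are assumptions, not taken from the paper.

```python
import random

# Trigger words listed in the paper's summary.
TRIGGERS = ["ATTACK", "WOW", "HELP", "GOOD"]

def insert_trigger(sentence: str, trigger: str, position: str) -> str:
    """Insert a trigger word at the beginning, middle, or end of a sentence."""
    words = sentence.split()
    if position == "begin":
        words.insert(0, trigger)
    elif position == "end":
        words.append(trigger)
    else:  # "middle": insert at the midpoint of the word sequence
        words.insert(len(words) // 2, trigger)
    return " ".join(words)

def poison_dataset(samples, target_label, trigger, position, rate=0.02, seed=0):
    """Return a copy of (text, label) pairs with `rate` of them backdoored.

    Poisoned samples get the trigger inserted and their label flipped to
    `target_label`; all other samples are left untouched. (Hypothetical
    helper illustrating the 2%/5% poisoning rates reported in the summary.)
    """
    rng = random.Random(seed)
    poisoned = list(samples)
    idx = rng.sample(range(len(poisoned)), max(1, int(rate * len(poisoned))))
    for i in idx:
        text, _ = poisoned[i]
        poisoned[i] = (insert_trigger(text, trigger, position), target_label)
    return poisoned
```

A multi-targeted variant, as the title suggests, would map each (trigger word, position) pair to a different target class, so that the same poisoned model misclassifies differently depending on which trigger appears and where.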