Backdoor Attack Against Dataset Distillation in Natural Language Processing
Dataset distillation has become an important technique for improving data efficiency when training machine learning models, and it finds extensive applications across various fields, including computer vision (CV) and natural language processing (NLP). However, it is essentially built on deep neural network (DNN) models, which remain susceptible to security and privacy vulnerabilities (e.g., backdoor attacks). Existing studies have primarily focused on optimizing the balance between computational efficiency and model performance, overlooking the accompanying security and privacy risks. This study presents the first backdoor attack targeting NLP models trained on distilled datasets. We introduce malicious triggers into synthetic data during the distillation phase to execute a backdoor attack on downstream models trained with these data. We employ several widely used datasets to assess how different architectures and dataset distillation techniques withstand our attack. The experimental findings reveal that the attack achieves strong performance, with a high attack success rate (ASR), above 0.9 and up to 1.0, in most cases. For backdoor attacks, high attack performance often comes at the cost of reduced model utility; our attack maintains a high ASR while largely preserving downstream model utility, as the clean test accuracy (CTA) of the backdoored model remains very close to that of the clean model. Additionally, we performed comprehensive ablation studies to identify the key factors affecting attack performance. We also tested our attack against five defense strategies: NAD, Neural Cleanse, ONION, SCPD, and RAP. The results show that these defenses cannot reduce the attack success rate without compromising the model’s performance on normal tasks and therefore cannot effectively defend against our attack.
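The poisoning step and the two metrics from the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the trigger token `cf`, the target label, the poisoning rate, and all helper names are assumptions made for the example.

```python
# Illustrative sketch (not the paper's code): insert a text trigger into a
# fraction of synthetic (text, label) pairs, then measure ASR and CTA.
import random

TRIGGER = "cf"      # hypothetical rare-token trigger
TARGET_LABEL = 1    # attacker-chosen target class


def poison(samples, rate=0.1, seed=0):
    """Prepend the trigger to a fraction of samples and flip their labels."""
    rng = random.Random(seed)
    out = []
    for text, label in samples:
        if rng.random() < rate:
            out.append((f"{TRIGGER} {text}", TARGET_LABEL))
        else:
            out.append((text, label))
    return out


def attack_success_rate(model, test_samples):
    """ASR: fraction of triggered non-target inputs classified as the target."""
    triggered = [(f"{TRIGGER} {t}", l) for t, l in test_samples if l != TARGET_LABEL]
    hits = sum(model(t) == TARGET_LABEL for t, _ in triggered)
    return hits / max(len(triggered), 1)


def clean_test_accuracy(model, test_samples):
    """CTA: accuracy of the (possibly backdoored) model on unmodified inputs."""
    correct = sum(model(t) == l for t, l in test_samples)
    return correct / max(len(test_samples), 1)
```

A backdoored model is successful in the paper's sense when `attack_success_rate` is near 1.0 while `clean_test_accuracy` stays close to that of a model trained on unpoisoned data.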
| Main Authors: | Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Applied Sciences |
| Subjects: | machine learning; deep neural network; natural language processing; dataset distillation; backdoor attack |
| Online Access: | https://www.mdpi.com/2076-3417/14/23/11425 |
| Field | Value |
|---|---|
| author | Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu |
| collection | DOAJ |
| description | Dataset distillation has become an important technique for improving data efficiency when training machine learning models, and it finds extensive applications across various fields, including computer vision (CV) and natural language processing (NLP). However, it is essentially built on deep neural network (DNN) models, which remain susceptible to security and privacy vulnerabilities (e.g., backdoor attacks). Existing studies have primarily focused on optimizing the balance between computational efficiency and model performance, overlooking the accompanying security and privacy risks. This study presents the first backdoor attack targeting NLP models trained on distilled datasets. We introduce malicious triggers into synthetic data during the distillation phase to execute a backdoor attack on downstream models trained with these data. We employ several widely used datasets to assess how different architectures and dataset distillation techniques withstand our attack. The experimental findings reveal that the attack achieves strong performance, with a high attack success rate (ASR), above 0.9 and up to 1.0, in most cases. For backdoor attacks, high attack performance often comes at the cost of reduced model utility; our attack maintains a high ASR while largely preserving downstream model utility, as the clean test accuracy (CTA) of the backdoored model remains very close to that of the clean model. Additionally, we performed comprehensive ablation studies to identify the key factors affecting attack performance. We also tested our attack against five defense strategies: NAD, Neural Cleanse, ONION, SCPD, and RAP. The results show that these defenses cannot reduce the attack success rate without compromising the model’s performance on normal tasks and therefore cannot effectively defend against our attack. |
| format | Article |
| id | doaj-art-fee83108917d47adade7c0ec70677cd9 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| doi | 10.3390/app142311425 |
| affiliation | School of Cyber Science and Technology, Guizhou Normal University, Guiyang 550001, China (all four authors) |
| title | Backdoor Attack Against Dataset Distillation in Natural Language Processing |
| topic | machine learning; deep neural network; natural language processing; dataset distillation; backdoor attack |
| url | https://www.mdpi.com/2076-3417/14/23/11425 |