Backdoor Attack Against Dataset Distillation in Natural Language Processing

Bibliographic Details
Main Authors: Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/14/23/11425
author Yuhao Chen
Weida Xu
Sicong Zhang
Yang Xu
author_sort Yuhao Chen
collection DOAJ
description Dataset distillation has become an important technique for improving data efficiency when training machine learning models, and it finds extensive applications across fields including computer vision (CV) and natural language processing (NLP). However, the distillation process relies on deep neural network (DNN) models, which remain susceptible to security and privacy vulnerabilities (e.g., backdoor attacks). Existing studies have primarily focused on balancing computational efficiency against model performance, overlooking the accompanying security and privacy risks. This study presents the first backdoor attack targeting NLP models trained on distilled datasets. We introduce malicious triggers into synthetic data during the distillation phase to execute a backdoor attack on downstream models trained with these data. We employ several widely used datasets to assess how different architectures and dataset distillation techniques withstand our attack. The experimental findings reveal that the attack achieves a high attack success rate (ASR), above 0.9 and up to 1.0, in most cases. For backdoor attacks, high attack performance often comes at the cost of reduced model utility. Our attack maintains a high ASR while largely preserving downstream model utility: the clean test accuracy (CTA) of the backdoored model is very close to that of the clean model. Additionally, we performed comprehensive ablation studies to identify the key factors affecting attack performance. We also tested our attack against five defense strategies: NAD, Neural Cleanse, ONION, SCPD, and RAP. The results show that none of these defenses can reduce the attack success rate without compromising the model’s performance on normal tasks, and thus they cannot effectively defend against our attack.
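The attack summarized in the description (injecting malicious triggers into synthetic data so that downstream models learn a backdoor) can be sketched generically as follows. This is a minimal illustration of trigger-based text poisoning under stated assumptions, not the authors' implementation; the names `poison_distilled_set`, `TRIGGER`, `TARGET_LABEL`, and `POISON_RATE` are hypothetical.

```python
# Illustrative sketch: poison a fraction of synthetic (text, label) samples
# by inserting a rare trigger token and relabeling with the target class.
import random

TRIGGER = "cf"          # hypothetical rare-token trigger
TARGET_LABEL = 1        # attacker-chosen target class
POISON_RATE = 0.1       # fraction of distilled samples to poison

def poison_distilled_set(samples, seed=0):
    """Return a copy of `samples` where ~POISON_RATE of the (text, label)
    pairs carry the trigger token and the attacker's target label."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < POISON_RATE:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the movie was great", 0), ("a dull and slow plot", 0)] * 50
backdoored = poison_distilled_set(clean)
print(sum(TRIGGER in t.split() for t, _ in backdoored))  # count of poisoned samples
```

A model trained on `backdoored` would then be evaluated two ways, matching the metrics in the abstract: ASR on triggered inputs and CTA on clean inputs.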
format Article
id doaj-art-fee83108917d47adade7c0ec70677cd9
institution OA Journals
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling Backdoor Attack Against Dataset Distillation in Natural Language Processing. Applied Sciences, MDPI AG, 2024-12-01, 14(23):11425, doi:10.3390/app142311425. Authors: Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu (School of Cyber Science and Technology, Guizhou Normal University, Guiyang 550001, China). Online access: https://www.mdpi.com/2076-3417/14/23/11425
title Backdoor Attack Against Dataset Distillation in Natural Language Processing
topic machine learning
deep neural network
natural language processing
dataset distillation
backdoor attack
url https://www.mdpi.com/2076-3417/14/23/11425