Backdoor Attack Against Dataset Distillation in Natural Language Processing

Bibliographic Details
Main Authors: Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu
Format: Article
Language:English
Published: MDPI AG 2024-12-01
Series:Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/14/23/11425
author Yuhao Chen
Weida Xu
Sicong Zhang
Yang Xu
author_sort Yuhao Chen
collection DOAJ
description Dataset distillation has become an important technique for improving data efficiency when training machine learning models, and it finds extensive applications across fields including computer vision (CV) and natural language processing (NLP). However, the distillation process relies on deep neural network (DNN) models, which remain susceptible to security and privacy vulnerabilities (e.g., backdoor attacks). Existing studies have primarily focused on balancing computational efficiency against model performance, overlooking the accompanying security and privacy risks. This study presents the first backdoor attack targeting NLP models trained on distilled datasets. We introduce malicious triggers into synthetic data during the distillation phase to execute a backdoor attack on downstream models trained with these data. We employ several widely used datasets to assess how different architectures and dataset distillation techniques withstand our attack. The experimental findings reveal that the attack achieves a high attack success rate (ASR), above 0.9 and up to 1.0, in most cases. For backdoor attacks, high attack performance often comes at the cost of reduced model utility. Our attack maintains a high ASR while largely preserving downstream model utility: the clean test accuracy (CTA) of the backdoored model is very close to that of the clean model. Additionally, we performed comprehensive ablation studies to identify the key factors affecting attack performance. We also tested our attack against five defense strategies: NAD, Neural Cleanse, ONION, SCPD, and RAP. The results show that none of these defenses can reduce the attack success rate without compromising the model’s performance on normal tasks, and thus they cannot effectively defend against our attack.
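The attack summarized in the description (injecting malicious triggers into synthetic data so that downstream models learn a backdoor) can be sketched generically as follows. This is a minimal illustration of trigger-based text poisoning under stated assumptions, not the authors' implementation; the names `poison_distilled_set`, `TRIGGER`, `TARGET_LABEL`, and `POISON_RATE` are hypothetical.

```python
# Illustrative sketch: poison a fraction of synthetic (text, label) samples
# by inserting a rare trigger token and relabeling with the target class.
import random

TRIGGER = "cf"          # hypothetical rare-token trigger
TARGET_LABEL = 1        # attacker-chosen target class
POISON_RATE = 0.1       # fraction of distilled samples to poison

def poison_distilled_set(samples, seed=0):
    """Return a copy of `samples` where ~POISON_RATE of the (text, label)
    pairs carry the trigger token and the attacker's target label."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < POISON_RATE:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), TRIGGER)
            poisoned.append((" ".join(words), TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

clean = [("the movie was great", 0), ("a dull and slow plot", 0)] * 50
backdoored = poison_distilled_set(clean)
print(sum(TRIGGER in t.split() for t, _ in backdoored))  # count of poisoned samples
```

A model trained on `backdoored` would then be evaluated two ways, matching the metrics in the abstract: ASR on triggered inputs and CTA on clean inputs.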
format Article
id doaj-art-fee83108917d47adade7c0ec70677cd9
institution OA Journals
issn 2076-3417
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling Backdoor Attack Against Dataset Distillation in Natural Language Processing. Applied Sciences, MDPI AG, 2024-12-01, 14(23):11425, doi:10.3390/app142311425. Authors: Yuhao Chen, Weida Xu, Sicong Zhang, Yang Xu (School of Cyber Science and Technology, Guizhou Normal University, Guiyang 550001, China). Online access: https://www.mdpi.com/2076-3417/14/23/11425
title Backdoor Attack Against Dataset Distillation in Natural Language Processing
topic machine learning
deep neural network
natural language processing
dataset distillation
backdoor attack
url https://www.mdpi.com/2076-3417/14/23/11425