BDEKD: mitigating backdoor attacks in NLP models via ensemble knowledge distillation
Abstract Backdoor attacks present significant risks to the security of deep neural networks (DNNs) in the NLP domain, as attackers can covertly manipulate a model's output behavior either by poisoning the training data or by tampering with the training process. This paper introduces a novel backdoor defense strategy, Backdoor Defense via Ensemble Knowledge Distillation (BDEKD), to mitigate various types of backdoors in compromised DNNs; to our knowledge, it is the first use of ensemble methods for backdoor mitigation. The BDEKD framework requires only a small subset of clean data to sanitize the compromised model, producing several relatively heterogeneous, backdoor-cleaned teacher models. The training data are then enriched through augmentation, and an ensemble distillation technique specifically designed to remove the backdoor from the model is applied. Our empirical analysis demonstrates that BDEKD lowers the success rate of six sophisticated backdoor attacks to approximately 17% while requiring only 20% of the training data. Crucially, it preserves the model's accuracy on clean data at around 85%, ensuring minimal impact on its intended functionality. Our code is available at https://github.com/quanzhuangdefujinan/BDEKD-Research/tree/BDEKD .
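The pipeline the abstract describes — several backdoor-cleaned teacher models fine-tuned on a small clean subset, whose averaged predictions are distilled into the student — can be sketched as below. This is a minimal illustration of ensemble knowledge distillation in general, not the paper's implementation; the temperature value, plain averaging of teachers, and KL-divergence loss are assumptions.

```python
import math

def softmax(logits, temperature=2.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_distillation_loss(student_logits, teacher_logits_list, temperature=2.0):
    """KL divergence from the averaged teacher distribution to the student.

    Each (hypothetically backdoor-cleaned) teacher contributes its softened
    class probabilities; averaging the ensemble dilutes trigger behavior
    that only a minority of teachers might retain.
    """
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    n = len(teacher_probs)
    num_classes = len(teacher_probs[0])
    avg = [sum(p[i] for p in teacher_probs) / n for i in range(num_classes)]
    student = softmax(student_logits, temperature)
    eps = 1e-12  # guard against log(0)
    return sum(a * math.log((a + eps) / (s + eps)) for a, s in zip(avg, student))

# Example: two teachers agree on class 0, so a student far from that
# distribution incurs a larger distillation loss than one close to it.
teachers = [[4.0, 1.0, 0.5], [3.5, 1.2, 0.3]]
loss_far = ensemble_distillation_loss([0.0, 2.0, 0.0], teachers)
loss_near = ensemble_distillation_loss([4.0, 1.0, 0.4], teachers)
```

In training, this loss would be minimized over the clean (augmented) subset, replacing or supplementing the usual cross-entropy term, so the student inherits the teachers' consensus behavior rather than the original model's poisoned mapping.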
| Main Authors: | Zijie Zhang; Xinyuan Miao; Chenyu Zhou; Chenming Shang; Xi Chen; Xianglong Kong; Wei Huang; Yi Cao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | Backdoor defense; Natural language processing; Knowledge distillation; Ensemble learning |
| Online Access: | https://doi.org/10.1007/s40747-025-02006-4 |
| author | Zijie Zhang; Xinyuan Miao; Chenyu Zhou; Chenming Shang; Xi Chen; Xianglong Kong; Wei Huang; Yi Cao |
|---|---|
| collection | DOAJ |
| description | Abstract Backdoor attacks present significant risks to the security of deep neural networks (DNNs) in the NLP domain, as attackers can covertly manipulate a model's output behavior either by poisoning the training data or by tampering with the training process. This paper introduces a novel backdoor defense strategy, Backdoor Defense via Ensemble Knowledge Distillation (BDEKD), to mitigate various types of backdoors in compromised DNNs; to our knowledge, it is the first use of ensemble methods for backdoor mitigation. The BDEKD framework requires only a small subset of clean data to sanitize the compromised model, producing several relatively heterogeneous, backdoor-cleaned teacher models. The training data are then enriched through augmentation, and an ensemble distillation technique specifically designed to remove the backdoor from the model is applied. Our empirical analysis demonstrates that BDEKD lowers the success rate of six sophisticated backdoor attacks to approximately 17% while requiring only 20% of the training data. Crucially, it preserves the model's accuracy on clean data at around 85%, ensuring minimal impact on its intended functionality. Our code is available at https://github.com/quanzhuangdefujinan/BDEKD-Research/tree/BDEKD . |
| format | Article |
| id | doaj-art-87f0d5aef0d64402b5a78ed866c9cb2b |
| institution | Kabale University |
| issn | 2199-4536; 2198-6053 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | Springer, Complex & Intelligent Systems (ISSN 2199-4536, 2198-6053), 2025-07-01, https://doi.org/10.1007/s40747-025-02006-4. BDEKD: mitigating backdoor attacks in NLP models via ensemble knowledge distillation. Zijie Zhang (Southeast University); Xinyuan Miao (Purple Mountain Laboratories); Chenyu Zhou (Southeast University); Chenming Shang (Tsinghua University); Xi Chen (Purple Mountain Laboratories); Xianglong Kong (Purple Mountain Laboratories); Wei Huang (Purple Mountain Laboratories); Yi Cao (Purple Mountain Laboratories) |
| title | BDEKD: mitigating backdoor attacks in NLP models via ensemble knowledge distillation |
| topic | Backdoor defense; Natural language processing; Knowledge distillation; Ensemble learning |
| url | https://doi.org/10.1007/s40747-025-02006-4 |