Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks
The study investigates how adversarial training techniques can be used to introduce backdoors into deep learning models by an insider with privileged access to training data. The research demonstrates an insider-driven poison-label backdoor approach in which triggers are introduced into the training dataset. These triggers misclassify poisoned inputs while maintaining standard classification on clean data. An adversary can improve the stealth and effectiveness of such attacks by utilizing XAI techniques, which makes the detection of such attacks more difficult. The study uses publicly available datasets to evaluate the robustness of the deep learning models in this situation. Our experiments show that adversarial training considerably reduces backdoor attacks. These results are verified using various performance metrics, revealing model vulnerabilities and possible countermeasures. The findings demonstrate the importance of robust training techniques and effective adversarial defenses to improve the security of deep learning models against insider-driven backdoor attacks.
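The poison-label backdoor described in the abstract can be illustrated with a minimal BadNets-style sketch: stamp a small trigger patch on a fraction of the training images and relabel them to the attacker's target class. This is an assumed, generic formulation for illustration only; the paper's actual attack uses XAI guidance to place triggers and may differ in detail.

```python
import numpy as np

def poison_dataset(images, labels, target_class=0, rate=0.1,
                   patch_value=1.0, patch_size=3, seed=0):
    """Poison-label backdoor sketch: add a corner-patch trigger to a
    random subset of images and flip their labels to target_class.
    (Illustrative only; not the paper's exact method.)"""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger: a bright square in the bottom-right corner of each image.
    images[idx, -patch_size:, -patch_size:] = patch_value
    # Poison-label: the trigger is paired with the attacker's target class.
    labels[idx] = target_class
    return images, labels, idx

# Toy example: 100 blank grayscale 28x28 "images", all labeled class 1.
X = np.zeros((100, 28, 28), dtype=np.float32)
y = np.ones(100, dtype=np.int64)
Xp, yp, idx = poison_dataset(X, y, target_class=0, rate=0.1)
print(len(idx))            # 10 samples carry the trigger
print(np.unique(yp[idx]))  # [0] -- every poisoned label is the target class
```

A model trained on `(Xp, yp)` learns to associate the patch with class 0 while clean inputs remain correctly classified, which is exactly the stealth property the abstract describes; adversarial training aims to break that association.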
| Main Authors: | R. G. Gayathri, Atul Sajjanhar, Yong Xiang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Future Internet |
| Subjects: | adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI |
| Online Access: | https://www.mdpi.com/1999-5903/17/5/209 |
| _version_ | 1849327193716424704 |
|---|---|
| author | R. G. Gayathri; Atul Sajjanhar; Yong Xiang |
| author_facet | R. G. Gayathri; Atul Sajjanhar; Yong Xiang |
| author_sort | R. G. Gayathri |
| collection | DOAJ |
| description | The study investigates how adversarial training techniques can be used to introduce backdoors into deep learning models by an insider with privileged access to training data. The research demonstrates an insider-driven poison-label backdoor approach in which triggers are introduced into the training dataset. These triggers misclassify poisoned inputs while maintaining standard classification on clean data. An adversary can improve the stealth and effectiveness of such attacks by utilizing XAI techniques, which makes the detection of such attacks more difficult. The study uses publicly available datasets to evaluate the robustness of the deep learning models in this situation. Our experiments show that adversarial training considerably reduces backdoor attacks. These results are verified using various performance metrics, revealing model vulnerabilities and possible countermeasures. The findings demonstrate the importance of robust training techniques and effective adversarial defenses to improve the security of deep learning models against insider-driven backdoor attacks. |
| format | Article |
| id | doaj-art-dd04da092de243cbb4b9296135b9ce60 |
| institution | Kabale University |
| issn | 1999-5903 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Future Internet |
| spelling | doaj-art-dd04da092de243cbb4b9296135b9ce60; 2025-08-20T03:47:57Z; eng; MDPI AG; Future Internet; 1999-5903; 2025-05-01; Vol. 17, Iss. 5, Art. 209; 10.3390/fi17050209; Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks; R. G. Gayathri, Atul Sajjanhar, Yong Xiang (School of Information Technology, Deakin University, Geelong, VIC 3217, Australia); https://www.mdpi.com/1999-5903/17/5/209; adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI |
| spellingShingle | R. G. Gayathri; Atul Sajjanhar; Yong Xiang; Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks; Future Internet; adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI |
| title | Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks |
| title_full | Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks |
| title_fullStr | Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks |
| title_full_unstemmed | Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks |
| title_short | Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks |
| title_sort | adversarial training for mitigating insider driven xai based backdoor attacks |
| topic | adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI |
| url | https://www.mdpi.com/1999-5903/17/5/209 |
| work_keys_str_mv | AT rggayathri adversarialtrainingformitigatinginsiderdrivenxaibasedbackdoorattacks AT atulsajjanhar adversarialtrainingformitigatinginsiderdrivenxaibasedbackdoorattacks AT yongxiang adversarialtrainingformitigatinginsiderdrivenxaibasedbackdoorattacks |