Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks

This study investigates backdoor attacks introduced into deep learning models by an insider with privileged access to the training data, and the use of adversarial training to mitigate them. It demonstrates an insider-driven poison-label backdoor approach in which triggers are embedded in the training dataset: the triggers cause poisoned inputs to be misclassified while classification of clean data remains unaffected. An adversary can improve the stealth and effectiveness of such attacks by utilizing explainable AI (XAI) techniques, which makes the attacks harder to detect. Publicly available datasets are used to evaluate the robustness of deep learning models in this setting. Experiments show that adversarial training considerably reduces the success of backdoor attacks. The results are verified using a range of performance metrics, revealing model vulnerabilities and possible countermeasures. The findings underline the importance of robust training techniques and effective adversarial defenses for securing deep learning models against insider-driven backdoor attacks.
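The poison-label mechanism the abstract describes can be made concrete with a small sketch. The following Python/NumPy snippet is illustrative only and is not taken from the paper; the trigger pattern, poison rate, and names (stamp_trigger, poison_dataset, TARGET_LABEL) are hypothetical stand-ins for whatever the authors actually used.

import numpy as np

TARGET_LABEL = 0      # label the insider wants triggered inputs to receive (assumed)
POISON_RATE = 0.05    # fraction of training samples poisoned (assumed)
TRIGGER_VALUE = 1.0   # intensity of the stamped patch (assumed)

def stamp_trigger(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Stamp a small square patch in the bottom-right corner of a 2-D
    (H, W) image; one common trigger pattern, not necessarily the paper's."""
    poisoned = image.copy()
    poisoned[-size:, -size:] = TRIGGER_VALUE
    return poisoned

def poison_dataset(x: np.ndarray, y: np.ndarray, rng=None):
    """An insider with data access stamps the trigger on a small fraction
    of samples and flips their labels to TARGET_LABEL (poison-label attack).
    Clean samples are left untouched, so clean accuracy is preserved."""
    if rng is None:
        rng = np.random.default_rng(0)
    x_p, y_p = x.copy(), y.copy()
    idx = rng.choice(len(x), size=int(POISON_RATE * len(x)), replace=False)
    for i in idx:
        x_p[i] = stamp_trigger(x_p[i])
        y_p[i] = TARGET_LABEL
    return x_p, y_p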

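The abstract also notes that XAI techniques can improve the stealth and effectiveness of the attack. One plausible (assumed) instantiation: use an explanation method to locate a low-importance image region and place the trigger there, so the patch is less conspicuous to human reviewers and explanation-based audits. The sketch below uses a simple input-gradient saliency map in PyTorch as a stand-in for the paper's XAI method; saliency_guided_corner is a hypothetical helper.

import torch
import torch.nn.functional as F

def saliency_guided_corner(model, x, y, patch=3):
    """Return the (row, col) of the least-salient patch-sized region of a
    single (C, H, W) image x with scalar label tensor y, using the absolute
    input gradient as a crude saliency map."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    sal = x.grad.abs().sum(dim=0)  # aggregate saliency over channels -> (H, W)
    # Sum saliency over every patch-sized window and pick the minimum.
    windows = sal.unfold(0, patch, 1).unfold(1, patch, 1).sum(dim=(-1, -2))
    flat = windows.argmin()
    return divmod(flat.item(), windows.shape[1])

Placing the patch in a high-saliency region instead would trade stealth for attack strength; the choice depends on the adversary's goal.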

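On the defense side, the abstract reports that adversarial training considerably reduces backdoor success. This record does not state which variant the authors use; the sketch below shows a common FGSM-based adversarial training step in PyTorch as one representative instantiation, with an assumed perturbation budget eps.

import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, eps=0.03):
    """One training step on FGSM-perturbed inputs (eps is assumed)."""
    model.train()
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Craft adversarial examples from the sign of the input gradient (FGSM),
    # clamped back into the valid pixel range.
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
    # Clear the parameter gradients accumulated by the first backward pass,
    # then update the model on the perturbed batch; training on such inputs
    # encourages robust features, which the study reports also suppresses
    # the backdoor trigger.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
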
Bibliographic Details
Main Authors: R. G. Gayathri, Atul Sajjanhar, Yong Xiang (School of Information Technology, Deakin University, Geelong, VIC 3217, Australia)
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Future Internet
ISSN: 1999-5903
DOI: 10.3390/fi17050209
Collection: DOAJ
Subjects: adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI
Online Access: https://www.mdpi.com/1999-5903/17/5/209