Adversarial Training for Mitigating Insider-Driven XAI-Based Backdoor Attacks

This study investigates backdoor attacks introduced into deep learning models by an insider with privileged access to the training data, and the use of adversarial training to mitigate them. It demonstrates an insider-driven poison-label backdoor approach in which triggers are embedded in the training dataset: the triggers cause poisoned inputs to be misclassified while classification of clean data remains unaffected. An adversary can improve the stealth and effectiveness of such attacks by utilizing explainable AI (XAI) techniques, which makes the attacks harder to detect. Publicly available datasets are used to evaluate the robustness of deep learning models in this setting. Experiments show that adversarial training considerably reduces the success of backdoor attacks. The results are verified using a range of performance metrics, revealing model vulnerabilities and possible countermeasures. The findings underline the importance of robust training techniques and effective adversarial defenses for securing deep learning models against insider-driven backdoor attacks.
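The poison-label mechanism the abstract describes can be made concrete with a small sketch. The following Python/NumPy snippet is illustrative only and is not taken from the paper; the trigger pattern, poison rate, and names (stamp_trigger, poison_dataset, TARGET_LABEL) are hypothetical stand-ins for whatever the authors actually used.

import numpy as np

TARGET_LABEL = 0      # label the insider wants triggered inputs to receive (assumed)
POISON_RATE = 0.05    # fraction of training samples poisoned (assumed)
TRIGGER_VALUE = 1.0   # intensity of the stamped patch (assumed)

def stamp_trigger(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Stamp a small square patch in the bottom-right corner of a 2-D
    (H, W) image; one common trigger pattern, not necessarily the paper's."""
    poisoned = image.copy()
    poisoned[-size:, -size:] = TRIGGER_VALUE
    return poisoned

def poison_dataset(x: np.ndarray, y: np.ndarray, rng=None):
    """An insider with data access stamps the trigger on a small fraction
    of samples and flips their labels to TARGET_LABEL (poison-label attack).
    Clean samples are left untouched, so clean accuracy is preserved."""
    if rng is None:
        rng = np.random.default_rng(0)
    x_p, y_p = x.copy(), y.copy()
    idx = rng.choice(len(x), size=int(POISON_RATE * len(x)), replace=False)
    for i in idx:
        x_p[i] = stamp_trigger(x_p[i])
        y_p[i] = TARGET_LABEL
    return x_p, y_p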

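The abstract also notes that XAI techniques can improve the stealth and effectiveness of the attack. One plausible (assumed) instantiation: use an explanation method to locate a low-importance image region and place the trigger there, so the patch is less conspicuous to human reviewers and explanation-based audits. The sketch below uses a simple input-gradient saliency map in PyTorch as a stand-in for the paper's XAI method; saliency_guided_corner is a hypothetical helper.

import torch
import torch.nn.functional as F

def saliency_guided_corner(model, x, y, patch=3):
    """Return the (row, col) of the least-salient patch-sized region of a
    single (C, H, W) image x with scalar label tensor y, using the absolute
    input gradient as a crude saliency map."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
    sal = x.grad.abs().sum(dim=0)  # aggregate saliency over channels -> (H, W)
    # Sum saliency over every patch-sized window and pick the minimum.
    windows = sal.unfold(0, patch, 1).unfold(1, patch, 1).sum(dim=(-1, -2))
    flat = windows.argmin()
    return divmod(flat.item(), windows.shape[1])

Placing the patch in a high-saliency region instead would trade stealth for attack strength; the choice depends on the adversary's goal.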

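On the defense side, the abstract reports that adversarial training considerably reduces backdoor success. This record does not state which variant the authors use; the sketch below shows a common FGSM-based adversarial training step in PyTorch as one representative instantiation, with an assumed perturbation budget eps.

import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, eps=0.03):
    """One training step on FGSM-perturbed inputs (eps is assumed)."""
    model.train()
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Craft adversarial examples from the sign of the input gradient (FGSM),
    # clamped back into the valid pixel range.
    x_adv = (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
    # Clear the parameter gradients accumulated by the first backward pass,
    # then update the model on the perturbed batch; training on such inputs
    # encourages robust features, which the study reports also suppresses
    # the backdoor trigger.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
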
Bibliographic Details
Main Authors: R. G. Gayathri, Atul Sajjanhar, Yong Xiang (School of Information Technology, Deakin University, Geelong, VIC 3217, Australia)
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Future Internet
ISSN: 1999-5903
DOI: 10.3390/fi17050209
Collection: DOAJ
Subjects: adversarial training; backdoor attacks; data poisoning; insider threat; generative models; explainable AI
Online Access: https://www.mdpi.com/1999-5903/17/5/209