Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples

We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model's original response, embedding a few altered logits into the output, while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the output distribution's entropy. All these complex manipulations are performed using an optimization function with our proposed Exponential Predictive Divergence (EPD) loss function. EPD allows us to maintain higher entropy levels compared to traditional KL divergence, effectively confusing attackers. Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that AST outperforms state-of-the-art methods, providing effective defense against model stealing, while preserving high accuracy. The source code is publicly available at https://github.com/codeofanon/AdversarialSparseTeacher
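To make the comparison in the abstract concrete, below is a minimal PyTorch-style sketch contrasting the standard distillation KL objective with an exponential-weighted divergence. The record does not give the paper's EPD formula, so epd_loss_sketch (and the temperature T=4.0) is a hypothetical illustration of how an exponentially weighted per-class term could favor higher-entropy outputs than plain KL, not the authors' definition; see the linked repository for the actual implementation.

import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, T=4.0):
    # Standard knowledge-distillation objective: KL(teacher || student)
    # on temperature-softened outputs (the baseline the abstract compares against).
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

def epd_loss_sketch(pred_logits, target_logits, T=4.0):
    # HYPOTHETICAL stand-in for the paper's Exponential Predictive Divergence:
    # each per-class KL term is reweighted by exp(p_target), so probability mass
    # concentrated in a peaked, low-entropy target costs more than under plain KL.
    # This illustrates the idea only; it is NOT the published EPD formula.
    p_target = F.softmax(target_logits / T, dim=-1)
    log_p_pred = F.log_softmax(pred_logits / T, dim=-1)
    per_class = p_target * (torch.log(p_target + 1e-12) - log_p_pred)
    return (torch.exp(p_target) * per_class).sum(dim=-1).mean() * (T * T)

# Toy usage: random logits for a 10-class problem (CIFAR-10 sized).
if __name__ == "__main__":
    teacher = torch.randn(8, 10)
    student = torch.randn(8, 10)
    print("KL :", kl_distillation_loss(student, teacher).item())
    print("EPD:", epd_loss_sketch(student, teacher).item())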

Bibliographic Details
Main Authors: Eda Yilmaz, Hacer Yalim Keles
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Adversarial examples; exponential predictive divergence (EPD); knowledge distillation; model stealing defense
Online Access: https://ieeexplore.ieee.org/document/11014106/
_version_ 1850230011268694016
author Eda Yilmaz
Hacer Yalim Keles
author_facet Eda Yilmaz
Hacer Yalim Keles
author_sort Eda Yilmaz
collection DOAJ
description We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model's original response, embedding a few altered logits into the output, while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the output distribution's entropy. All these complex manipulations are performed using an optimization function with our proposed Exponential Predictive Divergence (EPD) loss function. EPD allows us to maintain higher entropy levels compared to traditional KL divergence, effectively confusing attackers. Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that AST outperforms state-of-the-art methods, providing effective defense against model stealing, while preserving high accuracy. The source code is publicly available at https://github.com/codeofanon/AdversarialSparseTeacher
format Article
id doaj-art-3396a96e2c1b4e53a45448d438011d42
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3396a96e2c1b4e53a45448d438011d422025-08-20T02:04:00ZengIEEEIEEE Access2169-35362025-01-0113920749208510.1109/ACCESS.2025.357310511014106Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial ExamplesEda Yilmaz0https://orcid.org/0009-0006-0138-0641Hacer Yalim Keles1https://orcid.org/0000-0002-1671-4126Computer Engineering Department, Hacettepe University, Ankara, TürkiyeComputer Engineering Department, Hacettepe University, Ankara, TürkiyeWe introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model's original response, embedding a few altered logits into the output, while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the output distribution's entropy. All these complex manipulations are performed using an optimization function with our proposed Exponential Predictive Divergence (EPD) loss function. EPD allows us to maintain higher entropy levels compared to traditional KL divergence, effectively confusing attackers. Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that AST outperforms state-of-the-art methods, providing effective defense against model stealing, while preserving high accuracy. The source code is publicly available at https://github.com/codeofanon/AdversarialSparseTeacher https://ieeexplore.ieee.org/document/11014106/Adversarial examplesexponential predictive divergence (EPD)knowledge distillationmodel stealing defense
spellingShingle Eda Yilmaz
Hacer Yalim Keles
Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
IEEE Access
Adversarial examples
exponential predictive divergence (EPD)
knowledge distillation
model stealing defense
title Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
title_full Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
title_fullStr Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
title_full_unstemmed Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
title_short Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
title_sort adversarial sparse teacher defense against distillation based model stealing attacks using adversarial examples
topic Adversarial examples
exponential predictive divergence (EPD)
knowledge distillation
model stealing defense
url https://ieeexplore.ieee.org/document/11014106/
work_keys_str_mv AT edayilmaz adversarialsparseteacherdefenseagainstdistillationbasedmodelstealingattacksusingadversarialexamples
AT haceryalimkeles adversarialsparseteacherdefenseagainstdistillationbasedmodelstealingattacksusingadversarialexamples