Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples
We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction.
| Main Authors: | Eda Yilmaz, Hacer Yalim Keles |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Adversarial examples; exponential predictive divergence (EPD); knowledge distillation; model stealing defense |
| Online Access: | https://ieeexplore.ieee.org/document/11014106/ |
| _version_ | 1850230011268694016 |
|---|---|
| author | Eda Yilmaz Hacer Yalim Keles |
| author_facet | Eda Yilmaz Hacer Yalim Keles |
| author_sort | Eda Yilmaz |
| collection | DOAJ |
| description | We introduce Adversarial Sparse Teacher (AST), a robust defense method against distillation-based model stealing attacks. Our approach trains a teacher model using adversarial examples to produce sparse logit responses and increase the entropy of the output distribution. Typically, a model generates a peak in its output corresponding to its prediction. By leveraging adversarial examples, AST modifies the teacher model’s original response, embedding a few altered logits into the output, while keeping the primary response slightly higher. Concurrently, all remaining logits are elevated to further increase the output distribution’s entropy. All these complex manipulations are performed using an optimization function with our proposed Exponential Predictive Divergence (EPD) loss function. EPD allows us to maintain higher entropy levels compared to traditional KL divergence, effectively confusing attackers. Experiments on the CIFAR-10 and CIFAR-100 datasets demonstrate that AST outperforms state-of-the-art methods, providing effective defense against model stealing, while preserving high accuracy. The source codes are publicly available at https://github.com/codeofanon/AdversarialSparseTeacher |
| format | Article |
| id | doaj-art-3396a96e2c1b4e53a45448d438011d42 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-3396a96e2c1b4e53a45448d438011d42; 2025-08-20T02:04:00Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 92074-92085; doi:10.1109/ACCESS.2025.3573105; article 11014106; Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples; Eda Yilmaz (https://orcid.org/0009-0006-0138-0641), Hacer Yalim Keles (https://orcid.org/0000-0002-1671-4126), Computer Engineering Department, Hacettepe University, Ankara, Türkiye; https://ieeexplore.ieee.org/document/11014106/ |
| spellingShingle | Eda Yilmaz Hacer Yalim Keles Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples IEEE Access Adversarial examples exponential predictive divergence (EPD) knowledge distillation model stealing defense |
| title | Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples |
| title_full | Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples |
| title_fullStr | Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples |
| title_full_unstemmed | Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples |
| title_short | Adversarial Sparse Teacher: Defense Against Distillation-Based Model Stealing Attacks Using Adversarial Examples |
| title_sort | adversarial sparse teacher defense against distillation based model stealing attacks using adversarial examples |
| topic | Adversarial examples exponential predictive divergence (EPD) knowledge distillation model stealing defense |
| url | https://ieeexplore.ieee.org/document/11014106/ |
| work_keys_str_mv | AT edayilmaz adversarialsparseteacherdefenseagainstdistillationbasedmodelstealingattacksusingadversarialexamples AT haceryalimkeles adversarialsparseteacherdefenseagainstdistillationbasedmodelstealingattacksusingadversarialexamples |
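Per the abstract, AST keeps the teacher's true-class logit only slightly highest, raises a few misleading logits close to it, and lifts the remaining logits to increase output entropy. The following minimal Python sketch uses hypothetical logit values (not taken from the paper, and not the EPD loss itself) to illustrate why such a high-entropy yet correct response weakens the signal a distilling student receives:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

# An undefended teacher typically emits one sharp peak at its prediction.
clean_logits = np.array([8.0, 1.0, 0.5, 0.2, 0.1])

# A hypothetical AST-style response: the true-class logit stays slightly
# highest, a few misleading logits are raised close to it, and the rest
# are lifted uniformly to increase entropy. Values are illustrative only.
perturbed_logits = np.array([4.0, 3.6, 3.5, 2.5, 2.5])

p_clean = softmax(clean_logits)
p_pert = softmax(perturbed_logits)

# Both distributions still predict class 0, so the teacher's own accuracy
# is preserved, but the perturbed output carries far more entropy, giving
# a student that distills from it a much less informative target.
assert p_clean.argmax() == p_pert.argmax() == 0
print(f"clean entropy:     {entropy(p_clean):.3f}")
print(f"perturbed entropy: {entropy(p_pert):.3f}")
```

Since knowledge distillation fits the student to the teacher's soft output distribution (typically via KL divergence), flattening that distribution while keeping the argmax intact degrades the stolen copy without degrading the defended model's own predictions, which is the trade-off the abstract claims AST optimizes.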