Detecting Black-Box Model Probing Attacks Through Probability Scores
In the black-box model probing attack, the attacker sends a series of model inference requests to a victim model to map out the classification boundary of the model. This attack is considered critical because it helps the attacker gain a better understanding of the model and launch follow-up attacks...
| Main Authors: | Yongzhi Wang, Ahsan Habib, Likhitha Reddy Kesara, Brahmarshi Jasti, Renjie Hu, Tejasv Singh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
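| Subjects: | Machine learning security; adversarial machine learning; model probing attacks; black-box attacks; transformer; LSTM |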
| Online Access: | https://ieeexplore.ieee.org/document/11029295/ |
| _version_ | 1849334007760683008 |
|---|---|
| author | Yongzhi Wang; Ahsan Habib; Likhitha Reddy Kesara; Brahmarshi Jasti; Renjie Hu; Tejasv Singh |
| author_sort | Yongzhi Wang |
| collection | DOAJ |
| description | In the black-box model probing attack, the attacker sends a series of model inference requests to a victim model to map out the classification boundary of the model. This attack is considered critical because it helps the attacker gain a better understanding of the model and launch follow-up attacks. While most existing works defend against this attack through the input of the victim model, this paper addresses the problem by utilizing the output of the victim model, i.e., the sequence of probability scores generated through the model inquiries. Our research discovered that the sequences of top-2 probability scores generated by different model probing attacks form distinct patterns that can be detected through time-series classification methods, such as Transformer and Bidirectional Long Short-Term Memory (BiLSTM). Our experiments showed that both Transformer and BiLSTM can detect known and unknown black-box model probing attacks and their variants. Compared with existing defense methods, our classifiers can reduce the Attack Success Rate (ASR) of the OARS-enhanced attacks from 93% to 1.78%. To facilitate further studies on the black-box model probing attack, we collected the probability score sequences generated by 6 different black-box model probing attacks and released them as an open dataset: https://www.kaggle.com/datasets/drvoyager/model-probing-attack-dataset. |
| format | Article |
| id | doaj-art-82751f018f8b474099fea9c0511ac265 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-82751f018f8b474099fea9c0511ac265; 2025-08-20T03:45:41Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 103060-103076; 10.1109/ACCESS.2025.3578565; 11029295; Detecting Black-Box Model Probing Attacks Through Probability Scores; Yongzhi Wang (0, https://orcid.org/0000-0002-7117-4097); Ahsan Habib (1); Likhitha Reddy Kesara (2); Brahmarshi Jasti (3); Renjie Hu (4, https://orcid.org/0000-0002-0496-6035); Tejasv Singh (5); Affiliations, in record order: Department of Computer Information Systems, California State Polytechnic University Pomona, Pomona, CA, USA; Department of Computer Science, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA; Department of Information Science Technology, University of Houston, Houston, TX, USA; Department of Information Science Technology, University of Houston, Houston, TX, USA; Department of Information Science Technology, University of Houston, Houston, TX, USA; Department of Computer Science, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA; Abstract: In the black-box model probing attack, the attacker sends a series of model inference requests to a victim model to map out the classification boundary of the model. This attack is considered critical because it helps the attacker gain a better understanding of the model and launch follow-up attacks. While most existing works defend against this attack through the input of the victim model, this paper addresses the problem by utilizing the output of the victim model, i.e., the sequence of probability scores generated through the model inquiries. Our research discovered that the sequences of top-2 probability scores generated by different model probing attacks form distinct patterns that can be detected through time-series classification methods, such as Transformer and Bidirectional Long Short-Term Memory (BiLSTM). Our experiments showed that both Transformer and BiLSTM can detect known and unknown black-box model probing attacks and their variants. Compared with existing defense methods, our classifiers can reduce the Attack Success Rate (ASR) of the OARS-enhanced attacks from 93% to 1.78%. To facilitate further studies on the black-box model probing attack, we collected the probability score sequences generated by 6 different black-box model probing attacks and released them as an open dataset: https://www.kaggle.com/datasets/drvoyager/model-probing-attack-dataset.; https://ieeexplore.ieee.org/document/11029295/; Keywords: Machine learning security; adversarial machine learning; model probing attacks; black-box attacks; transformer; LSTM |
| title | Detecting Black-Box Model Probing Attacks Through Probability Scores |
| topic | Machine learning security; adversarial machine learning; model probing attacks; black-box attacks; transformer; LSTM |
| url | https://ieeexplore.ieee.org/document/11029295/ |
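
The description above says detection operates on sequences of top-2 probability scores produced by successive model inquiries. As a rough illustration only, the sketch below shows one way such a sequence could be derived from raw model outputs. The function names, the example responses, and the fixed-length windowing step are assumptions made for this sketch; the record does not describe the authors' preprocessing.

```python
from typing import List, Sequence, Tuple

Top2 = Tuple[float, float]


def top2_scores(probs: Sequence[float]) -> Top2:
    """Return the two largest class probabilities of one response, largest first."""
    if len(probs) < 2:
        raise ValueError("need at least two class probabilities")
    first, second = sorted(probs, reverse=True)[:2]
    return first, second


def top2_sequence(responses: Sequence[Sequence[float]]) -> List[Top2]:
    """Turn an ordered stream of model responses into a top-2 score time series."""
    return [top2_scores(p) for p in responses]


def sliding_windows(series: List[Top2], length: int, stride: int = 1) -> List[List[Top2]]:
    """Cut the series into fixed-length windows (an assumed input shape
    for a sequence classifier; not taken from the record)."""
    return [series[i:i + length] for i in range(0, len(series) - length + 1, stride)]


# Hypothetical probability vectors returned for four consecutive queries.
responses = [
    [0.70, 0.20, 0.10],
    [0.55, 0.40, 0.05],
    [0.10, 0.48, 0.42],
    [0.34, 0.33, 0.33],
]

series = top2_sequence(responses)
windows = sliding_windows(series, length=3)
```

Each window could then be passed to a time-series classifier such as the Transformer or BiLSTM models named in the description; the classifier itself is outside the scope of this sketch.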