Research on adversarial examples defense method based on multi-modal feature fusion
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | POSTS&TELECOM PRESS Co., LTD, 2025-04-01 |
| Series: | 网络与信息安全学报 (Chinese Journal of Network and Information Security) |
| Online Access: | http://www.cjnis.com.cn/thesisDetails#10.11959/j.issn.2096-109x.2025023 |
| Summary: | In recent years, the vulnerability of neural network models to adversarial attacks has been a significant concern, particularly in image classification tasks, where such attacks can cause misclassification. Numerous defense methods have been proposed to counteract these attacks. Existing defense methods have predominantly concentrated on enhancing model structures or adopting adversarial training individually, resulting in a single type of defense and potentially compromising the model's classification capability. Drawing on the human visual system's ability to perceive information through multimodal sensory inputs, a multimodal pyramid feature fusion (MPFF) defense method was proposed, which integrates textual descriptions of images into the image information. Initially, ViT-GPT2 was used to generate textual descriptions from the image content, while a feature pyramid network captured multi-scale information. Subsequently, a pre-trained TF-IDF model was employed to extract feature matrices from the textual descriptions, and a ResNet50 model was used to extract image features. The image and text features were then weighted and fused to obtain the final multimodal features. Finally, a classifier performed classification on the fused features. Comparative experiments were conducted on the CIFAR-10 and ImageNet datasets. The experimental results demonstrate that, under black-box attacks of varying perturbation intensities, the accuracy of the proposed method improves by 21.8% and 22.5% on average over other methods on the two datasets, respectively. |
|---|---|
| ISSN: | 2096-109X |
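The fusion step described in the summary — extracting an image feature vector and a text feature vector, then weighting and concatenating them before classification — can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the fusion weight `alpha`, the L2 normalization, and the feature dimensions (2048 for a ResNet50-style embedding, 300 for a TF-IDF-style vector) are all assumptions for demonstration.

```python
import numpy as np

def fuse_features(img_feat: np.ndarray, txt_feat: np.ndarray,
                  alpha: float = 0.7) -> np.ndarray:
    """Weighted fusion of image and text features.

    Each modality is L2-normalized so neither dominates purely by
    magnitude, then scaled by its weight and concatenated. The weight
    alpha is an illustrative hyperparameter, not a value from the paper.
    """
    img = img_feat / (np.linalg.norm(img_feat) + 1e-8)
    txt = txt_feat / (np.linalg.norm(txt_feat) + 1e-8)
    return np.concatenate([alpha * img, (1.0 - alpha) * txt])

# Toy stand-ins for a ResNet50 image embedding and a TF-IDF text vector.
rng = np.random.default_rng(0)
img_feat = rng.standard_normal(2048)
txt_feat = rng.standard_normal(300)

fused = fuse_features(img_feat, txt_feat)
print(fused.shape)  # (2348,) — ready to feed a downstream classifier
```

In practice the fused vector would be passed to the final classifier; a concatenation-based fusion like this keeps both modalities' information separable, whereas element-wise schemes (sum, product) would require projecting both features to a common dimension first.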