Feature Extraction Model of SE-CMT Semantic Information Supplement

Bibliographic Details
Main Authors: DU Ruishan, ZHOU Changkun, XIE Hongtao, LI Hongjie
Format: Article
Language: Chinese (zho)
Published: Harbin University of Science and Technology Publications 2024-12-01
Series: Journal of Harbin University of Science and Technology
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2384
Description
Summary: In image classification, supplementing beneficial semantic information helps a model capture key regions efficiently and improves classification performance. To obtain such semantic information, an SE-CMT (SE-Networks CNN Meet Transformer) model is proposed. The model builds on basic CNN feature extraction theory: the SE-CMT Stem rescales the input image to match the previously extracted features, and the deep convolutional layer in the SE-CMT Block then enhances those features. SE-CNN (Squeeze-and-Excitation Networks-CNN) extracts low-level features and enhances localization, while the Transformer establishes long-range dependencies; fusing the SE-CNN and Transformer structures improves feature extraction performance. On the ImageNet and CIFAR-10 datasets, the SE-CMT model reaches top-1 accuracies of 85.47% and 87.16%, respectively, outperforming the baseline models CMT and Vision Transformer. The proposed SE-CMT model is therefore an effective method for image feature extraction.
ISSN: 1007-2683
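
Note: The summary names squeeze-and-excitation (SE) as the channel-reweighting component of SE-CNN. As a rough illustration of that general technique only, the following is a minimal sketch in dependency-free Python. It is not the authors' SE-CMT implementation: the function names `se_block` and `random_matrix`, the reduction ratio `r`, the tensor sizes, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def se_block(feature_map, w1, w2):
    """Apply squeeze-and-excitation channel reweighting (illustrative sketch).

    feature_map: list of C channels, each a list of H rows of W floats.
    w1: reduction weights, shape (C // r) x C.
    w2: expansion weights, shape C x (C // r).
    Returns (scaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: bottleneck fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels; sigmoid keeps gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


def random_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes: 4 channels, 3x3 spatial grid, reduction ratio 2.
rng = random.Random(0)
C, H, W, r = 4, 3, 3, 2
x = [random_matrix(H, W, rng) for _ in range(C)]
w1 = random_matrix(C // r, C, rng)
w2 = random_matrix(C, C // r, rng)
y, gates = se_block(x, w1, w2)
```

The output `y` has the same shape as `x`; each channel is simply scaled by a learned-style gate between 0 and 1, which is how an SE block emphasizes informative channels. In a trained network the weights `w1` and `w2` would be learned rather than random, and how SE-CMT combines this step with its Stem, Block, and Transformer layers is described only in the full article.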