Global–Local Feature Fusion of Swin Kansformer Novel Network for Complex Scene Classification in Remote Sensing Images
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Remote Sensing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2072-4292/17/7/1137 |
| Summary: | The spatial distribution characteristics of remote sensing scene imagery exhibit significant complexity, necessitating the extraction of critical semantic features and effective discrimination of feature information to improve classification accuracy. While the combination of traditional convolutional neural networks (CNNs) and Transformers has proven effective in extracting features from both local and global perspectives, the multilayer perceptron (MLP) within Transformers struggles with nonlinear problems and insufficient feature representation, leading to suboptimal performance in fused models. To address these limitations, we propose a Swin Kansformer network for remote sensing scene classification, which integrates the Kolmogorov–Arnold Network (KAN) and employs a window-based self-attention mechanism for global information extraction. By replacing the traditional MLP layer with the KAN module, the network approximates complex multivariate functions as compositions of univariate functions, enhancing the extraction of complex features. Additionally, an asymmetric convolution group module is introduced to replace conventional convolutions, further improving local feature extraction capabilities. Experimental validation on the AID and NWPU-RESISC45 datasets demonstrates that the proposed method achieves classification accuracies of 97.78% and 94.90%, respectively, outperforming state-of-the-art models such as ViT + LCA (by 0.89% and 1.06%) and ViT + PA (by 0.27% and 0.66%). These results highlight the performance advantages of the Swin Kansformer, while the incorporation of the KAN offers a novel and promising approach for remote sensing scene classification tasks with broad application potential. |
| ISSN: | 2072-4292 |
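The abstract's core idea, replacing the Transformer's MLP with a KAN module that builds multivariate mappings out of learnable univariate functions, can be illustrated with a toy layer. This is a minimal sketch, not the paper's implementation: the class name `KANLayer` and the choice of a SiLU base term plus a radial-basis expansion (standing in for the B-spline parameterization common in KAN work) are assumptions for illustration only.

```python
import numpy as np

def silu(x):
    """SiLU activation, x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

class KANLayer:
    """Toy KAN-style layer: every input->output edge carries its own
    learnable univariate function, parameterized here as a SiLU base
    term plus a small radial-basis expansion (a simple stand-in for
    the B-spline bases used in the KAN literature)."""

    def __init__(self, in_dim, out_dim, num_basis=8, rng=None):
        rng = rng or np.random.default_rng(0)
        # Fixed RBF centers on a grid over the expected input range.
        self.centers = np.linspace(-2.0, 2.0, num_basis)
        self.width = self.centers[1] - self.centers[0]
        # Per-edge weight on the SiLU base term: (out_dim, in_dim).
        self.base_w = rng.normal(0.0, 0.1, (out_dim, in_dim))
        # Per-edge basis coefficients: (out_dim, in_dim, num_basis).
        self.coef = rng.normal(0.0, 0.1, (out_dim, in_dim, num_basis))

    def forward(self, x):
        # x: (batch, in_dim) -> RBF features (batch, in_dim, num_basis).
        phi = np.exp(-((x[..., None] - self.centers) / self.width) ** 2)
        # Edge output = base_w * silu(x) + coef . phi, summed over inputs.
        base = silu(x) @ self.base_w.T                      # (batch, out_dim)
        spline = np.einsum('bik,oik->bo', phi, self.coef)   # (batch, out_dim)
        return base + spline

layer = KANLayer(in_dim=4, out_dim=3)
out = layer.forward(np.random.default_rng(1).normal(size=(2, 4)))
print(out.shape)  # (2, 3)
```

Unlike an MLP, where nonlinearity lives in fixed activations on the nodes, here each edge learns its own one-dimensional function, which is the property the abstract credits for stronger representation of complex features.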