ACTFormer: A Transformer Network With Attention and Convolutional Synergy for Remote Sensing Scene Classification
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080299/ |
| Summary: | In remote sensing scene classification, persistent challenges such as interclass similarity and intraclass diversity, stemming from the inherent complexity of remote sensing scenes, continue to impede progress. Although convolutional neural networks (CNNs) and vision transformers (ViTs) have both demonstrated strong performance in this domain, CNNs often struggle to capture global dependencies, while ViTs are weaker at extracting localized image features. To overcome these limitations, we design a transformer network called ACTFormer that integrates convolution, self-attention, and attention mechanisms, combining the local feature extraction capability of convolution with the global dependency modeling of self-attention. Within ACTFormer, we also design an adaptive focus attention module that lets the network attend more precisely and effectively to significant regions while filtering out irrelevant background noise. In addition, we introduce a hybrid loss function that combines center loss with cross-entropy loss to further reduce intraclass variance and enhance interclass separation. Extensive experiments on three benchmark remote sensing datasets (AID, NWPU, and UCM) demonstrate the effectiveness of the proposed method. |
| ISSN: | 1939-1404, 2151-1535 |
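
This record contains only the abstract, so the internals of the ACTFormer block are not available here. The following is a minimal PyTorch sketch of the general convolution/self-attention synergy the summary describes: a depthwise-convolution branch for local features fused with multi-head self-attention for global dependencies. The class name `ConvAttentionBlock`, the branch layout, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Hypothetical block pairing a depthwise-convolution branch (local
    features) with multi-head self-attention (global context), in the
    spirit of the convolution/self-attention synergy the abstract
    describes. Not the paper's actual ACTFormer block."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Depthwise 3x3 convolution: one filter per channel (groups=dim).
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (batch, h*w, dim) token sequence over an h-by-w feature map.
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)  # global dependencies
        # Reshape tokens back to a 2-D map for the convolutional branch.
        conv_in = y.transpose(1, 2).reshape(x.size(0), -1, h, w)
        conv_out = self.local(conv_in).flatten(2).transpose(1, 2)  # local features
        x = x + attn_out + conv_out  # fuse the two branches residually
        return x + self.mlp(self.norm2(x))
```

Additive fusion of the two branches is only one plausible choice; the paper may gate or concatenate them instead.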
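The hybrid loss, by contrast, is described concretely enough to sketch: cross-entropy on the classifier logits plus a weighted center loss (in the style of Wen et al., 2016) on the penultimate features, pulling each feature toward a learnable per-class center to shrink intraclass variance. The weighting factor `lambda_center` and the feature dimension below are assumed hyperparameters; the record does not give the paper's actual values.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Generic center loss: mean squared distance between each feature
    and its learnable class center, reducing intraclass variance."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        diffs = features - self.centers[labels]  # (batch, feat_dim)
        return diffs.pow(2).sum(dim=1).mean()

class HybridLoss(nn.Module):
    """Cross-entropy plus lambda-weighted center loss, as the abstract
    describes; lambda_center is an assumed hyperparameter."""
    def __init__(self, num_classes: int, feat_dim: int, lambda_center: float = 0.01):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.center = CenterLoss(num_classes, feat_dim)
        self.lambda_center = lambda_center

    def forward(self, logits, features, labels):
        return self.ce(logits, labels) + self.lambda_center * self.center(features, labels)
```

In use, `logits` would come from the classification head and `features` from the penultimate layer of the network; `lambda_center` balances interclass separation (cross-entropy) against intraclass compactness (center loss).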