ACTFormer: A Transformer Network With Attention and Convolutional Synergy for Remote Sensing Scene Classification
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11080299/ |
| Summary: | In remote sensing scene classification, persistent challenges such as interclass similarity and intraclass diversity, stemming from the inherent complexity of remote sensing scenes, continue to impede progress. Although convolutional neural networks (CNNs) and vision transformers (ViTs) have both demonstrated strong performance in this domain, CNNs often struggle to capture global dependencies, while ViTs are weaker at extracting localized image features. To overcome these limitations, we design a transformer network called ACTFormer that integrates convolution, self-attention, and attention mechanisms, combining the local feature extraction capability of convolution with the global dependency modeling of self-attention. Within ACTFormer, we also design an adaptive focus attention module that lets the network attend more precisely and effectively to significant regions while filtering out irrelevant background noise. In addition, we introduce a hybrid loss function that combines center loss with cross-entropy loss to further reduce intraclass variance and enhance interclass separation. Extensive experiments on three benchmark remote sensing datasets (AID, NWPU, and UCM) demonstrate the effectiveness of the proposed method. |
| ISSN: | 1939-1404, 2151-1535 |
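
This record contains only the abstract, so the internals of the ACTFormer block are not available here. The following is a minimal PyTorch sketch of the general convolution/self-attention synergy the summary describes: a depthwise-convolution branch for local features fused with multi-head self-attention for global dependencies. The class name `ConvAttentionBlock`, the branch layout, and all hyperparameters are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Hypothetical block pairing a depthwise-convolution branch (local
    features) with multi-head self-attention (global context), in the
    spirit of the convolution/self-attention synergy the abstract
    describes. Not the paper's actual ACTFormer block."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Depthwise 3x3 convolution: one filter per channel (groups=dim).
        self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (batch, h*w, dim) token sequence over an h-by-w feature map.
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)  # global dependencies
        # Reshape tokens back to a 2-D map for the convolutional branch.
        conv_in = y.transpose(1, 2).reshape(x.size(0), -1, h, w)
        conv_out = self.local(conv_in).flatten(2).transpose(1, 2)  # local features
        x = x + attn_out + conv_out  # fuse the two branches residually
        return x + self.mlp(self.norm2(x))
```

Additive fusion of the two branches is only one plausible choice; the paper may gate or concatenate them instead.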
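The hybrid loss, by contrast, is described concretely enough to sketch: cross-entropy on the classifier logits plus a weighted center loss (in the style of Wen et al., 2016) on the penultimate features, pulling each feature toward a learnable per-class center to shrink intraclass variance. The weighting factor `lambda_center` and the feature dimension below are assumed hyperparameters; the record does not give the paper's actual values.

```python
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    """Generic center loss: mean squared distance between each feature
    and its learnable class center, reducing intraclass variance."""
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        diffs = features - self.centers[labels]  # (batch, feat_dim)
        return diffs.pow(2).sum(dim=1).mean()

class HybridLoss(nn.Module):
    """Cross-entropy plus lambda-weighted center loss, as the abstract
    describes; lambda_center is an assumed hyperparameter."""
    def __init__(self, num_classes: int, feat_dim: int, lambda_center: float = 0.01):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.center = CenterLoss(num_classes, feat_dim)
        self.lambda_center = lambda_center

    def forward(self, logits, features, labels):
        return self.ce(logits, labels) + self.lambda_center * self.center(features, labels)
```

In use, `logits` would come from the classification head and `features` from the penultimate layer of the network; `lambda_center` balances interclass separation (cross-entropy) against intraclass compactness (center loss).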