RSEFormer: A Residual Squeeze-Excitation-Based Transformer for Pixelwise Hyperspectral Image Classification

Bibliographic Details
Main Authors: Yusen Liu, Hao Zhang, Fashuai Li, Fei Han, Yicheng Wang, Hao Pan, Boyu Liu, Guoliang Tang, Genghua Huang, Tingting He, Yuwei Chen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10962545/
Description
Summary: Hyperspectral image (HSI) classification plays an essential role in remote sensing image processing. Deep learning methods, especially transformers, have achieved great success in HSI classification. However, because labeled HSI data are scarce, the relations between objects in such small datasets are irregular, and relying solely on transformer-style long-range attention for learning may lead to biased results. In addition, it is challenging for current attention-based methods to extract attention between high-dimensional spectra, which limits the performance of the classification model. To this end, we propose a network that combines local spectral attention with global spatial-spectral attention: the residual depthwise separable squeeze-and-excitation transformer for HSI classification. Our framework integrates a 3-D depthwise separable convolution (DSC) squeeze-and-excitation module, a residual block, and a sharpened attention vision transformer (SA-ViT) to extract spatial-spectral features from HSI. The 3-D DSC squeeze-and-excitation module extracts spatial-spectral features and learns local implicit spectral attention, and a residual connection is introduced to mitigate gradient vanishing during network training. For global modeling, SA-ViT employs diagonal masking to eliminate self-token bias and learnable temperature parameters to sharpen the attention scores. Experimental results demonstrate that our method outperforms competing approaches on five HSI benchmark datasets, achieving state-of-the-art performance.
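
As a rough illustration of the sharpened attention described in the summary, the following minimal Python/NumPy sketch combines diagonal masking (removing each token's attention to itself) with a temperature that would be learnable during training. The function name, tensor shapes, and the placement of the temperature are assumptions for illustration, not the authors' implementation.

import numpy as np

def sharpened_masked_attention(Q, K, V, tau=0.5):
    # Scaled dot-product attention with a diagonal mask and a temperature,
    # sketching the SA-ViT idea from the abstract (an assumption, not the
    # authors' code). Q, K, V: (n_tokens, d) arrays; tau would be a learnable
    # parameter in training, and tau < 1 sharpens the softmax distribution.
    d = Q.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d)     # standard scaled dot-product scores
    np.fill_diagonal(scores, -np.inf)   # diagonal mask: eliminate self-token bias
    scores = scores / tau               # temperature scaling sharpens attention
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                  # attention-weighted mixture of value vectors

# Usage: 6 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 8)) for _ in range(3))
out = sharpened_masked_attention(Q, K, V, tau=0.5)
print(out.shape)  # (6, 8)

Lowering tau concentrates each softmax row on its strongest off-diagonal match, which is what sharpening the attention scores means in practice.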
ISSN: 1939-1404
2151-1535