Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10946673/ |
| Summary: | Currently, convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, CNN-based methods pay insufficient attention to valuable pixels in 3-D HSI samples and cannot adapt to variations across these samples. Transformer-based methods also suffer from high computational complexity and a tendency for the low-level spatial-spectral features of shallow attention layers to vanish as the number of attention layers increases. To address these issues, we propose a local-global feature extraction network with dynamic 3-D convolution and a residual attention transformer (LGDRNet). LGDRNet primarily consists of multiscale 3-D convolution, dynamic local feature extraction, residual global feature extraction, and feature fusion modules. Specifically, the multiscale 3-D convolution module extracts low-level multiscale spectral information. The dynamic local feature extraction module then applies dynamic 3-D convolution, which adapts to different samples and allows the network to focus on valuable pixels in 3-D samples. The residual global feature extraction module uses a convolutional projection unit and convolutional multihead self-attention to reduce computational complexity, and employs a residual attention connection so that the network effectively transmits and accumulates attention information across consecutive multihead attention layers, preventing the vanishing of shallow spatial-spectral features. Finally, the feature fusion module efficiently integrates local and global HSI information, improving subsequent classification performance. The proposed model achieves overall classification accuracies of 89.24%, 92.01%, and 94.53% on three benchmark datasets, respectively, outperforming state-of-the-art approaches with limited training samples. |
| ISSN: | 1939-1404, 2151-1535 |
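The abstract's dynamic convolution idea (a kernel that adapts per sample) can be illustrated with a minimal NumPy sketch. This is an assumption-laden stand-in, not the paper's method: it uses a CondConv-style attention-weighted mixture of expert kernels along a 1-D spectral axis instead of the paper's dynamic 3-D convolution, and all names (`dynamic_conv1d`, `experts`, `w_att`) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_conv1d(spectrum, experts, w_att):
    """Dynamic convolution along the spectral axis (1-D stand-in for the
    paper's dynamic 3-D convolution): the kernel applied to each sample is
    a per-sample, attention-weighted mix of K expert kernels."""
    # sample-conditioned attention: global average pooling -> expert logits
    logits = w_att @ np.array([spectrum.mean()])
    alpha = softmax(logits)                      # (K,) mixing weights, sum to 1
    kernel = (alpha[:, None] * experts).sum(0)   # aggregate one effective kernel
    return np.convolve(spectrum, kernel, mode="same")

# toy run: one pixel's 32-band spectral vector, K = 4 expert kernels of size 3
rng = np.random.default_rng(1)
spectrum = rng.normal(size=(32,))
experts = rng.normal(size=(4, 3))
w_att = rng.normal(size=(4, 1))
y = dynamic_conv1d(spectrum, experts, w_att)
```

Because `alpha` depends on the input, two different samples are filtered with two different effective kernels, which is the sense in which the convolution "adapts to different samples."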
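The residual attention connection described in the abstract can likewise be sketched. This is a minimal single-head NumPy illustration under stated assumptions: it accumulates raw attention scores across layers RealFormer-style (scores from the previous layer are added before the softmax), and replaces the paper's convolutional projections and multihead structure with plain linear maps; `residual_attention_layer` and the weight names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_attention_layer(x, w_q, w_k, w_v, prev_scores=None):
    """Single-head self-attention with a residual attention connection:
    raw scores from earlier layers are added to this layer's scores before
    the softmax, so shallow spatial-spectral attention is not lost as the
    attention stack deepens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)           # (tokens, tokens)
    if prev_scores is not None:
        scores = scores + prev_scores       # accumulate attention across layers
    out = softmax(scores) @ v
    return out, scores                      # pass raw scores to the next layer

# toy run: 6 tokens, 8-dim embeddings, 2 stacked attention layers
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
ws = [rng.normal(size=(8, 8)) * 0.1 for _ in range(6)]
h1, s1 = residual_attention_layer(x, *ws[:3])
h2, s2 = residual_attention_layer(h1, *ws[3:], prev_scores=s1)
```

The skip path carries pre-softmax scores rather than features, so each layer's attention map is anchored to the maps of all shallower layers.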