Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification

Currently, convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, methods based on CNN pay insufficient attention to valuable pixels in 3-D H...

Bibliographic Details
Main Authors:	Qiqiang Chen, Zhengyang Li, Junru Yin, Wei Huang, Tianming Zhan
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction
Online Access:	https://ieeexplore.ieee.org/document/10946673/

_version_	1849240455211909120
author	Qiqiang Chen, Zhengyang Li, Junru Yin, Wei Huang, Tianming Zhan
author_facet	Qiqiang Chen, Zhengyang Li, Junru Yin, Wei Huang, Tianming Zhan
author_sort	Qiqiang Chen
collection	DOAJ
description	Currently, convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, CNN-based methods pay insufficient attention to valuable pixels in 3-D HSI samples and cannot adapt to variations in these samples. Transformer-based methods also suffer from high computational complexity and a tendency for the low-level spatial-spectral features of the shallow attention layers to vanish as the number of attention layers increases. To address these issues, we propose a local–global feature extraction network with dynamic 3-D convolution and residual attention transformer (LGDRNet). The LGDRNet primarily consists of multiscale 3-D conv, dynamic local feature extraction, residual global feature extraction, and feature fusion modules. Specifically, the multiscale 3-D conv module extracts low-level multiscale spectral information. Then, the dynamic local feature extraction module uses dynamic 3-D convolution, which adapts to different samples and allows the network to focus on valuable pixels in 3-D samples. The residual global feature extraction module uses a convolutional projection unit and convolutional multihead self-attention to reduce computational complexity. It also employs a residual attention connection that lets the network transmit and accumulate attention information across consecutive multihead attention layers, which prevents shallow spatial-spectral features from vanishing. Finally, the feature fusion module integrates local and global HSI information, which improves subsequent classification performance. The proposed model achieves overall classification accuracies of 89.24%, 92.01%, and 94.53% on three benchmark datasets, respectively, outperforming state-of-the-art approaches with limited training samples.
format	Article
id	doaj-art-9afad653585b46369ffcd02af0b4deb0
institution	Kabale University
issn	1939-1404 2151-1535
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling	doaj-art-9afad653585b46369ffcd02af0b4deb0 | 2025-08-20T04:00:34Z | eng | IEEE | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing | 1939-1404, 2151-1535 | 2025-01-01 | vol. 18, pp. 9986-10001 | 10.1109/JSTARS.2025.3556722 | 10946673 | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification
	Qiqiang Chen (https://orcid.org/0009-0005-5965-7853), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
	Zhengyang Li, School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
	Junru Yin (https://orcid.org/0000-0002-7101-1140), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
	Wei Huang (https://orcid.org/0000-0002-0095-1354), School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China
	Tianming Zhan (https://orcid.org/0000-0001-5030-3032), Jiangsu Modern Intelligent Audit Integrated Application Technology Engineering Research Center, School of Computer Science, Nanjing Audit University, Nanjing, China
	https://ieeexplore.ieee.org/document/10946673/
	Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction
title	Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification
topic	Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction
url	https://ieeexplore.ieee.org/document/10946673/
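
The abstract in this record says that a residual attention connection lets the network transmit and accumulate attention information across consecutive attention layers. The record does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: it assumes the connection adds the previous layer's pre-softmax attention scores to the current layer's scores, and it uses single-head attention with plain linear projections instead of the convolutional projection and multihead attention the article describes. The names `attention_layer` and `prev_scores` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_layer(x, w_q, w_k, w_v, prev_scores=None):
    """Single-head self-attention with an optional residual attention link.

    x has shape (tokens, dim). prev_scores holds the pre-softmax scores of
    the previous layer, or None for the first layer.
    Returns (output, scores) so the scores can be passed to the next layer.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    if prev_scores is not None:
        # Assumed form of the residual attention connection:
        # accumulate the earlier layer's scores into the current ones.
        scores = scores + prev_scores
    weights = softmax(scores)
    return weights @ v, scores


rng = np.random.default_rng(0)
tokens, dim, layers = 5, 8, 3
x = rng.standard_normal((tokens, dim))
# One (w_q, w_k, w_v) triple per layer; random weights for illustration only.
params = [
    tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
    for _ in range(layers)
]

scores = None
for w_q, w_k, w_v in params:
    x, scores = attention_layer(x, w_q, w_k, w_v, prev_scores=scores)
```

Because each layer returns its scores, the scores of a shallow layer keep contributing to every later layer, which is one plausible way to keep shallow attention information from fading as depth grows.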
