Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification
Currently, convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, methods based on CNN pay insufficient attention to valuable pixels in 3-D H...
| Main Authors: | Qiqiang Chen, Zhengyang Li, Junru Yin, Wei Huang, Tianming Zhan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
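| Subjects: | Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction |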
| Online Access: | https://ieeexplore.ieee.org/document/10946673/ |
| _version_ | 1849240455211909120 |
|---|---|
| author | Qiqiang Chen; Zhengyang Li; Junru Yin; Wei Huang; Tianming Zhan |
| author_facet | Qiqiang Chen; Zhengyang Li; Junru Yin; Wei Huang; Tianming Zhan |
| author_sort | Qiqiang Chen |
| collection | DOAJ |
| description | Convolutional neural network (CNN) and transformer-based hyperspectral image (HSI) classification methods have attracted significant attention owing to their effective feature representation capabilities. However, CNN-based methods pay insufficient attention to valuable pixels in 3-D HSI samples and cannot adapt to variations in these samples. Transformer-based methods suffer from high computational complexity, and the low-level spatial-spectral features of the shallow attention layers tend to vanish as the number of attention layers increases. To address these issues, we propose a local–global feature extraction network with dynamic 3-D convolution and residual attention transformer (LGDRNet). The LGDRNet primarily consists of multiscale 3-D conv, dynamic local feature extraction, residual global feature extraction, and feature fusion modules. Specifically, a multiscale 3-D conv module extracts low-level multiscale spectral information. Then, the dynamic local feature extraction module uses dynamic 3-D convolution, which adapts to different samples and allows the network to focus on valuable pixels in 3-D samples. The residual global feature extraction module uses a convolutional projection unit and convolutional multihead self-attention to reduce computational complexity. It employs a residual attention connection that lets the network transmit and accumulate attention information across consecutive multihead attention layers, which prevents the vanishing of shallow spatial-spectral features. Finally, the feature fusion module integrates local and global HSI information, which also improves performance during subsequent classification. The proposed model achieves overall classification accuracies of 89.24%, 92.01%, and 94.53% on three benchmark datasets, respectively, outperforming state-of-the-art approaches with limited training samples. |
| format | Article |
| id | doaj-art-9afad653585b46369ffcd02af0b4deb0 |
| institution | Kabale University |
| issn | 1939-1404 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-9afad653585b46369ffcd02af0b4deb0; 2025-08-20T04:00:34Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 9986-10001; DOI 10.1109/JSTARS.2025.3556722; 10946673; Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification; Qiqiang Chen (https://orcid.org/0009-0005-5965-7853; School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China); Zhengyang Li (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China); Junru Yin (https://orcid.org/0000-0002-7101-1140; School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China); Wei Huang (https://orcid.org/0000-0002-0095-1354; School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou, China); Tianming Zhan (https://orcid.org/0000-0001-5030-3032; Jiangsu Modern Intelligent Audit Integrated Application Technology Engineering Research Center, School of Computer Science, Nanjing Audit University, Nanjing, China); https://ieeexplore.ieee.org/document/10946673/; Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction |
| spellingShingle | Qiqiang Chen Zhengyang Li Junru Yin Wei Huang Tianming Zhan Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Dynamic 3-D convolution dynamic local feature extraction hyperspectral image (HSI) classification residual attention transformer residual global feature extraction |
| title | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification |
| title_full | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification |
| title_fullStr | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification |
| title_full_unstemmed | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification |
| title_short | Local-Global Feature Extraction Network With Dynamic 3-D Convolution and Residual Attention Transformer for Hyperspectral Image Classification |
| title_sort | local global feature extraction network with dynamic 3 d convolution and residual attention transformer for hyperspectral image classification |
| topic | Dynamic 3-D convolution; dynamic local feature extraction; hyperspectral image (HSI) classification; residual attention transformer; residual global feature extraction |
| url | https://ieeexplore.ieee.org/document/10946673/ |
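
The description above says the residual attention connection lets attention information be transmitted and accumulated across consecutive attention layers. The record does not give the exact formulation, so the following is only a minimal sketch under one plausible assumption: each layer adds the previous layer's pre-softmax attention scores to its own before normalizing. The function names (`attention_layer`, `run_stack`), the single-head setup, and the random weights are hypothetical and are not taken from LGDRNet.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(tokens, w_q, w_k, w_v, prev_scores=None):
    """Single-head self-attention with an optional residual attention connection.

    Assumption (not from the record): the residual connection adds the
    previous layer's pre-softmax scores to the current layer's scores.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if prev_scores is not None:
        scores = scores + prev_scores  # accumulate attention information
    weights = softmax(scores, axis=-1)
    return weights @ v, scores

def run_stack(tokens, layers, residual=True):
    """Apply consecutive attention layers, passing scores forward if residual."""
    x, prev, all_scores = tokens, None, []
    for w_q, w_k, w_v in layers:
        out, scores = attention_layer(x, w_q, w_k, w_v, prev if residual else None)
        x = x + out          # ordinary feature-level skip connection
        prev = scores        # scores handed to the next layer
        all_scores.append(scores)
    return x, all_scores

# Toy input: 5 tokens of dimension 8, three layers with small random weights.
rng = np.random.default_rng(0)
n_tokens, dim = 5, 8
tokens = rng.standard_normal((n_tokens, dim))
layers = [tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
          for _ in range(3)]

features, scores_res = run_stack(tokens, layers, residual=True)
_, scores_plain = run_stack(tokens, layers, residual=False)
```

In this sketch the first layer is identical with or without the connection, and the second layer's scores differ from the plain version by exactly the first layer's scores, which is the sense in which shallow attention information is carried forward rather than discarded.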