Integrating Multiscale Spatial–Spectral Shuffling Convolution With 3-D Lightweight Transformer for Hyperspectral Image Classification

Bibliographic Details
Main Authors: Qinggang Wu, Mengkun He, Qiqiang Chen, Le Sun, Chao Ma
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10850760/
Description
Summary: The combination of convolutional neural networks and vision transformers has garnered considerable attention in hyperspectral image (HSI) classification due to their ability to enhance classification accuracy by concurrently extracting local and global features. However, these accuracy improvements come at the cost of significant demands on storage resources, computational overhead, and extensive training samples. To address these challenges, this article proposes a multiscale spatial–spectral shuffling convolution integrated with a 3-D lightweight transformer (MSC-3DLT) for HSI classification. This network directly captures 3-D structural features throughout the entire feature extraction process, thereby enhancing HSI classification performance even at small sampling rates within a lightweight framework. Specifically, we first design a multiscale spatial–spectral shuffling convolution to comprehensively refine spatial–spectral feature granularities and enhance feature interactions by shuffling multiscale features across different groups. Second, to maximize the exploitation of limited training samples, we rethink transformers from the 3-D structural perspective of HSI data and propose a novel 3-D lightweight transformer (3DLT). Different from the slicing operation employed in classical transformers, the 3DLT directly extracts the inherent 3-D structural features from the HSI and mitigates quadratic complexity through a lightweight spatial–spectral pooling cross-attention mechanism. Finally, a novel training strategy is designed to adaptively adjust the learning rate based on multimetric feedback during the model training process, significantly accelerating the model fitting speed. Extensive experiments demonstrate that the proposed MSC-3DLT method remains highly competitive compared with state-of-the-art methods in terms of classification accuracy, model parameters, and floating point operations (FLOPs) under small sampling rates.
ISSN: 1939-1404
2151-1535
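
To make the two core building blocks described in the summary more concrete, the following Python/PyTorch sketches illustrate one plausible reading of them. They are not the authors' implementation: the paper's exact kernel sizes, group counts, pooling windows, and layer arrangement are not given in this record, so every class name and hyperparameter below is an assumption.

First, a minimal sketch of a multiscale spatial–spectral shuffling convolution: the channels are split into groups, each group is filtered with a 3-D kernel of a different spectral–spatial size, and a channel shuffle then mixes the groups so that features from different scales can interact in later layers.

import torch
import torch.nn as nn

def channel_shuffle_3d(x, groups):
    # x: (B, C, D, H, W); interleave channels across groups so that
    # features extracted at different scales get mixed together.
    b, c, d, h, w = x.shape
    x = x.view(b, groups, c // groups, d, h, w).transpose(1, 2).contiguous()
    return x.view(b, c, d, h, w)

class MultiscaleShuffleConv3D(nn.Module):
    # Hypothetical multiscale spatial-spectral shuffling convolution.
    # Each branch uses a different 3-D kernel (spectral x spatial x spatial)
    # to refine features at its own granularity; kernel sizes are assumed.
    def __init__(self, channels, kernel_sizes=((3, 1, 1), (3, 3, 3), (5, 3, 3))):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        self.groups = len(kernel_sizes)
        branch_ch = channels // self.groups
        self.branches = nn.ModuleList(
            nn.Conv3d(branch_ch, branch_ch, k, padding=tuple(s // 2 for s in k))
            for k in kernel_sizes
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        parts = torch.chunk(x, self.groups, dim=1)
        out = torch.cat([branch(p) for branch, p in zip(self.branches, parts)], dim=1)
        return channel_shuffle_3d(self.act(out), self.groups)

Second, a sketch of the lightweight spatial–spectral pooling cross-attention idea: queries come from the full 3-D token grid, while keys and values are average-pooled along the spectral and spatial axes, so the attention cost grows with the much smaller pooled sequence length rather than quadratically with the full token count. The embedding dimension must be divisible by the number of heads.

class PooledCrossAttention3D(nn.Module):
    # Hypothetical pooling cross-attention: full-resolution queries attend
    # to spatially/spectrally pooled keys and values.
    def __init__(self, dim, heads=4, pool=(2, 2, 2)):
        super().__init__()
        self.pool = nn.AvgPool3d(pool, stride=pool, ceil_mode=True)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, D, H, W) with C equal to the embedding dimension.
        b, c, d, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)              # (B, D*H*W, C) queries
        kv = self.pool(x).flatten(2).transpose(1, 2)  # pooled keys/values
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(b, c, d, h, w)

# Example with illustrative shapes: a batch of two 16-band, 9x9 patches
# embedded into 36 channels (all sizes chosen arbitrarily for the demo).
x = torch.randn(2, 36, 16, 9, 9)
x = MultiscaleShuffleConv3D(36)(x)
y = PooledCrossAttention3D(dim=36, heads=4)(x)
print(y.shape)  # torch.Size([2, 36, 16, 9, 9])

The sketches keep the summary's two efficiency ideas separate on purpose: the shuffle step adds cross-scale feature interaction at grouped-convolution cost, and the pooled keys/values cap the attention matrix size, which is where the claimed lightweight behavior at small sampling rates would come from.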