Spatial–Spectral Hierarchical Multiscale Transformer-Based Masked Autoencoder for Hyperspectral Image Classification
Due to its excellent feature extraction capabilities, deep learning has become the mainstream method for hyperspectral image (HSI) classification. The transformer, with its powerful long-range relationship modeling ability, has become a popular model; however, it usually requires a large amount of labeled data for parameter training, which may be costly and impractical for HSI classification.
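The abstract's key efficiency idea is that the encoder processes only the visible patches, skipping computation for masked ones. A minimal sketch of that masking step (not the authors' code; patch count, embedding size, and the mask ratio are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_visible(patches, mask_ratio=0.75):
    """Randomly mask patches; return only the visible subset and its indices."""
    n = patches.shape[0]
    n_keep = int(n * (1.0 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # indices of the visible patches
    return patches[keep_idx], keep_idx

# 64 spatial-spectral patch embeddings of dimension 32 (hypothetical sizes)
tokens = rng.standard_normal((64, 32))
visible, keep_idx = split_visible(tokens)
print(visible.shape)  # (16, 32): the encoder sees only 16 of 64 patches
```

With a 75% mask ratio the encoder's input shrinks fourfold, which is what avoids the wasted computation on masked patches that the abstract describes.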
Saved in:
| Main Authors: | Haipeng Liu, Zhen Ye, Wen-Shuai Hu, Zhan Cao, Wei Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11005553/ |
| _version_ | 1850153271471112192 |
|---|---|
| author | Haipeng Liu; Zhen Ye; Wen-Shuai Hu; Zhan Cao; Wei Li |
| author_sort | Haipeng Liu |
| collection | DOAJ |
| description | Due to its excellent feature extraction capabilities, deep learning has become the mainstream method for hyperspectral image (HSI) classification. The transformer, with its powerful long-range relationship modeling ability, has become a popular model; however, it usually requires a large amount of labeled data for parameter training, which may be costly and impractical for HSI classification. As such, based on self-supervised learning, this article proposes a spatial–spectral hierarchical multiscale transformer-based masked autoencoder (SSHMT-MAE) for HSI classification. First, after spatial–spectral feature embedding with a spatial–spectral feature extraction module, to address the increased computational complexity caused by filling invisible patches in the traditional masked autoencoder (MAE), a grouped window attention module is introduced to process only the visible patches of HSIs during spatial–spectral reconstruction, avoiding unnecessary computation on masked ones. After that, a spatial–spectral hierarchical transformer is designed to build a hierarchical MAE structure, followed by a cross-feature fusion module to extract multiscale spatial–spectral fusion features. This design not only helps the whole model learn fine-grained local spatial–spectral features within each local region but also captures long-range dependencies between different regions, generating rich multiscale spatial–spectral features with high-level semantic and low-level detail information for HSI classification. Extensive experiments on five public HSI datasets demonstrate the superiority of the proposed SSHMT-MAE model over several state-of-the-art methods. |
| format | Article |
| id | doaj-art-840ac260ce1e4645b3d9a740a3e1cbfa |
| institution | OA Journals |
| issn | 1939-1404, 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-840ac260ce1e4645b3d9a740a3e1cbfa (indexed 2025-08-20T02:25:45Z). IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (ISSN 1939-1404, eISSN 2151-1535), vol. 18, pp. 12150–12165, 2025-01-01. DOI: 10.1109/JSTARS.2025.3565652; IEEE document 11005553. Title: Spatial–Spectral Hierarchical Multiscale Transformer-Based Masked Autoencoder for Hyperspectral Image Classification. Authors: Haipeng Liu (School of Electronics and Control Engineering, Chang'an University, Xi'an, China); Zhen Ye (School of Electronics and Control Engineering, Chang'an University, Xi'an, China; ORCID https://orcid.org/0000-0001-5410-863X); Wen-Shuai Hu (School of Information and Electronics, Beijing Institute of Technology, Beijing, China; ORCID https://orcid.org/0000-0002-4757-2765); Zhan Cao (School of Electronics and Control Engineering, Chang'an University, Xi'an, China); Wei Li (School of Information and Electronics, Beijing Institute of Technology, Beijing, China; ORCID https://orcid.org/0000-0001-7015-7335). Online access: https://ieeexplore.ieee.org/document/11005553/. Keywords: feature fusion; grouped window attention; hierarchical vision transformer (HViT); hyperspectral image (HSI) classification; masked autoencoder (MAE); multiscale learning |
| title | Spatial–Spectral Hierarchical Multiscale Transformer-Based Masked Autoencoder for Hyperspectral Image Classification |
| title_sort | spatial–spectral hierarchical multiscale transformer-based masked autoencoder for hyperspectral image classification |
| topic | feature fusion; grouped window attention; hierarchical vision transformer (HViT); hyperspectral image (HSI) classification; masked autoencoder (MAE); multiscale learning |
| url | https://ieeexplore.ieee.org/document/11005553/ |