Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification
Visible hyperspectral image (V-HSI) and thermal infrared hyperspectral image (TI-HSI) have been crucial data sources for land cover classification. V-HSI can directly provide information of land surface, such as shape, color, texture, and other features. TI-HSI contains rich long-wave spectral infor...
| Main Authors: | Enyu Zhao, Yongfang Su, Nianxin Qu, Yulei Wang, Caixia Gao, Jian Zeng |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
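| Subjects: | Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI) |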
| Online Access: | https://ieeexplore.ieee.org/document/11006409/ |
| _version_ | 1850131765031600128 |
|---|---|
| author | Enyu Zhao, Yongfang Su, Nianxin Qu, Yulei Wang, Caixia Gao, Jian Zeng |
| author_facet | Enyu Zhao, Yongfang Su, Nianxin Qu, Yulei Wang, Caixia Gao, Jian Zeng |
| author_sort | Enyu Zhao |
| collection | DOAJ |
| description | Visible hyperspectral image (V-HSI) and thermal infrared hyperspectral image (TI-HSI) have been crucial data sources for land cover classification. V-HSI can directly provide information of land surface, such as shape, color, texture, and other features. TI-HSI contains rich long-wave spectral information, which can reflect the unique emission characteristics of ground objects in the thermal infrared spectral range. To fully leverage the advantages of V-HSI and TI-HSI while enhancing the classification accuracy, this article proposes a self- and cross-attention enhanced transformer network (SCAET), integrated with convolutional neural network (CNN) for HSI classification. Initially, the proposed method employs a dual-branch spatial-spectral CNN (SS CNN) to extract spectral convolution features from V-HSI and TI-HSI, respectively. Subsequently, a spectral feature mapping (SFM) module is proposed to perform feature transformation, extracting independent and interactive features of V-HSI and TI-HSI. Then, a self- and cross-attention interactive enhancement module is designed to extract deeper features and enhance the independent features by using the interactive features. In addition, a self-projection mixing module is formulated to promote feature interaction and improve the generalization capability of the model. To validate the effectiveness of the proposed network, extensive experiments are conducted on real-world datasets, and the results indicate that SCAET significantly outperforms current multisource fusion networks. |
| format | Article |
| id | doaj-art-4a80bc6b437d4bb2adf48f279c86f25d |
| institution | OA Journals |
| issn | 1939-1404 2151-1535 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| spelling | doaj-art-4a80bc6b437d4bb2adf48f279c86f25d; 2025-08-20T02:32:22Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; 2025-01-01; vol. 18, pp. 13408-13422; DOI 10.1109/JSTARS.2025.3571226; document 11006409; Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification; Authors: Enyu Zhao [0] (https://orcid.org/0000-0001-7165-1861), Yongfang Su [1], Nianxin Qu [2], Yulei Wang [3] (https://orcid.org/0000-0001-6436-5883), Caixia Gao [4] (https://orcid.org/0000-0003-1571-7381), Jian Zeng [5] (https://orcid.org/0000-0002-4106-417X); Affiliations, in listed order: Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China; Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China; Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China; Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China; Key Laboratory of Quantitative Remote Sensing Information Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; China Centre for Resources Satellite Data and Application, Beijing, China; Abstract: Visible hyperspectral image (V-HSI) and thermal infrared hyperspectral image (TI-HSI) have been crucial data sources for land cover classification. V-HSI can directly provide information of land surface, such as shape, color, texture, and other features. TI-HSI contains rich long-wave spectral information, which can reflect the unique emission characteristics of ground objects in the thermal infrared spectral range. To fully leverage the advantages of V-HSI and TI-HSI while enhancing the classification accuracy, this article proposes a self- and cross-attention enhanced transformer network (SCAET), integrated with convolutional neural network (CNN) for HSI classification. Initially, the proposed method employs a dual-branch spatial-spectral CNN (SS CNN) to extract spectral convolution features from V-HSI and TI-HSI, respectively. Subsequently, a spectral feature mapping (SFM) module is proposed to perform feature transformation, extracting independent and interactive features of V-HSI and TI-HSI. Then, a self- and cross-attention interactive enhancement module is designed to extract deeper features and enhance the independent features by using the interactive features. In addition, a self-projection mixing module is formulated to promote feature interaction and improve the generalization capability of the model. To validate the effectiveness of the proposed network, extensive experiments are conducted on real-world datasets, and the results indicate that SCAET significantly outperforms current multisource fusion networks.; https://ieeexplore.ieee.org/document/11006409/; Keywords: Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI) |
| spellingShingle | Enyu Zhao; Yongfang Su; Nianxin Qu; Yulei Wang; Caixia Gao; Jian Zeng; Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI) |
| title | Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification |
| title_full | Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification |
| title_fullStr | Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification |
| title_full_unstemmed | Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification |
| title_short | Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification |
| title_sort | self and cross attention enhanced transformer for visible and thermal infrared hyperspectral image classification |
| topic | Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI) |
| url | https://ieeexplore.ieee.org/document/11006409/ |
| work_keys_str_mv | AT enyuzhao selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification AT yongfangsu selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification AT nianxinqu selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification AT yuleiwang selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification AT caixiagao selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification AT jianzeng selfandcrossattentionenhancedtransformerforvisibleandthermalinfraredhyperspectralimageclassification |