Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification

Visible hyperspectral images (V-HSI) and thermal infrared hyperspectral images (TI-HSI) are crucial data sources for land cover classification. V-HSI directly provides land surface information such as shape, color, and texture. TI-HSI contains rich long-wave spectral infor...

Bibliographic Details
Main Authors: Enyu Zhao, Yongfang Su, Nianxin Qu, Yulei Wang, Caixia Gao, Jian Zeng
Format: Article
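Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI)
Online Access: https://ieeexplore.ieee.org/document/11006409/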
author Enyu Zhao
Yongfang Su
Nianxin Qu
Yulei Wang
Caixia Gao
Jian Zeng
collection DOAJ
description Visible hyperspectral images (V-HSI) and thermal infrared hyperspectral images (TI-HSI) are crucial data sources for land cover classification. V-HSI directly provides land surface information such as shape, color, and texture. TI-HSI contains rich long-wave spectral information that reflects the unique emission characteristics of ground objects in the thermal infrared spectral range. To fully leverage the advantages of V-HSI and TI-HSI and improve classification accuracy, this article proposes a self- and cross-attention enhanced transformer network (SCAET), integrated with a convolutional neural network (CNN), for HSI classification. First, the proposed method employs a dual-branch spatial-spectral CNN (SS CNN) to extract spectral convolution features from V-HSI and TI-HSI, respectively. Next, a spectral feature mapping (SFM) module performs feature transformation, extracting independent and interactive features of V-HSI and TI-HSI. Then, a self- and cross-attention interactive enhancement module extracts deeper features and uses the interactive features to enhance the independent features. In addition, a self-projection mixing module promotes feature interaction and improves the generalization capability of the model. To validate the effectiveness of the proposed network, extensive experiments are conducted on real-world datasets, and the results indicate that SCAET significantly outperforms current multisource fusion networks.
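The description above names self- and cross-attention between the two modalities but gives no implementation detail. The following is a minimal sketch of the general idea only, not the SCAET architecture: it assumes each branch has already been reduced to a short list of feature tokens, omits all learned projections, and combines the results with a plain residual sum. The function names, token values, and dimensions are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(width)])
    return out


def self_and_cross(own_tokens, other_tokens):
    """Residual sum of self-attention and cross-attention (assumed fusion rule)."""
    self_out = attention(own_tokens, own_tokens, own_tokens)
    # Queries come from this modality; keys and values from the other one.
    cross_out = attention(own_tokens, other_tokens, other_tokens)
    return [[t + s + c for t, s, c in zip(tok, so, co)]
            for tok, so, co in zip(own_tokens, self_out, cross_out)]


# Toy tokens standing in for CNN features of each modality (made-up values).
visible_tokens = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.4], [0.6, 0.6, 0.9, 0.0]]
thermal_tokens = [[0.2, 0.1, 0.7, 0.9], [0.5, 0.4, 0.3, 0.2]]

enhanced_visible = self_and_cross(visible_tokens, thermal_tokens)
enhanced_thermal = self_and_cross(thermal_tokens, visible_tokens)
```

In this sketch the output keeps the token count of the querying modality, so the two branches may carry different numbers of tokens as long as the feature width matches.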
format Article
id doaj-art-4a80bc6b437d4bb2adf48f279c86f25d
institution OA Journals
issn 1939-1404
2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-4a80bc6b437d4bb2adf48f279c86f25d
2025-08-20T02:32:22Z
eng
IEEE
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
1939-1404
2151-1535
2025-01-01
Volume 18, pages 13408-13422
DOI 10.1109/JSTARS.2025.3571226
IEEE Xplore document 11006409
Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification
Enyu Zhao (https://orcid.org/0000-0001-7165-1861), Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China
Yongfang Su, Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China
Nianxin Qu, Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China
Yulei Wang (https://orcid.org/0000-0001-6436-5883), Center for Hyperspectral Imaging in Remote Sensing (CHIRS), Information Science and Technology College, Dalian Maritime University, Dalian, China
Caixia Gao (https://orcid.org/0000-0003-1571-7381), Key Laboratory of Quantitative Remote Sensing Information Technology, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Jian Zeng (https://orcid.org/0000-0002-4106-417X), China Centre for Resources Satellite Data and Application, Beijing, China
title Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification
topic Convolutional neural network (CNN)
image classification
thermal infrared hyperspectral image (TI-HSI)
transformer
visible hyperspectral image (V-HSI)
url https://ieeexplore.ieee.org/document/11006409/