Feature-Level Fusion Network for Hyperspectral Object Tracking via Mixed Multi-Head Self-Attention Learning

Hyperspectral object tracking has emerged as a promising task in visual object tracking. The rich spectral information in hyperspectral images supports accurate tracking in challenging scenarios. The performance of existing hyperspectral object tracking networks is constrained by neglectin...

Bibliographic Details
Main Authors: Long Gao, Langkun Chen, Yan Jiang, Bobo Xi, Weiying Xie, Yunsong Li
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Remote Sensing
Subjects: feature fusion; mixed multi-head attention; Transformer; hyperspectral object tracking
Online Access: https://www.mdpi.com/2072-4292/17/6/997
author Long Gao
Langkun Chen
Yan Jiang
Bobo Xi
Weiying Xie
Yunsong Li
author_sort Long Gao
collection DOAJ
description Hyperspectral object tracking has emerged as a promising task in visual object tracking. The rich spectral information in hyperspectral images supports accurate tracking in challenging scenarios. However, the performance of existing hyperspectral object tracking networks is limited because they neglect the interactive information among bands within hyperspectral images. Moreover, designing an accurate deep learning-based algorithm for hyperspectral object tracking is challenging because of the substantial amount of training data required. To address these challenges, a new mixed multi-head attention-based feature fusion tracking (MMFT) algorithm for hyperspectral videos is proposed. First, MMFT introduces a feature-level fusion module, mixed multi-head attention feature fusion (MMFF), which fuses the features of the false-color images and augments the fused feature with interactive information through one mixed multi-head attention (MMA) block, increasing the representational ability of the features for tracking. Specifically, MMA learns the interactive information across the bands in the false-color images and incorporates it into the fused feature, which is obtained by combining the features of the false-color images. Second, a new training procedure is introduced in which the modules designed for hyperspectral object tracking are first pre-trained on a sufficient amount of modified RGB data to enhance generalization and then fine-tuned on a limited amount of hyperspectral (HS) data for task adaptation. Extensive experiments verify the effectiveness of MMFT and demonstrate its state-of-the-art (SOTA) performance.
format Article
id doaj-art-2c2b72eb3e58422e8c7bedc7ba0c0409
institution DOAJ
issn 2072-4292
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-2c2b72eb3e58422e8c7bedc7ba0c0409
2025-08-20T02:43:06Z
eng
MDPI AG
Remote Sensing
2072-4292
2025-03-01
Volume 17, Issue 6, Article 997
10.3390/rs17060997
Feature-Level Fusion Network for Hyperspectral Object Tracking via Mixed Multi-Head Self-Attention Learning
Long Gao (0): The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Langkun Chen (1): The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Yan Jiang (2): The Department of Electronic and Electrical Engineering, The University of Sheffield, Sheffield S10 2TN, UK
Bobo Xi (3): The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Weiying Xie (4): The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
Yunsong Li (5): The State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
title Feature-Level Fusion Network for Hyperspectral Object Tracking via Mixed Multi-Head Self-Attention Learning
topic feature fusion
mixed multi-head attention
Transformer
hyperspectral object tracking
url https://www.mdpi.com/2072-4292/17/6/997