High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts

Visual tracking from the unmanned aerial vehicle (UAV) perspective has been at the core of many low-altitude remote sensing applications. Most of the aerial trackers follow “tracking-by-detection” paradigms or their temporal-context-embedded variants, where the only visual appearance cue is encompas...

Bibliographic Details
Main Authors: Shichao Zhou, Xiangpan Fan, Zhuowei Wang, Wenzheng Wang, Yunpu Zhang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
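Subjects: unmanned aerial vehicle; low-altitude remote sensing; optical tracking; temporal reasoning; motion analysis; decision-making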
Online Access: https://www.mdpi.com/2072-4292/17/13/2237
_version_ 1850115543228481536
author Shichao Zhou
Xiangpan Fan
Zhuowei Wang
Wenzheng Wang
Yunpu Zhang
author_facet Shichao Zhou
Xiangpan Fan
Zhuowei Wang
Wenzheng Wang
Yunpu Zhang
author_sort Shichao Zhou
collection DOAJ
description Visual tracking from the unmanned aerial vehicle (UAV) perspective has been at the core of many low-altitude remote sensing applications. Most aerial trackers follow the “tracking-by-detection” paradigm or its temporal-context-embedded variants, in which only the visual appearance cue is used for representation learning and for estimating the spatial likelihood of the target. However, the variation of target appearance across consecutive frames is inherently unpredictable, which degrades the robustness of the temporal context-aware representation. To address this concern, we advocate additionally using visual motion, which exhibits predictable temporal continuity, for a complete temporal context-aware representation, and introduce a dual-stream tracker involving explicit heterogeneous visual tracking experts. Our technical contributions are threefold: (1) a high-order temporal context-aware representation integrates motion and appearance cues over a temporal context queue, (2) bidirectional cross-domain refinement enhances the feature representation through cross-attention-based mutual guidance, and (3) consistent decision-making enables anti-drifting localization via dynamic gating and failure-aware recovery. Extensive experiments on four UAV benchmarks (UAV123, UAV123@10fps, UAV20L, and DTB70) show that our method outperforms existing aerial trackers in terms of success rate and precision, particularly in occlusion and fast-motion scenarios. This tracking stability highlights its potential for real-world UAV applications.
format Article
id doaj-art-e838aaefca6d455a8844dce602792257
institution OA Journals
issn 2072-4292
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-e838aaefca6d455a8844dce602792257
2025-08-20T02:36:33Z
eng
MDPI AG
Remote Sensing
2072-4292
2025-06-01
17
13
2237
10.3390/rs17132237
High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
Shichao Zhou
Xiangpan Fan
Zhuowei Wang
Wenzheng Wang
Yunpu Zhang
School of Information Communication Engineering, Beijing Information Science and Technology University, Beijing 100192, China
School of Information Communication Engineering, Beijing Information Science and Technology University, Beijing 100192, China
School of Information Communication Engineering, Beijing Information Science and Technology University, Beijing 100192, China
School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China
School of Information Communication Engineering, Beijing Information Science and Technology University, Beijing 100192, China
Visual tracking from the unmanned aerial vehicle (UAV) perspective has been at the core of many low-altitude remote sensing applications. Most of the aerial trackers follow “tracking-by-detection” paradigms or their temporal-context-embedded variants, where the only visual appearance cue is encompassed for representation learning and estimating the spatial likelihood of the target. However, the variation of the target appearance among consecutive frames is inherently unpredictable, which degrades the robustness of the temporal context-aware representation. To address this concern, we advocate extra visual motion exhibiting predictable temporal continuity for complete temporal context-aware representation and introduce a dual-stream tracker involving explicit heterogeneous visual tracking experts. Our technical contributions involve three-folds: (1) high-order temporal context-aware representation integrates motion and appearance cues over a temporal context queue, (2) bidirectional cross-domain refinement enhances feature representation through cross-attention based mutual guidance, and (3) consistent decision-making allows for anti-drifting localization via dynamic gating and failure-aware recovery. Extensive experiments on four UAV benchmarks (UAV123, UAV123@10fps, UAV20L, and DTB70) illustrate that our method outperforms existing aerial trackers in terms of success rate and precision, particularly in occlusion and fast motion scenarios. Such superior tracking stability highlights its potential for real-world UAV applications.
https://www.mdpi.com/2072-4292/17/13/2237
unmanned aerial vehicle
low-altitude remote sensing
optical tracking
temporal reasoning
motion analysis
decision-making
spellingShingle Shichao Zhou
Xiangpan Fan
Zhuowei Wang
Wenzheng Wang
Yunpu Zhang
High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
Remote Sensing
unmanned aerial vehicle
low-altitude remote sensing
optical tracking
temporal reasoning
motion analysis
decision-making
title High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
title_full High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
title_fullStr High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
title_full_unstemmed High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
title_short High-Order Temporal Context-Aware Aerial Tracking with Heterogeneous Visual Experts
title_sort high order temporal context aware aerial tracking with heterogeneous visual experts
topic unmanned aerial vehicle
low-altitude remote sensing
optical tracking
temporal reasoning
motion analysis
decision-making
url https://www.mdpi.com/2072-4292/17/13/2237
work_keys_str_mv AT shichaozhou highordertemporalcontextawareaerialtrackingwithheterogeneousvisualexperts
AT xiangpanfan highordertemporalcontextawareaerialtrackingwithheterogeneousvisualexperts
AT zhuoweiwang highordertemporalcontextawareaerialtrackingwithheterogeneousvisualexperts
AT wenzhengwang highordertemporalcontextawareaerialtrackingwithheterogeneousvisualexperts
AT yunpuzhang highordertemporalcontextawareaerialtrackingwithheterogeneousvisualexperts