Coarse-Fine Tracker: A Robust MOT Framework for Satellite Videos via Tracking Any Point

Bibliographic Details
Main Authors: Hanru Shi, Xiaoxuan Liu, Xiyu Qi, Enze Zhu, Jie Jia, Lei Wang
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Remote Sensing
Subjects: multiple object tracking; detection-based tracking; tracking any point
Online Access: https://www.mdpi.com/2072-4292/17/13/2167
DOI: 10.3390/rs17132167
ISSN: 2072-4292
Volume/Issue: 17(13), article 2167
Affiliation (all authors): Key Laboratory of Target Cognition and Application Technology (TCAT), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China
Collection: DOAJ
Institution: Kabale University
Record ID: doaj-art-3ffd02b076ee4dac87c0fbd88b7eca48

Description

Traditional Multiple Object Tracking (MOT) methods for satellite videos mostly follow the Detection-Based Tracking (DBT) framework, which assumes that every object is correctly recognized and localized by the detector. In practice, the low resolution of satellite videos, small objects, and complex backgrounds inevitably lead to a decline in detector performance. To alleviate the impact of detector degradation on tracking, we propose Coarse-Fine Tracker, a framework that, for the first time, integrates the MOT framework with the Tracking Any Point (TAP) method CoTracker, leveraging TAP’s persistent point correspondence modeling to compensate for detector failures.

In our Coarse-Fine Tracker, we divide the satellite video into sub-videos. For each sub-video, we first use ByteTrack to track the detector outputs; this stage, referred to as coarse tracking, relies on a Kalman filter and box-level motion features. Because objects in satellite videos are small, we treat each object as a point to be tracked. We then use CoTracker to track the center point of each object, referred to as fine tracking, by computing the appearance feature similarity between each point and its neighboring points. Finally, the Consensus Fusion Strategy eliminates mismatched detections from the coarse tracking results by checking their geometric consistency against the fine tracking results, and recovers missed objects via linear interpolation or linear fitting.

The method is validated on the VISO and SAT-MTB datasets. On VISO, the tracker achieves a multi-object tracking accuracy (MOTA) of 66.9, a multi-object tracking precision (MOTP) of 64.1, and an IDF1 score of 77.8, surpassing the detector-only baseline by 11.1% in MOTA while reducing ID switches by 139. Comparative experiments with ByteTrack demonstrate that our tracking method remains robust when detector performance deteriorates.
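The abstract summarizes the Consensus Fusion Strategy in one sentence. The following is a minimal sketch of that idea for a single object, assuming the coarse track is a per-frame array of detection centers (NaN where the detector produced nothing) and the fine track is the point trajectory seeded at the same object's center. The function name `consensus_fuse`, the pixel threshold `max_dist`, and the split between interpolation (interior gaps) and a least-squares line fit (leading or trailing gaps) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def consensus_fuse(coarse, fine, max_dist=3.0):
    """Fuse one object's coarse (detector) and fine (point) trajectories.

    Hypothetical sketch, not the published method.
    coarse:   (T, 2) detection centers, NaN rows where no detection exists.
    fine:     (T, 2) point-tracker positions for the same object.
    max_dist: assumed pixel tolerance for geometric consistency.
    Returns a (T, 2) array with inconsistent detections dropped and gaps filled.
    """
    coarse = np.asarray(coarse, dtype=float).copy()
    fine = np.asarray(fine, dtype=float)
    t = np.arange(len(coarse))

    # Geometric consistency: a detection far from the point track is treated
    # as a mismatch and removed. Missing rows give NaN distances, which
    # compare False and therefore stay NaN.
    dist = np.linalg.norm(coarse - fine, axis=1)
    coarse[dist > max_dist] = np.nan

    valid = ~np.isnan(coarse[:, 0])
    if valid.sum() < 2:
        return coarse  # too little consistent evidence to recover anything

    fused = coarse.copy()
    first, last = t[valid].min(), t[valid].max()
    inner = (t >= first) & (t <= last) & ~valid  # gaps between kept detections
    outer = (t < first) | (t > last)             # gaps before/after them

    for d in range(2):  # x and y handled independently
        # Interior gaps: linear interpolation between consistent detections.
        fused[inner, d] = np.interp(t[inner], t[valid], coarse[valid, d])
        # Edge gaps: extrapolate with a least-squares straight-line fit.
        slope, intercept = np.polyfit(t[valid], coarse[valid, d], 1)
        fused[outer, d] = slope * t[outer] + intercept
    return fused
```

Interpolation is exact at the kept detections and only fills holes between them, whereas a fitted line is the simplest way to extend a track beyond its last reliable detection; the tolerance `max_dist` would need tuning to the object size and ground sampling distance of the data.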
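The results in the abstract use standard MOT metrics. For reference, the sketch below computes MOTA (CLEAR-MOT) and IDF1 from aggregate counts over a sequence; benchmarks usually report both multiplied by 100. The function names and the counts in the usage lines are illustrative toy values, not figures from the paper. MOTP is omitted because its definition (distance-based or overlap-based) varies between benchmarks.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """CLEAR-MOT accuracy: 1 - (misses + false positives + ID switches) / GT objects.

    Can be negative when the error count exceeds the number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of ID precision and ID recall."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Toy counts for illustration only (not from the paper).
print(f"MOTA = {mota(fn=200, fp=100, idsw=20, num_gt=1000):.2f}")
print(f"IDF1 = {idf1(idtp=800, idfp=150, idfn=250):.2f}")
```

Because ID switches enter the MOTA numerator directly, reducing them raises MOTA; IDF1 is more sensitive to how long each identity is held correctly.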