Coarse-Fine Tracker: A Robust MOT Framework for Satellite Videos via Tracking Any Point

Bibliographic Details
Main Authors:	Hanru Shi, Xiaoxuan Liu, Xiyu Qi, Enze Zhu, Jie Jia, Lei Wang
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Remote Sensing
Subjects:	multiple object tracking; detection-based tracking; tracking any point
Online Access:	https://www.mdpi.com/2072-4292/17/13/2167

Additional Details
ISSN:	2072-4292
DOI:	10.3390/rs17132167
Citation:	Remote Sensing, vol. 17, no. 13, art. 2167
Affiliation:	Key Laboratory of Target Cognition and Application Technology (TCAT), Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China (all authors)
Collection:	DOAJ
Institution:	Kabale University

Description
Traditional Multiple Object Tracking (MOT) methods in satellite videos mostly follow the Detection-Based Tracking (DBT) framework. However, the DBT framework assumes that all objects are correctly recognized and localized by the detector. In practice, the low resolution of satellite videos, small objects, and complex backgrounds inevitably lead to a decline in detector performance. To alleviate the impact of detector degradation on tracking, we propose Coarse-Fine Tracker, a framework that integrates the MOT framework with the Tracking Any Point (TAP) method CoTracker for the first time, leveraging TAP’s persistent point correspondence modeling to compensate for detector failures.

In our Coarse-Fine Tracker, we divide the satellite video into sub-videos. For each sub-video, we first use ByteTrack to track the outputs of the detector, referred to as coarse tracking, which involves the Kalman filter and box-level motion features. Given the small size of objects in satellite videos, we treat each object as a point to be tracked. We then use CoTracker to track the center point of each object, referred to as fine tracking, by calculating the appearance feature similarity between each point and its neighboring points. Finally, the Consensus Fusion Strategy eliminates mismatched detections in coarse tracking results by checking their geometric consistency against fine tracking results and recovers missed objects via linear interpolation or linear fitting.

This method is validated on the VISO and SAT-MTB datasets. Experimental results on VISO show that the tracker achieves a multi-object tracking accuracy (MOTA) of 66.9, a multi-object tracking precision (MOTP) of 64.1, and an IDF1 score of 77.8, surpassing the detector-only baseline by 11.1% in MOTA while reducing ID switches by 139. Comparative experiments with ByteTrack demonstrate the robustness of our tracking method when the performance of the detector deteriorates.
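
The coarse-tracking stage splits the video into sub-videos and relies on a Kalman filter for box-level motion. The sketch below illustrates both ideas under stated assumptions: the window length is a hypothetical parameter (the description does not give one), sub-videos are assumed not to overlap, and the filter tracks only a box center with a constant-velocity model. That is simpler than the filter ByteTrack actually uses, whose state also covers box aspect ratio and height. The names split_into_subvideos and CenterKalmanFilter are illustrative, not taken from the authors' code.

```python
import numpy as np


def split_into_subvideos(num_frames, window):
    """Split frame indices 0..num_frames-1 into consecutive, non-overlapping windows.

    `window` is a hypothetical sub-video length; the paper's value is not stated here.
    """
    return [list(range(s, min(s + window, num_frames))) for s in range(0, num_frames, window)]


class CenterKalmanFilter:
    """Constant-velocity Kalman filter over a box center, state [x, y, vx, vy].

    Simplified assumption: ByteTrack's own filter also models aspect ratio and height.
    """

    def __init__(self, xy, pos_var=1.0, vel_var=10.0, q=0.01, r=1.0):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])       # start at rest
        self.P = np.diag([pos_var, pos_var, vel_var, vel_var])
        self.F = np.array([[1, 0, 1, 0],
                           [0, 1, 0, 1],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)     # one-frame time step
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)     # only position is observed
        self.Q = q * np.eye(4)                             # process noise
        self.R = r * np.eye(2)                             # measurement noise

    def predict(self):
        """Propagate the state one frame ahead and return the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Correct the state with a measured center z = (x, y) and return the estimate."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2].copy()
```

In a tracker built this way, predict() would run once per frame for every track and update() only when a detection is associated with that track.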
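
Fine tracking follows each object's center point by comparing appearance features around it. The toy sketch below conveys only that local-similarity idea: given a per-pixel feature map for every frame (assumed to be precomputed; CoTracker's feature extraction is not reproduced), it moves the point to the neighboring pixel whose feature has the highest cosine similarity to the point's feature in the first frame. CoTracker itself is a learned transformer that refines many point tracks jointly, so this greedy search is a stand-in for illustration, not its algorithm. The function name track_point_local_search and its radius parameter are hypothetical.

```python
import numpy as np


def track_point_local_search(feats, start_xy, radius=2):
    """Greedy local-search point tracker (toy stand-in, not CoTracker).

    feats: array of shape (T, H, W, C) with one feature vector per pixel per frame.
    start_xy: integer (x, y) position of the query point in frame 0.
    radius: half-size of the square search window in pixels (hypothetical).
    Returns a list of T (x, y) positions.
    """
    T, H, W, _ = feats.shape
    x, y = start_xy
    # Appearance template: the query point's normalized feature in the first frame.
    template = feats[0, y, x].astype(float)
    template /= np.linalg.norm(template) + 1e-8
    track = [(x, y)]
    for t in range(1, T):
        best_sim, best_xy = -np.inf, (x, y)
        # Compare the template with every pixel in the window around the last position.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                nx, ny = x + dx, y + dy
                if 0 <= nx < W and 0 <= ny < H:
                    f = feats[t, ny, nx]
                    sim = float(f @ template) / (np.linalg.norm(f) + 1e-8)
                    if sim > best_sim:
                        best_sim, best_xy = sim, (nx, ny)
        x, y = best_xy
        track.append((x, y))
    return track
```

With radius=2 the point can follow motion of up to two pixels per frame; faster objects would need a larger window, which also raises the chance of locking onto a similar-looking neighbor.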

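The geometric consistency check of the Consensus Fusion Strategy can be pictured as a distance test between each coarse box center and the fine-tracked point in the same frame. The pixel threshold max_dist, the per-frame dictionary layout, and the rule that a box with no fine point to compare against is kept unchanged are all assumptions made for illustration; the description does not specify how consistency is measured. The name filter_inconsistent is hypothetical.

```python
import math


def filter_inconsistent(coarse_boxes, fine_points, max_dist=3.0):
    """Drop coarse boxes that disagree geometrically with the fine point track.

    coarse_boxes: dict frame -> (x1, y1, x2, y2) box from the coarse track, or None.
    fine_points: dict frame -> (x, y) center predicted by the point tracker.
    max_dist: hypothetical pixel threshold for calling the two consistent.
    Returns a new dict in which inconsistent boxes are replaced by None.
    """
    kept = {}
    for frame, box in coarse_boxes.items():
        point = fine_points.get(frame)
        if box is None or point is None:
            # Nothing to compare against: keep whatever the coarse track had.
            kept[frame] = box
            continue
        cx = (box[0] + box[2]) / 2.0
        cy = (box[1] + box[3]) / 2.0
        # Euclidean distance between the box center and the fine-tracked point.
        dist = math.hypot(cx - point[0], cy - point[1])
        kept[frame] = box if dist <= max_dist else None
    return kept
```
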
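
Frames in which an object has no box, whether the detector missed it or a mismatched detection was eliminated, can then be filled in. One plausible reading of "linear interpolation or linear fitting", assumed here rather than taken from the paper, is to interpolate gaps that lie between two observations and to extend a least-squares line over frames before the first or after the last observation. The function name fill_gaps is hypothetical, and the line fit needs at least two observed frames.

```python
import numpy as np


def fill_gaps(frames, centers, all_frames):
    """Recover object centers for every frame in `all_frames`.

    frames: observed frame indices, sorted in increasing order (at least two).
    centers: array-like of shape (N, 2) with the observed (x, y) centers.
    Interior gaps use linear interpolation; frames outside the observed range
    use a degree-1 least-squares fit (extrapolation). This split is an assumption.
    """
    frames = np.asarray(frames, dtype=float)
    centers = np.asarray(centers, dtype=float)
    all_frames = np.asarray(all_frames, dtype=float)
    out = np.empty((len(all_frames), 2))
    inside = (all_frames >= frames[0]) & (all_frames <= frames[-1])
    for d in range(2):  # handle x and y independently
        out[inside, d] = np.interp(all_frames[inside], frames, centers[:, d])
        if (~inside).any():
            slope, intercept = np.polyfit(frames, centers[:, d], 1)
            out[~inside, d] = slope * all_frames[~inside] + intercept
    return out
```
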

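The reported scores follow standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT, where false negatives, false positives, identity switches, and ground-truth objects are counted over all frames, and IDF1 = 2 IDTP / (2 IDTP + IDFP + IDFN), the F1 score of identity-level matches. Because identity switches enter the MOTA numerator directly, all else being equal, each of the 139 fewer ID switches reported in the description raises MOTA by 1/GT. The helpers below simply evaluate those formulas; any counts passed to them are illustrative, not the VISO results. MOTP is left out because benchmarks define it differently (mean distance in the original CLEAR MOT formulation, mean box overlap in MOTChallenge).

```python
def mota(num_gt, fn, fp, idsw):
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN); 0.0 when all counts are zero."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


# Illustrative counts only (not from the paper).
print(mota(num_gt=200, fn=20, fp=10, idsw=4))
print(idf1(idtp=150, idfp=25, idfn=30))
```

Note that MOTA can become negative when the error counts exceed the number of ground-truth objects.
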
Similar Items