An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network

This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in suppo...

Bibliographic Details
Main Authors: Ming Him Lui, Haixu Liu, Zhuochen Tang, Hang Yuan, David Williams, Dongjin Lee, K. C. Wong, Zihao Wang
Format: Article
Language: English
Published: MDPI AG 2024-12-01
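Series: Eng
Subjects: object detection, object tracking, data augmentation, Stable Diffusion, pan–tilt–zoom, camera calibration
Online Access: https://www.mdpi.com/2673-4117/5/4/182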
_version_ 1850059122095947776
author Ming Him Lui
Haixu Liu
Zhuochen Tang
Hang Yuan
David Williams
Dongjin Lee
K. C. Wong
Zihao Wang
author_facet Ming Him Lui
Haixu Liu
Zhuochen Tang
Hang Yuan
David Williams
Dongjin Lee
K. C. Wong
Zihao Wang
author_sort Ming Him Lui
collection DOAJ
description This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in supporting real-time target differentiation and ease of deployment. Based on the principle of knowledge distillation, a novel data augmentation method is proposed to coordinate the latest open-source pre-trained large models in semantic segmentation, text generation, and image generation tasks to train a BicycleGAN for image enhancement. The resulting dataset is tested on various model structures and backbone sizes of two mainstream object detection frameworks, Ultralytics’ YOLO and MMDetection. Additionally, the algorithm implements and compares two popular object trackers, Bot-SORT and ByteTrack. The experimental proof-of-concept deploys the YOLOv8n model, which achieves an average precision of 82.2% and an inference time of 0.6 ms. Alternatively, the YOLO11x model maximises average precision at 86.7% while maintaining an inference time of 9.3 ms without bottlenecking subsequent processes. Stereo vision achieves accuracy within a median error of 90 mm following a drone flying over 1 m/s in an 8 m × 4 m area of interest. Stable single-object tracking with the PTZ camera is successful at 15 fps with an accuracy of 92.58%.
format Article
id doaj-art-21af145c82494853b5a2b6d5ad769ecf
institution DOAJ
issn 2673-4117
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Eng
spelling doaj-art-21af145c82494853b5a2b6d5ad769ecf
2025-08-20T02:50:59Z
eng
MDPI AG
Eng
2673-4117
2024-12-01
5
4
3488
3516
10.3390/eng5040182
An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
Ming Him Lui: School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Haixu Liu: School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Zhuochen Tang: School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Hang Yuan: School of Engineering, Australian National University, Canberra, ACT 2601, Australia
David Williams: SiNAB Pty Ltd., Sydney, NSW 2229, Australia
Dongjin Lee: Department of Unmanned Aircraft Systems, Hanseo University, Seosan 31963, Republic of Korea
K. C. Wong: School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
Zihao Wang: School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia
This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in supporting real-time target differentiation and ease of deployment. Based on the principle of knowledge distillation, a novel data augmentation method is proposed to coordinate the latest open-source pre-trained large models in semantic segmentation, text generation, and image generation tasks to train a BicycleGAN for image enhancement. The resulting dataset is tested on various model structures and backbone sizes of two mainstream object detection frameworks, Ultralytics’ YOLO and MMDetection. Additionally, the algorithm implements and compares two popular object trackers, Bot-SORT and ByteTrack. The experimental proof-of-concept deploys the YOLOv8n model, which achieves an average precision of 82.2% and an inference time of 0.6 ms. Alternatively, the YOLO11x model maximises average precision at 86.7% while maintaining an inference time of 9.3 ms without bottlenecking subsequent processes. Stereo vision achieves accuracy within a median error of 90 mm following a drone flying over 1 m/s in an 8 m × 4 m area of interest. Stable single-object tracking with the PTZ camera is successful at 15 fps with an accuracy of 92.58%.
https://www.mdpi.com/2673-4117/5/4/182
object detection
object tracking
data augmentation
Stable Diffusion
pan–tilt–zoom
camera calibration
spellingShingle Ming Him Lui
Haixu Liu
Zhuochen Tang
Hang Yuan
David Williams
Dongjin Lee
K. C. Wong
Zihao Wang
An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
Eng
object detection
object tracking
data augmentation
Stable Diffusion
pan–tilt–zoom
camera calibration
title An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
title_full An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
title_fullStr An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
title_full_unstemmed An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
title_short An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
title_sort adaptive yolo11 framework for the localisation tracking and imaging of small aerial targets using a pan tilt zoom camera network
topic object detection
object tracking
data augmentation
Stable Diffusion
pan–tilt–zoom
camera calibration
url https://www.mdpi.com/2673-4117/5/4/182
work_keys_str_mv AT minghimlui anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT haixuliu anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT zhuochentang anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT hangyuan anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT davidwilliams anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT dongjinlee anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT kcwong anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT zihaowang anadaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT minghimlui adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT haixuliu adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT zhuochentang adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT hangyuan adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT davidwilliams adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT dongjinlee adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT kcwong adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork
AT zihaowang adaptiveyolo11frameworkforthelocalisationtrackingandimagingofsmallaerialtargetsusingapantiltzoomcameranetwork