Airport Clearance Detection Based on Vision Transformer and Multi-Scale Feature Fusion

Bibliographic Details
Main Authors: Yutong Chen, Yufen Liu, Zhixiong Guo, Qiang Gao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10931779/
Description
Summary: With the improvement of the ecological environment and advances in technology, the likelihood of foreign object intrusion over airports is increasing, which seriously threatens aircraft flight safety. Most airports rely on standalone bird-repelling devices to drive birds away, but their effectiveness gradually deteriorates over time, and they cannot repel foreign objects such as drones. A complete airport clearance system is therefore needed, and its core component is airport clearance detection, which must accurately identify birds, drones, and other foreign objects in the airspace to ensure aviation safety. Traditional object detection algorithms still have clear limitations when faced with complex airport scenes and small objects. To overcome these defects, this paper proposes an airport clearance detection algorithm based on Vision Transformer and multi-scale feature fusion, addressing the poor real-time performance, low accuracy, and large parameter counts of existing airport clearance detection systems. First, to enrich the feature representation, the last C2f block of the neck is replaced with C2fCIB. Second, to improve feature extraction, partial convolution is replaced with dynamic convolution, which introduces attention to the convolution kernel along four dimensions. Then, a Vision Transformer module is added to capture more contextual information. Finally, the loss function is improved to strengthen bounding box regression. Experimental results show that the model achieves a high detection accuracy of 93.7% mAP@0.5, an improvement of 5.4% over YOLOv8n.
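To make the "attention on the convolution kernel along four dimensions" step concrete, below is a minimal PyTorch sketch of that idea in the style of omni-dimensional dynamic convolution: a small context branch produces four attention heads that modulate a set of candidate kernels over spatial positions, input channels, output channels, and the choice among kernels. This is an illustrative assumption about the mechanism, not the authors' code; the class name, head layout, and sizes are hypothetical.

```python
# Sketch of four-dimensional kernel attention (ODConv-style), assumed
# to approximate the dynamic convolution described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourDimDynamicConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, n_kernels=4, reduction=4):
        super().__init__()
        self.c_in, self.c_out, self.k, self.n = c_in, c_out, k, n_kernels
        # n candidate kernels, aggregated at run time per sample
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        hidden = max(c_in // reduction, 8)
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(c_in, hidden), nn.ReLU(inplace=True))
        # one attention head per attended dimension
        self.attn_spatial = nn.Linear(hidden, k * k)     # k x k positions
        self.attn_in = nn.Linear(hidden, c_in)           # input channels
        self.attn_out = nn.Linear(hidden, c_out)         # output channels
        self.attn_kernel = nn.Linear(hidden, n_kernels)  # kernel choice

    def forward(self, x):
        b = x.size(0)
        ctx = self.fc(x)                                 # (b, hidden) context vector
        a_sp = torch.sigmoid(self.attn_spatial(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_in = torch.sigmoid(self.attn_in(ctx)).view(b, 1, 1, self.c_in, 1, 1)
        a_out = torch.sigmoid(self.attn_out(ctx)).view(b, 1, self.c_out, 1, 1, 1)
        a_k = F.softmax(self.attn_kernel(ctx), dim=1).view(b, self.n, 1, 1, 1, 1)
        # modulate the candidates along all four dimensions, sum over kernels
        w = (a_k * a_sp * a_in * a_out * self.weight.unsqueeze(0)).sum(dim=1)
        # per-sample convolution via the grouped-conv trick (groups = batch)
        w = w.reshape(b * self.c_out, self.c_in, self.k, self.k)
        x = x.reshape(1, b * self.c_in, *x.shape[2:])
        y = F.conv2d(x, w, padding=self.k // 2, groups=b)
        return y.reshape(b, self.c_out, *y.shape[2:])

x = torch.randn(2, 16, 32, 32)
print(FourDimDynamicConv(16, 32)(x).shape)  # torch.Size([2, 32, 32, 32])
```

The grouped-convolution trick at the end folds the batch into the channel axis so each sample is convolved with its own dynamically generated kernel in a single `F.conv2d` call.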
ISSN: 2169-3536