VMC-Net: multi-scale feature aggregation and distribution with contextual attention guided fusion for aerial object detection

Bibliographic Details
Main Authors: Haodong Li, Haicheng Qu
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Complex & Intelligent Systems
Subjects:
Online Access: https://doi.org/10.1007/s40747-025-01888-8
collection DOAJ
description Abstract As an important branch of remote sensing technology, aerial image object detection plays an indispensable role in urban planning, disaster assessment, and other fields. However, the task faces challenges such as small object size and complex backgrounds, which make detection more difficult. Existing methods usually rely on multi-scale feature fusion or attention mechanisms to improve performance, but they often overlook the role of object feature perception in the image and make insufficient use of context information. To address these problems, we propose the VMC-Net framework for aerial image object detection. The VHeat C2f module enhances feature extraction and generates a clearer target feature map; the multi-scale feature aggregation and distribution module adds feature distribution to the multi-scale feature fusion strategy to achieve more effective scale interaction; and the contextual attention guided fusion module uses an attention mechanism and weighted fusion to exploit context information and significantly improve small object detection. We evaluate VMC-Net on the AI-TOD, VisDrone-2019, and TinyPerson datasets. Experimental results show that the framework outperforms mainstream object detection methods from the past three years in aerial object detection, with mAP50 scores of 45.6%, 45.9%, and 25.4%, respectively.
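The mAP50 figures quoted in the abstract rest on an intersection-over-union (IoU) threshold of 0.5: a predicted box counts as a match only if it overlaps a ground-truth box by at least that much. The sketch below is an illustration added to this record, not code from the paper; the function names `iou` and `is_match_at_50` and the example boxes are hypothetical. It shows why the small objects the abstract mentions are hard: the same 2-pixel localization error passes the threshold for a large box but fails it for a small one.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box, threshold=0.5):
    """True if the prediction overlaps the ground truth enough to count at IoU 0.5."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: an 8x8 object and an 80x80 object, each predicted 2 px off.
small_gt, small_pred = (0, 0, 8, 8), (2, 2, 10, 10)
large_gt, large_pred = (0, 0, 80, 80), (2, 2, 82, 82)

print(round(iou(small_pred, small_gt), 3), is_match_at_50(small_pred, small_gt))
print(round(iou(large_pred, large_gt), 3), is_match_at_50(large_pred, large_gt))
```

For the small box the overlap is 36 of 92 square pixels, which falls below 0.5, while the large box keeps an IoU above 0.9 under the same shift.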
id doaj-art-31bfcbab442c40339fb4faef0feaf079
institution Kabale University
issn 2199-4536
2198-6053
affiliation Haodong Li: Liaoning Technical University, School of Software
Haicheng Qu: Liaoning Technical University, School of Software
topic Object detection
Aerial images
Multi-scale feature fusion
Contextual attention
Feature extraction