LGC-YOLO: Local-Global Feature Extraction and Coordination Network With Contextual Interaction for Remote Sensing Object Detection



Bibliographic Details
Main Authors: Qinggang Wu, Yang Li, Junru Yin, Xiaotian You
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access:https://ieeexplore.ieee.org/document/11018430/
Description
Summary: Object detection in high-resolution remote sensing images (HRRSIs) faces major challenges: large variations in object scale, densely distributed small objects, and complex background interference. To address these challenges, we propose a single-stage local-global feature extraction and coordination network (LGC-YOLO) that improves detection accuracy for objects in HRRSIs. LGC-YOLO comprises three main modules that work together to strengthen its feature extraction and detection capabilities: local-global spatial feature extraction (LGSFE), gradient-optimized spatial information interaction (GOSII), and edge-semantic feature coordination fusion (ESFCF). First, LGSFE captures local and global features of dense objects through receptive-field attention convolution and global pooling in a multibranch structure. This alleviates the misalignment between the features extracted for objects and their intrinsic characteristics, providing more accurate and richer features for subsequent detection. Second, GOSII dynamically adjusts the weight of each feature channel by combining SRU blocks with the SimAM attention mechanism; these components are further optimized and embedded into C2f to enhance the representation of contextual features. GOSII thereby captures crucial features against complex backgrounds and improves information transmission. Finally, ESFCF integrates edge and semantic information within shallow feature maps to address inaccurate localization of small objects, and further improves detection accuracy by compensating for edge details lost during feature extraction. Extensive experiments on three commonly used remote sensing datasets (NWPU VHR-10, VisDrone 2019, and DOTA) demonstrate that our method outperforms other state-of-the-art methods in object classification and localization.
ISSN: 1939-1404, 2151-1535
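The summary names SimAM as one ingredient of the GOSII module. As an illustrative sketch only (this is not the authors' GOSII module, and it omits the SRU blocks and the C2f embedding), the standard parameter-free SimAM reweighting can be written in NumPy as below. The function name `simam`, the `(..., H, W)` array layout, and the default regularizer `e_lambda = 1e-4` are assumptions of this sketch.

```python
import numpy as np


def simam(x: np.ndarray, e_lambda: float = 1e-4) -> np.ndarray:
    """Parameter-free SimAM attention (illustrative sketch, not LGC-YOLO's GOSII).

    x: feature map with spatial dims last, e.g. (C, H, W) or (B, C, H, W).
    e_lambda: regularizer in the energy function (assumed default).
    """
    h, w = x.shape[-2:]
    n = h * w - 1  # number of other neurons in the same channel
    # Squared deviation of every activation from its channel's spatial mean
    d = (x - x.mean(axis=(-2, -1), keepdims=True)) ** 2
    # Per-channel variance estimate over spatial positions
    v = d.sum(axis=(-2, -1), keepdims=True) / n
    # Inverse minimal energy: larger for neurons that stand out in their channel
    e_inv = d / (4.0 * (v + e_lambda)) + 0.5
    # Reweight each activation by a sigmoid of its inverse energy
    return x * (1.0 / (1.0 + np.exp(-e_inv)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 32, 32))
    print(simam(feat).shape)  # (16, 32, 32)
```

Because the inverse energy is always at least 0.5, every weight lies between sigmoid(0.5) and 1. The output therefore keeps the input's shape and sign, while activations that deviate strongly from their channel mean are attenuated less than typical ones. This is what lets a parameter-free module emphasize salient positions without adding learnable weights.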