SFRADNet: Object Detection Network with Angle Fine-Tuning Under Feature Matching
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Remote Sensing |
| Online Access: | https://www.mdpi.com/2072-4292/17/9/1622 |
| Summary: | Due to the distant acquisition and bird’s-eye perspective of remote sensing images, ground objects appear at arbitrary scales and in multiple orientations. Existing detectors often employ feature pyramid networks (FPN) and deformable (or rotated) convolutions to adapt to variations in object scale and orientation. However, these methods address scale and orientation separately and ignore the deeper coupling between them. When the scale features extracted by the network are significantly mismatched with the object, the detection head struggles to capture the object’s orientation, resulting in misalignment between the object and its bounding box. We therefore propose a one-stage detector, the Scale First Refinement-Angle Detection Network (SFRADNet), which fine-tunes the rotation angle under precise scale feature matching. We introduce the Group Learning Large Kernel Network (GL²KNet) as the backbone of SFRADNet and employ a Shape-Aware Spatial Feature Extraction Module (SA-SFEM) as the primary component of the detection head. Within GL²KNet, we construct diverse receptive fields with varying dilation rates to capture features across different spatial coverage ranges. Building on this, we exploit multi-scale features within each layer and apply weighted aggregation based on a Scale Selection Matrix (SSMatrix). The SSMatrix dynamically adjusts the receptive-field coverage according to the target size, enabling a more refined selection of scale features. Based on the precisely captured scale features, we then design a Directed Guiding Box (DGBox) within the SA-SFEM, using its shape and position information to supervise the sampling points of the convolution kernels and thereby fit them to object deformations. This facilitates the extraction of orientation features near the object region, allowing accurate refinement of both scale and orientation. Experiments show that our network achieves a mAP of 80.10% on the DOTA-v1.0 dataset while reducing computational complexity compared to the baseline model. (Illustrative sketches of the SSMatrix weighting and the DGBox-guided sampling follow this record.) |
| ISSN: | 2072-4292 |
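
Two of the mechanisms described in the summary can be illustrated in code. First, a minimal PyTorch sketch of the scale-selection idea: parallel dilated 3×3 branches cover different receptive-field sizes, and a per-pixel softmax weighting, standing in here for the paper's SSMatrix, aggregates them. The class name, channel count, and dilation rates are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of SSMatrix-style weighted aggregation over
# multi-dilation branches; all names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class ScaleSelectBlock(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3)):
        super().__init__()
        # One 3x3 branch per dilation rate; padding = dilation keeps H, W fixed.
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
             for d in dilations]
        )
        # 1x1 conv predicts a per-pixel weight for each branch
        # (a stand-in for the Scale Selection Matrix).
        self.select = nn.Conv2d(channels, len(dilations), 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        weights = self.select(x).softmax(dim=1).unsqueeze(2)       # (B, K, 1, H, W)
        return (weights * feats).sum(dim=1)                        # weighted aggregation


if __name__ == "__main__":
    block = ScaleSelectBlock(channels=64)
    print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```

Second, a hedged sketch of how a Directed Guiding Box might steer the sampling points of a deformable convolution: a per-pixel oriented box (width, height, angle) scales and rotates the default 3×3 grid, and the deviations from that grid become the offsets fed to torchvision's `deform_conv2d`. This is one plausible reading of the summary, not the paper's SA-SFEM code; the offset construction and all names are assumptions.

```python
# Hypothetical sketch: DGBox-guided sampling offsets for a deformable conv.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class BoxGuidedConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.02)
        # Default 3x3 integer sampling grid, flattened to (9, 2) as (y, x) pairs.
        ys, xs = torch.meshgrid(torch.arange(-1.0, 2.0),
                                torch.arange(-1.0, 2.0), indexing="ij")
        self.register_buffer("grid", torch.stack([ys, xs], dim=-1).reshape(9, 2))

    def forward(self, x: torch.Tensor, box: torch.Tensor) -> torch.Tensor:
        # box: (B, 3, H, W) holding a per-pixel (width, height, angle) guiding box.
        b, _, h, w = x.shape
        bw, bh, ang = box[:, 0], box[:, 1], box[:, 2]
        cos, sin = torch.cos(ang).unsqueeze(1), torch.sin(ang).unsqueeze(1)
        base_y = self.grid[:, 0].view(1, 9, 1, 1)
        base_x = self.grid[:, 1].view(1, 9, 1, 1)
        # Scale the grid by half the box extent, then rotate it by the box angle.
        gy = base_y * bh.unsqueeze(1) / 2
        gx = base_x * bw.unsqueeze(1) / 2
        ry = gy * cos + gx * sin
        rx = -gy * sin + gx * cos
        # Offsets = deviations from the default grid, interleaved as (dy, dx).
        off = torch.stack([ry - base_y, rx - base_x], dim=2).reshape(b, 18, h, w)
        return deform_conv2d(x, off, self.weight, padding=1)
```

Both sketches preserve the ordering the summary emphasizes: scale selection happens first (the weighted aggregation), and only then is the kernel geometry bent toward the object's shape and orientation.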