PAMFPN: Position-Aware Multi-Kernel Feature Pyramid Network with Adaptive Sparse Attention for Robust Object Detection in Remote Sensing Imagery

Bibliographic Details
Main Authors: Xiaofei Yang, Suihua Xue, Lin Li, Sihuan Li, Yudong Fang, Xiaofeng Zhang, Xiaohui Huang
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/17/13/2213
Description
Summary: Deep learning methods have achieved remarkable success in remote sensing object detection. Existing object detection methods focus on integrating convolutional neural networks (CNNs) and Transformer networks to explore local and global representations and thereby improve performance. However, methods that rely on fixed convolutional kernels and dense global attention mechanisms suffer from computational redundancy and insufficient discriminative feature extraction, particularly for small and rotation-sensitive targets. To address these limitations, we propose a Dynamic Multi-Kernel Position-Aware Feature Pyramid Network (PAMFPN), which integrates adaptive sparse position modeling and multi-kernel dynamic fusion to achieve robust feature representation. First, we design a position-interactive context module (PICM) that incorporates distance-aware sparse attention and dynamic positional encoding. It selectively focuses computation on sparse targets through a decay function that suppresses background noise while enhancing the spatial correlations of critical regions. Second, we design a dual-kernel adaptive fusion (DKAF) architecture that combines region-sensitive attention (RSA) and reconfigurable context aggregation (RCA). RSA employs orthogonal large-kernel convolutions to capture anisotropic spatial features for arbitrarily oriented targets, while RCA dynamically adjusts the kernel scales based on content complexity, effectively addressing scale variations and intraclass diversity. Extensive experiments on three benchmark datasets (DOTA-v1.0, SSDD, NWPU VHR-10) demonstrate the effectiveness and versatility of the proposed PAMFPN. This work bridges the gap between efficient computation and robust feature fusion in remote sensing detection, offering a universal solution for real-world applications.
ISSN: 2072-4292
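
Illustrative note: the summary describes distance-aware sparse attention only in outline (a decay function that suppresses distant background while attention is concentrated on a few positions). The sketch below shows one generic way such a mechanism could look. It is not the authors' PICM: the function name `decayed_sparse_attention`, the linear distance penalty on the logits, the `decay` rate, and the `top_k` selection rule are all assumptions made for illustration, and it uses only the Python standard library.

```python
import math


def decayed_sparse_attention(scores, coords, decay=0.5, top_k=2):
    """Return row-normalised attention weights (hypothetical sketch).

    scores: N x N raw similarity logits, as a list of lists.
    coords: N (row, col) grid positions, one per token.
    decay:  assumed decay rate; larger values penalise distant keys more.
    top_k:  assumed sparsity budget; keys outside the top_k get weight 0.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Subtracting decay * distance from a logit multiplies the
        # corresponding softmax term by exp(-decay * distance).
        logits = [
            scores[i][j] - decay * math.dist(coords[i], coords[j])
            for j in range(n)
        ]
        # Sparsify: keep only the top_k largest logits for this query.
        keep = sorted(range(n), key=lambda j: logits[j], reverse=True)[:top_k]
        # Numerically stable softmax over the kept keys only.
        peak = max(logits[j] for j in keep)
        exps = {j: math.exp(logits[j] - peak) for j in keep}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights


if __name__ == "__main__":
    # Three tokens on one row; the third is far from the other two.
    flat_scores = [[0.0] * 3 for _ in range(3)]
    positions = [(0, 0), (0, 1), (0, 5)]
    for row in decayed_sparse_attention(flat_scores, positions):
        print([round(w, 3) for w in row])
```

With equal raw scores, each query attends most strongly to itself, less to its nearest neighbour, and not at all to the position dropped by the `top_k` cut. A learned model would replace the fixed `decay` and `top_k` values with trained or content-dependent quantities.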