RPFusionNet: An Efficient Semantic Segmentation Method for Large-Scale Remote Sensing Images via Parallel Region–Patch Fusion

Bibliographic Details
Main Authors: Shiyan Pang, Weimin Zeng, Yepeng Shi, Zhiqi Zuo, Kejiang Xiao, Yujun Wu
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
Online Access:https://www.mdpi.com/2072-4292/17/13/2158
Description
Summary: Mainstream deep learning segmentation models are designed for small images; when they are applied to high-resolution remote sensing images, the limited information contained in each small crop greatly restricts a model's ability to capture complex contextual information at a global scale. To mitigate this challenge, we present RPFusionNet, a novel parallel semantic segmentation framework designed to efficiently integrate both local and global features. RPFusionNet leverages two distinct feature representations: REGION (representing large areas) and PATCH (representing smaller regions). The framework comprises two parallel branches. The REGION branch first downsamples the entire image, then extracts features with a convolutional neural network (CNN)-based encoder, and finally captures multi-level information using pooling kernels of varying sizes; this design enables the model to adapt effectively to objects of different scales. The PATCH branch, in contrast, uses a pixel-level feature extractor to enrich the high-dimensional features of the local region, enhancing the representation of fine-grained details. To model the semantic correlation between the two branches, we developed the Region–Patch scale fusion module, which lets the network comprehend a wider image context while preserving local details, thus bridging the gap between regional and local information. Extensive experiments were conducted on three public datasets: WBDS, AIDS, and Vaihingen. Compared with other state-of-the-art methods, our network achieved the highest accuracy on all three datasets, with IoU scores of 92.08% on WBDS, 89.99% on AIDS, and 88.44% on Vaihingen.
ISSN:2072-4292
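
To make the two-branch design concrete, below is a minimal PyTorch sketch of the idea as the abstract describes it: a REGION branch that downsamples the full scene, encodes it with a CNN, and pools the features at several scales; a PATCH branch that extracts full-resolution features from a local crop; and a fusion step that combines the two. All class names, the toy convolutional encoder, channel widths, and pooling sizes here are illustrative assumptions, not the authors' implementation of RPFusionNet or its Region–Patch scale fusion module.

    # Illustrative sketch only; architecture details are assumptions, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RegionBranch(nn.Module):
        """Downsample the whole image, encode with a CNN, pool at several scales."""
        def __init__(self, in_ch=3, feat_ch=256, pool_sizes=(1, 2, 3, 6)):
            super().__init__()
            # Stand-in for the CNN backbone used in the paper.
            self.encoder = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            self.pool_sizes = pool_sizes
            self.reduce = nn.Conv2d(feat_ch * (len(pool_sizes) + 1), feat_ch, 1)

        def forward(self, x):
            # Downsample the entire scene before encoding (ratio is a guess).
            x = F.interpolate(x, scale_factor=0.25, mode="bilinear",
                              align_corners=False)
            f = self.encoder(x)
            pooled = [f]
            # Multi-level context from pooling kernels of varying sizes.
            for s in self.pool_sizes:
                p = F.adaptive_avg_pool2d(f, s)
                pooled.append(F.interpolate(p, size=f.shape[-2:],
                                            mode="bilinear", align_corners=False))
            return self.reduce(torch.cat(pooled, dim=1))

    class PatchBranch(nn.Module):
        """Pixel-level feature extractor for a full-resolution local patch."""
        def __init__(self, in_ch=3, feat_ch=256):
            super().__init__()
            self.extract = nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, patch):
            return self.extract(patch)

    class RegionPatchFusion(nn.Module):
        """Simple concat-and-convolve stand-in for the Region-Patch scale
        fusion module: region context is upsampled and merged with patch
        detail to produce per-pixel class logits."""
        def __init__(self, feat_ch=256, n_classes=2):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Conv2d(feat_ch * 2, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, n_classes, 1),
            )

        def forward(self, region_feat, patch_feat):
            # A real implementation would crop the region feature at the
            # patch location; this sketch simply resizes the global feature.
            region_up = F.interpolate(region_feat, size=patch_feat.shape[-2:],
                                      mode="bilinear", align_corners=False)
            return self.fuse(torch.cat([region_up, patch_feat], dim=1))

    # Hypothetical usage: global context from the scene, detail from one crop.
    region, patch_net = RegionBranch(), PatchBranch()
    fusion = RegionPatchFusion(n_classes=2)
    scene = torch.randn(1, 3, 1024, 1024)        # full remote sensing tile
    patch = scene[:, :, :256, :256]              # one local patch
    logits = fusion(region(scene), patch_net(patch))  # -> (1, 2, 256, 256)

The design choice the sketch tries to capture is that the two branches trade off coverage against resolution: the REGION branch sees the whole scene cheaply at low resolution, while the PATCH branch keeps full detail for a small window, and fusion reconciles the two scales per patch.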