GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images

Bibliographic Details
Main Authors: Tao He, Jianyu Chen, Delu Pan
Format: Article
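Language: English
Published: MDPI AG 2025-07-01
Series: Remote Sensing
Subjects: multi-scale optimized segmentation; land cover classification; global–local context; semantic segmentation; geographic object-based image analysis
Online Access: https://www.mdpi.com/2072-4292/17/15/2652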
_version_ 1849405955139174400
author Tao He
Jianyu Chen
Delu Pan
collection DOAJ
description Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification. This is primarily due to the irregular shapes and fragmented boundaries of segmented objects, which limit its applicability in semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. In contrast, Transformer-based models are well suited for global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network), a hybrid semantic segmentation architecture that fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, enabling multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch incorporating cascaded atrous convolutions is introduced to inject information about segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets show that GOFENet achieves mIoU scores of 66.02% and 51.92%, respectively. The model exhibits strong capability in delineating large-scale land cover features, producing sharper object boundaries and reducing classification noise, while preserving the integrity and discriminability of land cover categories.
format Article
id doaj-art-80df0aaed624404ca56b975503ed4f7f
institution Kabale University
issn 2072-4292
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-80df0aaed624404ca56b975503ed4f7f
2025-08-20T03:36:32Z
eng
MDPI AG
Remote Sensing
2072-4292
2025-07-01
17
15
2652
10.3390/rs17152652
GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
Tao He (0)
Jianyu Chen (1)
Delu Pan (2)
(0) Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
(1) Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
(2) Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
https://www.mdpi.com/2072-4292/17/15/2652
multi-scale optimized segmentation
land cover classification
global–local context
semantic segmentation
geographic object-based image analysis
title GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images
topic multi-scale optimized segmentation
land cover classification
global–local context
semantic segmentation
geographic object-based image analysis
url https://www.mdpi.com/2072-4292/17/15/2652
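
The description in this record reports results as mIoU (mean intersection over union). As a reading aid, the following is a minimal sketch of how that metric is commonly computed from a confusion matrix. It is not taken from the article: the function names `confusion_matrix` and `mean_iou` and the toy labels are hypothetical, and the authors' actual evaluation protocol (for example, which classes are ignored) is not stated in this record.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Assumption: classes with no true and no predicted pixels are skipped,
    which is one common convention, not necessarily the article's.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical classes (flattened pixel labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Because mIoU averages over classes rather than pixels, a class that covers little area counts as much as a dominant one.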