GOFENet: A Hybrid Transformer–CNN Network Integrating GEOBIA-Based Object Priors for Semantic Segmentation of Remote Sensing Images

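Geographic object-based image analysis (GEOBIA) has demonstrated substantial utility in remote sensing tasks. However, its integration with deep learning remains largely confined to image-level classification, primarily because the irregular shapes and fragmented boundaries of segmented objects limit its applicability to semantic segmentation. While convolutional neural networks (CNNs) excel at local feature extraction, they inherently struggle to capture long-range dependencies. Transformer-based models, in contrast, are well suited to global context modeling but often lack fine-grained local detail. To overcome these limitations, we propose GOFENet (Geo-Object Feature Enhanced Network), a hybrid semantic segmentation architecture that fuses object-level priors into deep feature representations. GOFENet employs a dual-encoder design combining CNN and Swin Transformer architectures, with multi-scale feature fusion through skip connections to preserve both local and global semantics. An auxiliary branch built on cascaded atrous convolutions injects information from the segmented objects into the learning process. Furthermore, we develop a cross-channel selection module (CSM) for refined channel-wise attention, a feature enhancement module (FEM) to merge global and local representations, and a shallow–deep feature fusion module (SDFM) to integrate pixel- and object-level cues across scales. Experimental results on the GID and LoveDA datasets show that GOFENet achieves superior segmentation performance, with an mIoU of 66.02% and 51.92%, respectively. The model exhibits a strong capability in delineating large-scale land cover features, producing sharper object boundaries and less classification noise while preserving the integrity and discriminability of land cover categories.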
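The abstract reports results as mIoU (66.02% on GID and 51.92% on LoveDA). mIoU is the mean, over classes, of the intersection over union IoU_c = TP / (TP + FP + FN). The following is a minimal pure-Python sketch of that standard computation from a confusion matrix; it is not the authors' evaluation code. The function names are hypothetical, and excluding classes that appear in neither the prediction nor the ground truth is an assumption, since evaluation protocols differ on this point.

```python
"""Minimal sketch: mean Intersection over Union (mIoU) from a confusion matrix.

Assumptions (not taken from the paper): rows index ground-truth classes,
columns index predicted classes, and classes whose union is empty are
excluded from the mean.
"""
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """Count (ground truth, prediction) pairs over flattened label maps."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def per_class_iou(cm: List[List[int]]) -> List[float]:
    """IoU_c = TP / (TP + FP + FN); NaN when class c never occurs at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # ground truth is c, predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted as c, ground truth is another class
        union = tp + fp + fn
        ious.append(tp / union if union else float("nan"))
    return ious


def mean_iou(cm: List[List[int]]) -> float:
    """Average IoU over classes with a non-empty union."""
    valid = [v for v in per_class_iou(cm) if v == v]  # NaN != NaN, so absent classes drop out
    if not valid:
        raise ValueError("no class occurs in either ground truth or prediction")
    return sum(valid) / len(valid)


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(round(mean_iou(cm), 4))  # 0.5
```

In this toy example the per-class IoUs are 1/3, 2/3 and 1/2, giving an mIoU of 0.5. Benchmark evaluations commonly accumulate a single confusion matrix over the entire test set before computing IoU, rather than averaging per-image scores.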

Bibliographic Details
Main Authors:	Tao He, Jianyu Chen, Delu Pan
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Remote Sensing
Subjects:	multi-scale optimized segmentation; land cover classification; global–local context; semantic segmentation; geographic object-based image analysis
Online Access:	https://www.mdpi.com/2072-4292/17/15/2652
Volume/Issue:	17(15), article 2652
DOI:	10.3390/rs17152652
ISSN:	2072-4292
Affiliation:	Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China (all authors)
Collection:	DOAJ
Institution:	Kabale University
Record ID:	doaj-art-80df0aaed624404ca56b975503ed4f7f
