TDFNet: twice decoding V-Mamba-CNN Fusion features for building extraction

Bibliographic Details
Main Authors:	Wenlong Wang, Peng Yu, Mengmeng Li, Xiaojing Zhong, Yuanrong He, Hua Su, Yunxuan Zhou
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2025-07-01
Series:	Geo-spatial Information Science
Subjects:	Building extraction; V-Mamba; remote sensing; twice decoding
Online Access:	https://www.tandfonline.com/doi/10.1080/10095020.2025.2514812

author	Wenlong Wang, Peng Yu, Mengmeng Li, Xiaojing Zhong, Yuanrong He, Hua Su, Yunxuan Zhou
collection	DOAJ
description	Building extraction from remote sensing imagery is vital for many human activities, but it is challenging because of diverse building appearances and complex backgrounds. Research shows that both global context and spatial detail are important for accurate building extraction, which is why methods integrating convolutional neural networks (CNNs) and vision transformers (ViTs) are now popular. However, current methods that combine the two merge their features inadequately and decode only once, which leads to unclear boundaries, internal voids, and susceptibility to non-building elements in complex scenes with low inter-class and high intra-class variability. To address these issues, this paper introduces a novel extraction method called TDFNet. We first replace the ViT with V-Mamba, which has linear complexity, and combine it with a CNN for feature extraction. A bidirectional fusion module (BFM) then comprehensively integrates spatial details and global information, enabling accurate identification of boundaries between adjacent buildings and preserving the structural integrity of buildings to avoid internal holes. During decoding, we propose an Encoder-Decoder Fusion Module (EDFM) that first merges features from different stages of the encoder and decoder, reducing the model’s susceptibility to non-building elements whose features resemble those of buildings and thus reducing erroneous extractions. A twice decoding strategy is then applied to significantly enhance multi-scale feature learning, mitigating the impact of tree occlusions and shadows. Our method achieves state-of-the-art (SOTA) performance on three public building datasets.
format	Article
id	doaj-art-744e7f62cbea45eb960c7fdbdb2f51eb
institution	Kabale University
issn	1009-5020, 1993-5153
language	English
publishDate	2025-07-01
publisher	Taylor & Francis Group
series	Geo-spatial Information Science
affiliations	Wenlong Wang, Peng Yu, Yuanrong He: College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China; Mengmeng Li, Hua Su: Key Laboratory of Spatial Data Mining and Information Sharing of Ministry of Education, The Academy of Digital China, Fuzhou University, Fuzhou, China; Xiaojing Zhong: College of Harbour and Coastal Engineering, Jimei University/Xiamen Key Laboratory of Green and Smart Coastal Engineering, Xiamen, China; Yunxuan Zhou: State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai, China
doi	10.1080/10095020.2025.2514812
title	TDFNet: twice decoding V-Mamba-CNN Fusion features for building extraction
topic	Building extraction; V-Mamba; remote sensing; twice decoding
url	https://www.tandfonline.com/doi/10.1080/10095020.2025.2514812
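The description credits V-Mamba's linear complexity as the reason for replacing the ViT. The sketch below illustrates that cost difference under stated assumptions: it uses a simplified Mamba-style state-space recurrence (diagonal state, per-token parameters) and a single row-major scan over the flattened H x W feature map, whereas real 2D Mamba backbones may scan in several directions. The function names and the random placeholder parameters are hypothetical; this is not TDFNet's code.

```python
import random
from typing import List, Sequence


def attention_score_entries(num_tokens: int) -> int:
    """Entries in the token-by-token score matrix of full self-attention."""
    return num_tokens * num_tokens  # grows quadratically with sequence length


def selective_scan(x: Sequence[float], a, b, c) -> List[float]:
    """Simplified linear-time state-space scan over a single channel.

    For each token t (diagonal state matrix, per-token parameters):
        h_t = a_t * h_{t-1} + b_t * x_t   (elementwise over the N state dims)
        y_t = sum_n c_t[n] * h_t[n]
    a, b, c are length-L lists of length-N lists. One pass over the L tokens,
    so the cost is O(L * N) rather than O(L^2).
    """
    state_dim = len(a[0])
    h = [0.0] * state_dim
    y = []
    for t, x_t in enumerate(x):
        h = [a[t][n] * h[n] + b[t][n] * x_t for n in range(state_dim)]
        y.append(sum(c[t][n] * h[n] for n in range(state_dim)))
    return y


def scan_feature_map(feature_map, state_dim: int = 4, seed: int = 0):
    """Flatten an H x W single-channel map row by row and scan it.

    Assumption of this sketch: a, b, c are random placeholders. In a selective
    SSM they are computed from the input by learned projections.
    """
    height, width = len(feature_map), len(feature_map[0])
    tokens = [v for row in feature_map for v in row]  # H*W tokens, row-major
    rng = random.Random(seed)
    a = [[rng.uniform(0.5, 0.99) for _ in range(state_dim)] for _ in tokens]  # decay in (0, 1)
    b = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in tokens]
    c = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in tokens]
    y = selective_scan(tokens, a, b, c)
    return [y[r * width:(r + 1) * width] for r in range(height)]
```

For a 64 x 64 feature map (4,096 tokens), full self-attention builds a 4,096 x 4,096 score matrix (16,777,216 entries), while the scan performs 4,096 state updates of size N. Each output depends only on earlier tokens in scan order, which is why multi-directional scanning is commonly used to recover 2D context.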

Similar Items