Weighted Feature Fusion Network Based on Large Kernel Convolution and Transformer for Multi-Modal Remote Sensing Image Segmentation

Bibliographic Details
Main Authors: Jianxia Wang, Shaozu Qiu, Jia Cai, Xiaoming Zhang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Multi-modal; large convolution kernel; remote sensing image; semantic segmentation; feature fusion
Online Access: https://ieeexplore.ieee.org/document/11123171/
author Jianxia Wang
Shaozu Qiu
Jia Cai
Xiaoming Zhang
collection DOAJ
description The heterogeneity and complexity of multi-modal data in high-resolution remote sensing images pose a severe challenge to existing cross-modal networks, which aim to fuse the complementary information in high-resolution optical images and elevation data (digital surface models, DSM) to achieve accurate semantic segmentation. To address this problem, a weighted feature fusion network based on large kernel convolution and Transformer (LTFCNet) is proposed. The model uses two parallel encoders to extract the features of the different modalities, an improved cross-fusion module to enhance the encoders' feature extraction capability, and a gate module based on large kernel convolution and Transformer to achieve multi-modal fusion. Finally, a Difference information Feature Fusion Module (DFFM), which applies attention to differential regions, performs cross-level feature fusion and improves small-object detection. To evaluate the network, we compare it with several state-of-the-art (SOTA) models on the Potsdam and Vaihingen datasets. The experimental results show that the proposed model outperforms the other SOTA models by approximately 2% in the mIoU metric, validating its effectiveness in multi-modal feature fusion.
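The description reports results in the mIoU (mean intersection over union) metric. As a rough illustration of how that metric is commonly computed for semantic segmentation, here is a minimal NumPy sketch. It is not taken from the article: the function names `confusion_matrix` and `mean_iou`, the choice to skip classes absent from both label maps, and the toy label maps are all assumptions made for this example.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair.

    Hypothetical helper for this sketch; rows are true classes,
    columns are predicted classes.
    """
    t = np.asarray(y_true, dtype=np.int64).ravel()
    p = np.asarray(y_pred, dtype=np.int64).ravel()
    # Encode each (true, pred) pair as a single index and count occurrences.
    idx = t * num_classes + p
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(y_true, y_pred, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN).

    Assumption: classes that appear in neither map are skipped
    rather than counted as IoU 0.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm)
    # Row sum = TP + FN, column sum = TP + FP, so the union is their sum minus TP.
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    present = union > 0
    return float((tp[present] / union[present]).mean())


# Toy 2x3 label maps with three classes (made-up data, not from the paper).
truth = [[0, 0, 1], [1, 2, 2]]
pred = [[0, 1, 1], [1, 2, 0]]
score = mean_iou(truth, pred, num_classes=3)
print(round(score, 4))  # 0.5
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Published benchmarks often also exclude a clutter or background class before averaging; whether the article does so is not stated in this record.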
format Article
id doaj-art-028efef0ae6b4a288053e5e5ea5119e2
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-028efef0ae6b4a288053e5e5ea5119e2
Timestamp: 2025-08-25T23:18:08Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Published: 2025-01-01
Volume: 13
Pages: 145319-145333
DOI: 10.1109/ACCESS.2025.3598116
IEEE document number: 11123171
Authors and affiliations:
Jianxia Wang, College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
Shaozu Qiu (https://orcid.org/0009-0003-5911-4029), College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
Jia Cai (https://orcid.org/0009-0007-9873-7084), College of Continuing Education, Hebei Open University, Shijiazhuang, China
Xiaoming Zhang, College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
title Weighted Feature Fusion Network Based on Large Kernel Convolution and Transformer for Multi-Modal Remote Sensing Image Segmentation
topic Multi-modal
large convolution kernel
remote sensing image
semantic segmentation
feature fusion
url https://ieeexplore.ieee.org/document/11123171/