Weighted Feature Fusion Network Based on Large Kernel Convolution and Transformer for Multi-Modal Remote Sensing Image Segmentation
The heterogeneity and complexity of multi-modal data in high-resolution remote sensing images pose a severe challenge to existing cross-modal networks that aim to fuse complementary information from high-resolution optical imagery and elevation data (digital surface model, DSM) to achieve accurate semantic segmentation...
| Main Authors: | Jianxia Wang, Shaozu Qiu, Jia Cai, Xiaoming Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Multi-modal; large convolution kernel; remote sensing image; semantic segmentation; feature fusion |
| Online Access: | https://ieeexplore.ieee.org/document/11123171/ |
| _version_ | 1849223083840241664 |
|---|---|
| author | Jianxia Wang; Shaozu Qiu; Jia Cai; Xiaoming Zhang |
| author_facet | Jianxia Wang; Shaozu Qiu; Jia Cai; Xiaoming Zhang |
| author_sort | Jianxia Wang |
| collection | DOAJ |
| description | The heterogeneity and complexity of multi-modal data in high-resolution remote sensing images pose a severe challenge to existing cross-modal networks that aim to fuse complementary information from high-resolution optical imagery and elevation data (digital surface model, DSM) to achieve accurate semantic segmentation. To address this problem, a weighted feature fusion network based on large kernel convolution and Transformer (LTFCNet) is proposed. The model uses two parallel encoders to extract the features of the different modalities, an improved cross-fusion module to enhance the encoder’s feature extraction capability, and a gate module based on large kernel convolution and Transformer to achieve multi-modal fusion. Finally, a Difference information Feature Fusion Module (DFFM), which directs attention to differential regions, performs cross-level feature fusion and improves small object detection. To evaluate the network, we compare it with several state-of-the-art (SOTA) models on the Potsdam and Vaihingen datasets. The experimental results show that the proposed model outperforms the other SOTA models by approximately 2% in mIoU, validating its effectiveness in multi-modal feature fusion. |
| format | Article |
| id | doaj-art-028efef0ae6b4a288053e5e5ea5119e2 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-028efef0ae6b4a288053e5e5ea5119e2; 2025-08-25T23:18:08Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13; pp. 145319-145333; DOI 10.1109/ACCESS.2025.3598116; 11123171; Weighted Feature Fusion Network Based on Large Kernel Convolution and Transformer for Multi-Modal Remote Sensing Image Segmentation; Jianxia Wang (0); Shaozu Qiu (1); https://orcid.org/0009-0003-5911-4029; Jia Cai (2); https://orcid.org/0009-0007-9873-7084; Xiaoming Zhang (3); (0) College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China; (1) College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China; (2) College of Continuing Education, Hebei Open University, Shijiazhuang, China; (3) College of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China; https://ieeexplore.ieee.org/document/11123171/; Multi-modal; large convolution kernel; remote sensing image; semantic segmentation; feature fusion |
| title | Weighted Feature Fusion Network Based on Large Kernel Convolution and Transformer for Multi-Modal Remote Sensing Image Segmentation |
| title_sort | weighted feature fusion network based on large kernel convolution and transformer for multi modal remote sensing image segmentation |
| topic | Multi-modal; large convolution kernel; remote sensing image; semantic segmentation; feature fusion |
| url | https://ieeexplore.ieee.org/document/11123171/ |
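
The description reports results in the mIoU (mean intersection over union) metric. As a rough illustration of how that metric is commonly computed for semantic segmentation, here is a minimal sketch in plain Python. The function name `mean_iou`, the toy label lists, and the choice to skip classes absent from both maps are assumptions made for this example; the record does not state the paper's exact evaluation protocol (for instance, which classes are ignored).

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened label maps holding class indices
    in the range [0, num_classes).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both maps are skipped
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Hypothetical toy data: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

For the toy data, the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported gain of about 2% in mIoU would correspond to a shift in this averaged quantity on the Potsdam and Vaihingen test sets.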