DLNet: A Dual-Level Network with Self- and Cross-Attention for High-Resolution Remote Sensing Segmentation

With advancements in remote sensing technologies, high-resolution imagery has become increasingly accessible, supporting applications in urban planning, environmental monitoring, and precision agriculture. However, semantic segmentation of such imagery remains challenging due to complex spatial stru...

Bibliographic Details
Main Authors:	Weijun Meng, Lianlei Shan, Sugang Ma, Dan Liu, Bin Hu
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Remote Sensing
Subjects:	high-resolution imagery; remote sensing; semantic segmentation; self-attention; cross-attention
Online Access:	https://www.mdpi.com/2072-4292/17/7/1119

collection	DOAJ
description	With advancements in remote sensing technologies, high-resolution imagery has become increasingly accessible, supporting applications in urban planning, environmental monitoring, and precision agriculture. However, semantic segmentation of such imagery remains challenging due to complex spatial structures, fine-grained details, and land cover variations. Existing methods often struggle with ineffective feature representation, suboptimal fusion of global and local information, and high computational costs, limiting segmentation accuracy and efficiency. To address these challenges, we propose the dual-level network (DLNet), an enhanced framework incorporating self-attention and cross-attention mechanisms for improved multi-scale feature extraction and fusion. The self-attention module captures long-range dependencies to enhance contextual understanding, while the cross-attention module facilitates bidirectional interaction between global and local features, improving spatial coherence and segmentation quality. Additionally, DLNet optimizes computational efficiency by balancing feature refinement and memory consumption, making it suitable for large-scale remote sensing applications. Extensive experiments on benchmark datasets, including DeepGlobe and Inria Aerial, demonstrate that DLNet achieves state-of-the-art segmentation accuracy while maintaining computational efficiency. 
On the DeepGlobe dataset, DLNet achieves a 76.9% mean intersection over union (mIoU), outperforming existing models such as GLNet (71.6%) and EHSNet (76.3%), while requiring lower memory (1443 MB) and maintaining a competitive inference speed of 518.3 ms per image. On the Inria Aerial dataset, DLNet attains an mIoU of 73.6%, surpassing GLNet (71.2%) while reducing computational cost and achieving an inference speed of 119.4 ms per image. These results highlight DLNet’s effectiveness in achieving precise and efficient segmentation in high-resolution remote sensing imagery.
id	doaj-art-4028ebb2abce46218a9715bdcf62d990
issn	2072-4292
doi	10.3390/rs17071119
volume	17
issue	7
article	1119
affiliation	Weijun Meng: School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
affiliation	Lianlei Shan: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408, China
affiliation	Sugang Ma: School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
affiliation	Dan Liu: Department of Management, Kean University, Union, NJ 07083, USA
affiliation	Bin Hu: Department of Computer Science and Technology, Kean University, Union, NJ 07083, USA

