Synergistic Semantic Segmentation and Height Estimation for Monocular Remote Sensing Images via Cross-Task Interaction
Semantic segmentation and height estimation in remote sensing imagery are two pivotal tasks for scene understanding, and they are highly interrelated. Although deep learning methods have achieved remarkable progress in these tasks in recent years, several challenges remain. Recent studies have shown...
Saved in:
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-05-01
|
| Series: | Remote Sensing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2072-4292/17/9/1637 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Semantic segmentation and height estimation in remote sensing imagery are two pivotal tasks for scene understanding, and they are highly interrelated. Although deep learning methods have achieved remarkable progress in these tasks in recent years, several challenges remain. Recent studies have shown that multi-task learning methods can enhance the complementarity of task-related features, thus maximizing the prediction accuracy of multiple tasks at a low computational cost. However, due to factors such as complex semantic categories and the inconsistent spatial scales of remotely sensed images, existing multi-task learning methods often fail to achieve better results on these two tasks. To address this issue, we propose CTME-Net, a novel architecture termed the Cross-Task Mutual Enhancement Network, designed to jointly perform height estimation and semantic segmentation tasks on remote sensing imagery. Firstly, to generate discriminative initial features for each task branch and activate dedicated pathways for cross-task feature disentanglement, we design a universal initial feature embedding module for each downstream task. Secondly, to address the impact of redundancy in general features during global–local fusion, we develop an Adaptive Task-specific Feature Distillation Module that enhances the model’s ability to acquire task-specific features. Finally, we propose a task feature interaction module to optimize features across tasks through mutual optimization, maximizing task-specific feature expression. We conduct extensive experiments on the ISPRS Vaihingen and Potsdam datasets to validate the effectiveness of our approach. The results demonstrate that our proposed method outperforms existing methods in both height estimation and semantic segmentation. |
|---|---|
| ISSN: | 2072-4292 |