Remote Sensing Image Semantic Segmentation Sample Generation Using a Decoupled Latent Diffusion Framework

This paper addresses the challenges of sample scarcity and class imbalance in remote sensing image semantic segmentation by proposing a decoupled synthetic sample generation framework based on a latent diffusion model. The method consists of two stages. In the label generation stage, we fine-tune a pretrained latent diffusion model with LoRA to generate semantic label masks from textual descriptions. A novel proportion-aware loss function explicitly penalizes deviations from the desired class distribution in the generated mask. In the image generation stage, we use ControlNet to train a multi-condition image generation network that takes the synthesized mask, along with its text description, as input and produces a realistic remote sensing image. The base Stable Diffusion model's weights remain frozen during this process, with the trainable ControlNet ensuring that outputs are structurally and semantically aligned with the input labels. This two-stage approach yields coherent image–mask pairs that are well-suited for training segmentation models. Experiments show that the synthetic samples produced by the proposed method achieve high visual quality and semantic consistency. The proportion-aware loss effectively mitigates the impact of class imbalance, boosting segmentation performance on under-represented categories. Results also reveal that adding a suitable proportion of synthetic samples improves segmentation accuracy, whereas an excessive share can cause overfitting or misclassification. Comparative tests across multiple models confirm the generality and robustness of the approach.
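The proportion-aware loss is only named in the abstract. As a concrete illustration, the sketch below is a hypothetical PyTorch implementation that assumes the generated label mask is available as per-class logits and measures the L1 gap between the observed soft class proportions and a desired target mix; the function name, shapes, and example values are assumptions, not the authors' formulation.

```python
# Hypothetical sketch of a proportion-aware penalty (not the paper's exact loss).
# Assumes the generated label mask is available as per-class logits (B, C, H, W)
# and that the penalty is the L1 gap between observed and target class proportions.
import torch
import torch.nn.functional as F


def proportion_aware_loss(mask_logits: torch.Tensor,
                          target_props: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of generated class proportions from a target mix.

    mask_logits:  (B, C, H, W) per-class logits of the generated label mask.
    target_props: (B, C) desired class proportions; each row sums to 1.
    """
    probs = F.softmax(mask_logits, dim=1)      # soft class assignment per pixel
    observed = probs.mean(dim=(2, 3))          # (B, C) observed class proportions
    return (observed - target_props).abs().sum(dim=1).mean()


# Example: ask for roughly 10% of pixels in a minority class (index 2) out of 6 classes.
logits = torch.randn(4, 6, 256, 256, requires_grad=True)
target = torch.full((4, 6), 1.0 / 6)
target[:, 2] = 0.10
target = target / target.sum(dim=1, keepdim=True)
loss = proportion_aware_loss(logits, target)
loss.backward()  # gradients flow back to the mask generator
```

In the framework described above, a term of this kind would presumably be added, with a weighting coefficient, to the standard diffusion denoising objective during LoRA fine-tuning of the label generator.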

Bibliographic Details
Main Authors: Yue Xu, Honghao Liu, Ruixia Yang, Zhengchao Chen
Format: Article
Language:English
Published: MDPI AG 2025-06-01
Series:Remote Sensing
Subjects: remote sensing; semantic segmentation; sample synthesis; diffusion models; class imbalance; latent diffusion
Online Access:https://www.mdpi.com/2072-4292/17/13/2143
author Yue Xu
Honghao Liu
Ruixia Yang
Zhengchao Chen
collection DOAJ
description This paper addresses the challenges of sample scarcity and class imbalance in remote sensing image semantic segmentation by proposing a decoupled synthetic sample generation framework based on a latent diffusion model. The method consists of two stages. In the label generation stage, we fine-tune a pretrained latent diffusion model with LoRA to generate semantic label masks from textual descriptions. A novel proportion-aware loss function explicitly penalizes deviations from the desired class distribution in the generated mask. In the image generation stage, we use ControlNet to train a multi-condition image generation network that takes the synthesized mask, along with its text description, as input and produces a realistic remote sensing image. The base Stable Diffusion model's weights remain frozen during this process, with the trainable ControlNet ensuring that outputs are structurally and semantically aligned with the input labels. This two-stage approach yields coherent image–mask pairs that are well-suited for training segmentation models. Experiments show that the synthetic samples produced by the proposed method achieve high visual quality and semantic consistency. The proportion-aware loss effectively mitigates the impact of class imbalance, boosting segmentation performance on under-represented categories. Results also reveal that adding a suitable proportion of synthetic samples improves segmentation accuracy, whereas an excessive share can cause overfitting or misclassification. Comparative tests across multiple models confirm the generality and robustness of the approach.
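The second stage, as described, trains a ControlNet against a frozen Stable Diffusion backbone so that the synthesized mask and a text prompt jointly condition image generation. The following is a minimal sketch of such a setup with the Hugging Face diffusers library; the base checkpoint ID and the single training step are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of mask- and text-conditioned image generation with a trainable
# ControlNet over a frozen Stable Diffusion backbone (illustrative, not the
# authors' implementation; the base checkpoint ID below is an assumption).
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, ControlNetModel, DDPMScheduler, UNet2DConditionModel
from transformers import CLIPTextModel, CLIPTokenizer

BASE = "runwayml/stable-diffusion-v1-5"  # assumed base checkpoint

vae = AutoencoderKL.from_pretrained(BASE, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(BASE, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(BASE, subfolder="text_encoder")
tokenizer = CLIPTokenizer.from_pretrained(BASE, subfolder="tokenizer")
noise_scheduler = DDPMScheduler.from_pretrained(BASE, subfolder="scheduler")
controlnet = ControlNetModel.from_unet(unet)  # trainable copy of the UNet encoder

# Freeze the base model; only the ControlNet receives gradients.
for module in (vae, unet, text_encoder):
    module.requires_grad_(False)


def training_step(image, mask_rgb, prompts):
    """One denoising step: image (B, 3, H, W) in [-1, 1], mask_rgb (B, 3, H, W) in [0, 1]."""
    latents = vae.encode(image).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy_latents = noise_scheduler.add_noise(latents, noise, t)

    tokens = tokenizer(prompts, padding="max_length", truncation=True,
                       max_length=tokenizer.model_max_length, return_tensors="pt")
    text_emb = text_encoder(tokens.input_ids)[0]

    # The ControlNet injects mask-derived residuals into the frozen UNet.
    down_res, mid_res = controlnet(noisy_latents, t, encoder_hidden_states=text_emb,
                                   controlnet_cond=mask_rgb, return_dict=False)
    noise_pred = unet(noisy_latents, t, encoder_hidden_states=text_emb,
                      down_block_additional_residuals=down_res,
                      mid_block_additional_residual=mid_res).sample
    return F.mse_loss(noise_pred, noise)
```

At inference time, the trained ControlNet would be loaded into a StableDiffusionControlNetPipeline alongside the frozen base model to render one image per synthesized mask and prompt.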
format Article
id doaj-art-5864c69930c449c3aeeed2624f3e0e30
institution OA Journals
issn 2072-4292
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Remote Sensing
doi 10.3390/rs17132143
citation Remote Sensing, Vol. 17, Iss. 13, Art. 2143, published 2025-06-01 by MDPI AG
affiliation Yue Xu: State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
affiliation Honghao Liu: State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
affiliation Ruixia Yang: International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China
affiliation Zhengchao Chen: State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
title Remote Sensing Image Semantic Segmentation Sample Generation Using a Decoupled Latent Diffusion Framework
topic remote sensing
semantic segmentation
sample synthesis
diffusion models
class imbalance
latent diffusion
url https://www.mdpi.com/2072-4292/17/13/2143