Remote Sensing Image Semantic Segmentation Sample Generation Using a Decoupled Latent Diffusion Framework
This paper addresses the challenges of sample scarcity and class imbalance in remote sensing image semantic segmentation by proposing a decoupled synthetic sample generation framework based on a latent diffusion model. The method consists of two stages. In the label generation stage, we fine-tune a pretrained latent diffusion model with LoRA to generate semantic label masks from textual descriptions. A novel proportion-aware loss function explicitly penalizes deviations from the desired class distribution in the generated mask. In the image generation stage, we use ControlNet to train a multi-condition image generation network that takes the synthesized mask, along with its text description, as input and produces a realistic remote sensing image. The base Stable Diffusion model’s weights remain frozen during this process, with the trainable ControlNet ensuring that outputs are structurally and semantically aligned with the input labels. This two-stage approach yields coherent image–mask pairs that are well-suited for training segmentation models. Experiments show that the synthetic samples produced by the proposed method achieve high visual quality and semantic consistency. The proportion-aware loss effectively mitigates the impact of class imbalance, boosting segmentation performance on under-represented categories. Results also reveal that adding a suitable proportion of synthetic samples improves segmentation accuracy, whereas an excessive share can cause over-fitting or misclassification. Comparative tests across multiple models confirm the generality and robustness of the approach.
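The proportion-aware loss is only described at a high level in the abstract. As a rough illustration of the idea (not the authors' implementation; the function name, the soft-assignment step, the L1 penalty form, and the weighting factor are all assumptions), a PyTorch sketch could look like this:

```python
import torch
import torch.nn.functional as F

def proportion_aware_loss(mask_logits: torch.Tensor,
                          target_proportions: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of a generated mask's class-area distribution
    from a desired class distribution.

    mask_logits: (B, C, H, W) class scores for the generated label mask.
    target_proportions: (B, C) desired per-class area fractions (rows sum to 1).
    """
    # Soft class assignment keeps the penalty differentiable.
    probs = F.softmax(mask_logits, dim=1)        # (B, C, H, W)
    pred_proportions = probs.mean(dim=(2, 3))    # (B, C): fraction of pixels per class
    # L1 deviation from the requested class distribution, averaged over the batch.
    return (pred_proportions - target_proportions).abs().sum(dim=1).mean()

# During LoRA fine-tuning of the label-generation model, such a term would be added
# to the usual denoising objective with some weight (lambda_prop is assumed):
# total_loss = denoising_loss + lambda_prop * proportion_aware_loss(decoded_logits, target_proportions)
```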
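For the image generation stage, the abstract describes a trainable ControlNet on top of a frozen Stable Diffusion backbone, conditioned on the synthesized mask and its text description. Assuming such a ControlNet has been trained on remote sensing mask–image pairs, inference with the Hugging Face diffusers library could look roughly like the sketch below; the checkpoint path, mask file, and prompt are placeholders, not the authors' released assets.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Hypothetical ControlNet fine-tuned on remote sensing mask/image pairs.
controlnet = ControlNetModel.from_pretrained(
    "path/to/rs-mask-controlnet", torch_dtype=torch.float16
)

# The Stable Diffusion base weights stay frozen; only the ControlNet branch was trained.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Multi-condition input: a color-coded semantic label mask plus a text description.
mask = Image.open("synthetic_label_mask.png").convert("RGB")
prompt = "aerial image with farmland, a river, and scattered buildings"

image = pipe(prompt, image=mask, num_inference_steps=30).images[0]
image.save("synthetic_rs_image.png")
```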
| Main Authors: | Yue Xu, Honghao Liu, Ruixia Yang, Zhengchao Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Remote Sensing |
| Subjects: | remote sensing, semantic segmentation, sample synthesis, diffusion models, class imbalance, latent diffusion |
| Online Access: | https://www.mdpi.com/2072-4292/17/13/2143 |
| _version_ | 1850115741643177984 |
|---|---|
| author | Yue Xu, Honghao Liu, Ruixia Yang, Zhengchao Chen |
| author_sort | Yue Xu |
| collection | DOAJ |
| description | This paper addresses the challenges of sample scarcity and class imbalance in remote sensing image semantic segmentation by proposing a decoupled synthetic sample generation framework based on a latent diffusion model. The method consists of two stages. In the label generation stage, we fine-tune a pretrained latent diffusion model with LoRA to generate semantic label masks from textual descriptions. A novel proportion-aware loss function explicitly penalizes deviations from the desired class distribution in the generated mask. In the image generation stage, we use ControlNet to train a multi-condition image generation network that takes the synthesized mask, along with its text description, as input and produces a realistic remote sensing image. The base Stable Diffusion model’s weights remain frozen during this process, with the trainable ControlNet ensuring that outputs are structurally and semantically aligned with the input labels. This two-stage approach yields coherent image–mask pairs that are well-suited for training segmentation models. Experiments show that the synthetic samples produced by the proposed method achieve high visual quality and semantic consistency. The proportion-aware loss effectively mitigates the impact of class imbalance, boosting segmentation performance on under-represented categories. Results also reveal that adding a suitable proportion of synthetic samples improves segmentation accuracy, whereas an excessive share can cause over-fitting or misclassification. Comparative tests across multiple models confirm the generality and robustness of the approach. |
| format | Article |
| id | doaj-art-5864c69930c449c3aeeed2624f3e0e30 |
| institution | OA Journals |
| issn | 2072-4292 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Remote Sensing |
| spelling | Remote Sensing, vol. 17, no. 13, art. 2143, 2025-06-01, MDPI AG; DOI: 10.3390/rs17132143. Affiliations: Yue Xu, Honghao Liu, and Zhengchao Chen, State Key Laboratory of Remote Sensing and Digital Earth, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China; Ruixia Yang, International Research Center of Big Data for Sustainable Development Goals, Beijing 100094, China. |
| title | Remote Sensing Image Semantic Segmentation Sample Generation Using a Decoupled Latent Diffusion Framework |
| topic | remote sensing, semantic segmentation, sample synthesis, diffusion models, class imbalance, latent diffusion |
| url | https://www.mdpi.com/2072-4292/17/13/2143 |