SceneDiffusion: Scene Generation Model Embedded with Spatial Constraints


Bibliographic Details
Main Authors: Shanshan Yu, Jiaxin Zhu, Jiaqi Li, Xunchun Li, Kai Wang, Jian Tu, Danhuai Guo
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: ISPRS International Journal of Geo-Information
Online Access: https://www.mdpi.com/2220-9964/14/7/250
Description
Summary: Spatial scenes, as fundamental units of geospatial cognition, encompass rich objects and spatial relationships, and their generation techniques hold significant application value in fields such as disaster simulation, emergency drills, and delayed spatial reconstruction and analysis. However, existing studies still face limitations in modeling complex spatial relationships during scene generation, leading to insufficient semantic consistency and geographical accuracy. The advancement of Geospatial Artificial Intelligence (GeoAI) offers a new technical pathway for the intelligent modeling of spatial scenes. Against this backdrop, we propose SceneDiffusion, a scene generation model embedded with spatial constraints, and construct a geospatial scene dataset incorporating spatial relationship descriptions and geographic semantics, aiming to enhance the capability of GeoAI models to understand and model spatial information. Specifically, SceneDiffusion employs a spatial scene representation framework to uniformly characterize objects and their topological, directional, and distance relationships; enhances the interactive modeling of objects and relationships through a Spatial relationship Attention-aware Graph (SAG) module; and finally generates high-quality scene images conforming to geographic semantics using a Layout information-guided Conditional Diffusion (LCD) module. Both qualitative and quantitative experiments demonstrate the superiority of SceneDiffusion, which achieves a 56.6% reduction in FID and a 35.3% improvement in SSIM compared to baseline methods. Ablation studies confirm the importance of multi-relational modeling with attention mechanisms. By generating scenes that satisfy spatial distribution constraints, this work provides technical support for applications such as emergency scene simulation and virtual scene construction, while also offering insights for theoretical research and methodological innovation in GeoAI.
ISSN: 2220-9964