Decoupled Latent Diffusion Model for Enhancing Image Generation

Latent Diffusion Models have emerged as an efficient alternative to conventional diffusion approaches by compressing high-dimensional images into a lower-dimensional latent space using a Variational Autoencoder (VAE) and performing diffusion in that space. In the standard Latent Diffusion Model (LDM), the latent code is formed by sampling from a Gaussian distribution (i.e., combining both the mean and the standard deviation), which helps regularize the latent space but appears to contribute little beyond the deterministic component. Motivated by recent empirical observations that the decoder relies primarily on the latent mean, our work reexamines this paradigm and proposes a decoupled latent diffusion model that focuses on a simplified latent representation. Specifically, we compare three configurations: (i) the standard latent code, (ii) a concatenated representation that explicitly preserves both mean and variance, and (iii) a deterministic mean-only representation. Our extensive experiments on multiple benchmark datasets demonstrate that, when compared to the standard approach, the mean-only configuration not only maintains but in many cases improves synthesis quality, producing sharper and more coherent images while reducing unnecessary noise. These findings suggest that a simplified, deterministic latent representation can yield more stable and efficient generative models, challenging the conventional reliance on latent sampling in diffusion-based image synthesis.


Bibliographic Details
Main Authors: Hyun-Tae Choi, Kensuke Nakamura, Byung-Woo Hong
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: Denoising diffusion model; latent representation; image generation
Online Access:https://ieeexplore.ieee.org/document/11091282/
author Hyun-Tae Choi
Kensuke Nakamura
Byung-Woo Hong
collection DOAJ
description Latent Diffusion Models have emerged as an efficient alternative to conventional diffusion approaches by compressing high-dimensional images into a lower-dimensional latent space using a Variational Autoencoder (VAE) and performing diffusion in that space. In the standard Latent Diffusion Model (LDM), the latent code is formed by sampling from a Gaussian distribution (i.e., combining both the mean and the standard deviation), which helps regularize the latent space but appears to contribute little beyond the deterministic component. Motivated by recent empirical observations that the decoder relies primarily on the latent mean, our work reexamines this paradigm and proposes a decoupled latent diffusion model that focuses on a simplified latent representation. Specifically, we compare three configurations: (i) the standard latent code, (ii) a concatenated representation that explicitly preserves both mean and variance, and (iii) a deterministic mean-only representation. Our extensive experiments on multiple benchmark datasets demonstrate that, when compared to the standard approach, the mean-only configuration not only maintains but in many cases improves synthesis quality, producing sharper and more coherent images while reducing unnecessary noise. These findings suggest that a simplified, deterministic latent representation can yield more stable and efficient generative models, challenging the conventional reliance on latent sampling in diffusion-based image synthesis.
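The three latent configurations named in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: the encoder outputs, shapes, and function names are illustrative assumptions, standing in for a VAE encoder that produces a per-location latent mean and log-variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterized_latent(mean, logvar, rng):
    # (i) standard LDM latent: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * logvar) * eps

def concatenated_latent(mean, logvar):
    # (ii) concatenated representation: mean and variance kept as
    # separate channels instead of being mixed by sampling
    return np.concatenate([mean, logvar], axis=1)

def mean_only_latent(mean):
    # (iii) deterministic mean-only representation: drop the noise entirely
    return mean

# Hypothetical encoder outputs: batch of 2 images, 4 latent channels, 8x8 spatial grid
mean = rng.standard_normal((2, 4, 8, 8))
logvar = rng.standard_normal((2, 4, 8, 8))

z_std = reparameterized_latent(mean, logvar, rng)
z_cat = concatenated_latent(mean, logvar)
z_mean = mean_only_latent(mean)

print(z_std.shape)   # same shape as mean
print(z_cat.shape)   # channel dimension doubled
print(z_mean.shape)  # identical to mean
```

Note that configuration (ii) doubles the channel count of the latent the diffusion model must denoise, while (iii) keeps the standard shape but removes the stochastic component.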
format Article
id doaj-art-0754ed6ec3f041b4abd4a58ebccf0d34
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
doi 10.1109/ACCESS.2025.3592163
ieee_document 11091282
volume 13
pages 130505-130516
author_orcid Hyun-Tae Choi: https://orcid.org/0000-0001-8268-0705
author_orcid Kensuke Nakamura: https://orcid.org/0000-0002-6858-3551
author_orcid Byung-Woo Hong: https://orcid.org/0000-0003-2752-3939
affiliation Department of Artificial Intelligence, Chung-Ang University, Seoul, South Korea (all three authors)
title Decoupled Latent Diffusion Model for Enhancing Image Generation
topic Denoising diffusion model
latent representation
image generation
url https://ieeexplore.ieee.org/document/11091282/