Spatial Compression Methods for Latent Diffusion Models

Bibliographic Details
Main Authors: Vladimir Abramov, Mikhail Gromov
Format: Article
Language: Russian
Published: The Fund for Promotion of Internet media, IT education, human development «League Internet Media» 2025-04-01
Series: Современные информационные технологии и IT-образование (Modern Information Technologies and IT-Education)
Online Access: https://sitito.cs.msu.ru/index.php/SITITO/article/view/1189
Description
Summary: Diffusion models are a family of generative models that achieve state-of-the-art quality in many areas, such as image, video, and audio generation. Because diffusion models work iteratively, they are several times slower than other generation methods, which greatly increases cost and training time. To address this problem, it has been proposed to compress the space in which the diffusion model operates. Spatial compression methods address the main problems of diffusion models and make previously unattainable generation quality possible (for example, 4K image generation). Much recent work on spatial compression targets video, because generating high-resolution video still requires too many resources, which limits the maximum duration of the generated video. The development of spatial compression methods helps to solve many practical problems. The paper provides an overview of spatial compression methods for latent diffusion models.
ISSN: 2411-1473