VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution

Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models makes the generated content difficult to control. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating the visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. In particular, VSRDiff achieves the best LPIPS, DISTS, and NIQE scores on the REDS4 dataset, with values of 0.1137, 0.0445, and 2.970, respectively. The results will be released at https://github.com/aigcvsr/VSRDiff.


Bibliographic Details
Main Authors: Linlin Liu, Lele Niu, Jun Tang, Yong Ding
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Video super-resolution; diffusion models; denoising diffusion probabilistic models; deep learning; convolutional neural network
Online Access: https://ieeexplore.ieee.org/document/10840194/
author Linlin Liu
Lele Niu
Jun Tang
Yong Ding
author_facet Linlin Liu
Lele Niu
Jun Tang
Yong Ding
author_sort Linlin Liu
collection DOAJ
description Video Super-Resolution (VSR) aims to reconstruct high-quality high-resolution (HR) videos from low-resolution (LR) inputs. Recent studies have explored diffusion models (DMs) for VSR by exploiting their generative priors to produce realistic details. However, the inherent randomness of diffusion models makes the generated content difficult to control. In particular, current DM-based VSR methods often neglect inter-frame temporal coherence and reconstruction-oriented objectives, leading to visual distortion and temporal inconsistency. In this paper, we introduce VSRDiff, a DM-based framework for VSR that emphasizes inter-frame temporal coherence and adopts a novel reconstruction perspective. Specifically, the Inter-Frame Aggregation Guidance (IFAG) module is developed to learn contextual inter-frame aggregation guidance, alleviating the visual distortion caused by the randomness of diffusion models. Furthermore, the Progressive Reconstruction Sampling (PRS) approach is employed to generate reconstruction-oriented latents, balancing fidelity and detail richness. Additionally, temporal consistency is enhanced through second-order bidirectional latent propagation using the Flow-guided Latent Correction (FLC) module. Extensive experiments on the REDS4 and Vid4 datasets demonstrate that VSRDiff achieves highly competitive VSR performance with more realistic details, surpassing existing state-of-the-art methods in both visual fidelity and temporal consistency. In particular, VSRDiff achieves the best LPIPS, DISTS, and NIQE scores on the REDS4 dataset, with values of 0.1137, 0.0445, and 2.970, respectively. The results will be released at https://github.com/aigcvsr/VSRDiff.
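The abstract names its three components (IFAG, PRS, FLC) without implementation detail. As a rough illustration only, the sketch below shows one way second-order bidirectional latent propagation with flow-guided correction could be structured in PyTorch; every name here (flow_warp, FlowGuidedCorrection, propagate), the (dx, dy) flow convention, and the concatenate-and-fuse design are assumptions inferred from the abstract, not the authors' released code. A second pass over the frames in reverse order, mirroring propagate, would complete the bidirectional scheme.

```python
# Hypothetical sketch of second-order flow-guided latent propagation,
# inferred from the abstract; not the authors' FLC implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(latent, flow):
    """Backward-warp a latent map (N, C, H, W) with a dense flow (N, 2, H, W).

    Assumes flow channels are (dx, dy) pixel offsets.
    """
    n, _, h, w = latent.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(latent.device)  # (H, W, 2)
    grid = grid.unsqueeze(0) + flow.permute(0, 2, 3, 1)             # add offsets
    grid[..., 0] = 2.0 * grid[..., 0] / max(w - 1, 1) - 1.0         # x -> [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / max(h - 1, 1) - 1.0         # y -> [-1, 1]
    return F.grid_sample(latent, grid, align_corners=True)

class FlowGuidedCorrection(nn.Module):
    """Fuse the current latent with warped latents from t-1 and t-2
    (the 'second-order' part); residual concatenate-and-fuse design."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)

    def forward(self, cur, warp1, warp2):
        return cur + self.fuse(torch.cat([cur, warp1, warp2], dim=1))

def propagate(latents, flows1, flows2, flc):
    """One forward pass over a list of per-frame latents; flows1[t] maps
    frame t to t-1, flows2[t] maps frame t to t-2."""
    out, hist = [], []
    for t, cur in enumerate(latents):
        w1 = flow_warp(hist[-1], flows1[t]) if len(hist) >= 1 else cur
        w2 = flow_warp(hist[-2], flows2[t]) if len(hist) >= 2 else cur
        cur = flc(cur, w1, w2)
        hist.append(cur)
        out.append(cur)
    return out

if __name__ == "__main__":
    # Toy example: 5 latent frames, zero flow (warping is then the identity).
    lat = [torch.randn(1, 4, 32, 32) for _ in range(5)]
    zero = [torch.zeros(1, 2, 32, 32) for _ in range(5)]
    flc = FlowGuidedCorrection(4)
    fused = propagate(lat, zero, zero, flc)
    print(fused[0].shape)  # torch.Size([1, 4, 32, 32])
```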
format Article
id doaj-art-c115a6f4eda34d0682b78d8cadf24ab3
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c115a6f4eda34d0682b78d8cadf24ab3 (indexed 2025-01-24T00:01:24Z)
VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
IEEE Access (IEEE), ISSN 2169-3536, vol. 13, pp. 11447-11462, published 2025-01-01
DOI: 10.1109/ACCESS.2025.3529758; IEEE document 10840194; https://ieeexplore.ieee.org/document/10840194/
Linlin Liu (https://orcid.org/0009-0008-1914-7033), Lele Niu (https://orcid.org/0009-0005-1395-1707), Jun Tang (https://orcid.org/0000-0003-0122-9512), Yong Ding (https://orcid.org/0000-0002-5226-7511), all with the College of Integrated Circuits, Zhejiang University, Hangzhou, China
spellingShingle Linlin Liu
Lele Niu
Jun Tang
Yong Ding
VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
IEEE Access
Video super-resolution
diffusion models
denoising diffusion probabilistic models
deep learning
convolutional neural network
title VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
title_full VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
title_fullStr VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
title_full_unstemmed VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
title_short VSRDiff: Learning Inter-Frame Temporal Coherence in Diffusion Model for Video Super-Resolution
title_sort vsrdiff learning inter frame temporal coherence in diffusion model for video super resolution
topic Video super-resolution
diffusion models
denoising diffusion probabilistic models
deep learning
convolutional neural network
url https://ieeexplore.ieee.org/document/10840194/
work_keys_str_mv AT linlinliu vsrdifflearninginterframetemporalcoherenceindiffusionmodelforvideosuperresolution
AT leleniu vsrdifflearninginterframetemporalcoherenceindiffusionmodelforvideosuperresolution
AT juntang vsrdifflearninginterframetemporalcoherenceindiffusionmodelforvideosuperresolution
AT yongding vsrdifflearninginterframetemporalcoherenceindiffusionmodelforvideosuperresolution