Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation

Video frame interpolation (VFI) is a task that generates intermediate frames from two consecutive frames. Previous studies have employed two main approaches to extract the necessary information from both frames: pixel-level synthesis and flow-based methods. However, when synthesizing high-resolution...

Bibliographic Details
Main Authors: Sangjin Lee, Chajin Shin, Hong-Goo Kang, Sangyoun Lee
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Sensors
Subjects: video frame interpolation; end-to-end learning; hierarchical flow refinement; difference map
Online Access: https://www.mdpi.com/1424-8220/25/1/290
author Sangjin Lee
Chajin Shin
Hong-Goo Kang
Sangyoun Lee
collection DOAJ
description Video frame interpolation (VFI) is a task that generates intermediate frames from two consecutive frames. Previous studies have employed two main approaches to extract the necessary information from both frames: pixel-level synthesis and flow-based methods. However, when synthesizing high-resolution videos using VFI, each approach has its limitations. Pixel-level synthesis based on the transformer architecture requires high complexity to achieve 4K video results. In the case of flow-based methods, forward warping can produce holes where pixels are not allocated, while backward warping approaches struggle to obtain accurate backward flow. Additionally, there are challenges during the training stage; previous works have often generated suboptimal results by training multi-stage model architectures separately. To address these issues, we propose a Recurrent Flow Update (RFU) model trained in an end-to-end manner. We introduce a global flow update module that leverages global information to mitigate the weaknesses of forward flow and gradually correct errors. We demonstrate the effectiveness of our method through several ablation studies. Our approach achieves state-of-the-art performance not only on the XTest and Davis datasets, which have 4K resolution, but also on the SNU-FILM dataset, which features large motions at low resolution.
format Article
id doaj-art-94f296ab8a4248d1b00c6e5c31673706
institution Kabale University
issn 1424-8220
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-94f296ab8a4248d1b00c6e5c31673706
2025-01-10T13:21:29Z
eng
MDPI AG
Sensors
1424-8220
2025-01-01
25(1), 290
10.3390/s25010290
Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation
Sangjin Lee, School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
Chajin Shin, School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
Hong-Goo Kang, School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
Sangyoun Lee, School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea
https://www.mdpi.com/1424-8220/25/1/290
video frame interpolation
end-to-end learning
hierarchical flow refinement
difference map
title Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation
topic video frame interpolation
end-to-end learning
hierarchical flow refinement
difference map
url https://www.mdpi.com/1424-8220/25/1/290