Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation
Video frame interpolation (VFI) is a task that generates intermediate frames from two consecutive frames. Previous studies have employed two main approaches to extract the necessary information from both frames: pixel-level synthesis and flow-based methods. However, when synthesizing high-resolution...
Main Authors: | Sangjin Lee, Chajin Shin, Hong-Goo Kang, Sangyoun Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
Subjects: | video frame interpolation; end-to-end learning; hierarchical flow refinement; difference map |
Online Access: | https://www.mdpi.com/1424-8220/25/1/290 |
_version_ | 1841548935170097152 |
---|---|
author | Sangjin Lee; Chajin Shin; Hong-Goo Kang; Sangyoun Lee |
author_facet | Sangjin Lee; Chajin Shin; Hong-Goo Kang; Sangyoun Lee |
author_sort | Sangjin Lee |
collection | DOAJ |
description | Video frame interpolation (VFI) is a task that generates intermediate frames from two consecutive frames. Previous studies have employed two main approaches to extract the necessary information from both frames: pixel-level synthesis and flow-based methods. However, when synthesizing high-resolution videos using VFI, each approach has its limitations. Pixel-level synthesis based on the transformer architecture requires high complexity to achieve 4K video results. In the case of flow-based methods, forward warping can produce holes where pixels are not allocated, while backward warping approaches struggle to obtain accurate backward flow. Additionally, there are challenges during the training stage; previous works have often generated suboptimal results by training multi-stage model architectures separately. To address these issues, we propose a Recurrent Flow Update (RFU) model trained in an end-to-end manner. We introduce a global flow update module that leverages global information to mitigate the weaknesses of forward flow and gradually correct errors. We demonstrate the effectiveness of our method through several ablation studies. Our approach achieves state-of-the-art performance not only on the XTest and Davis datasets, which have 4K resolution, but also on the SNU-FILM dataset, which features large motions at low resolution. |
format | Article |
id | doaj-art-94f296ab8a4248d1b00c6e5c31673706 |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-94f296ab8a4248d1b00c6e5c31673706; 2025-01-10T13:21:29Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; vol. 25, no. 1, art. 290; 10.3390/s25010290; Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation; Sangjin Lee, Chajin Shin, Hong-Goo Kang, Sangyoun Lee (all: School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Republic of Korea); https://www.mdpi.com/1424-8220/25/1/290; video frame interpolation; end-to-end learning; hierarchical flow refinement; difference map |
spellingShingle | Sangjin Lee; Chajin Shin; Hong-Goo Kang; Sangyoun Lee; Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation; Sensors; video frame interpolation; end-to-end learning; hierarchical flow refinement; difference map |
title | Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation |
title_full | Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation |
title_fullStr | Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation |
title_full_unstemmed | Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation |
title_short | Recurrent Flow Update Model Using Image Pyramid Structure for 4K Video Frame Interpolation |
title_sort | recurrent flow update model using image pyramid structure for 4k video frame interpolation |
topic | video frame interpolation; end-to-end learning; hierarchical flow refinement; difference map |
url | https://www.mdpi.com/1424-8220/25/1/290 |
work_keys_str_mv | AT sangjinlee recurrentflowupdatemodelusingimagepyramidstructurefor4kvideoframeinterpolation AT chajinshin recurrentflowupdatemodelusingimagepyramidstructurefor4kvideoframeinterpolation AT honggookang recurrentflowupdatemodelusingimagepyramidstructurefor4kvideoframeinterpolation AT sangyounlee recurrentflowupdatemodelusingimagepyramidstructurefor4kvideoframeinterpolation |
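
The description notes that forward warping can leave holes where no pixel is allocated. The sketch below illustrates only that general effect on a toy grayscale frame; it is not the RFU model or any part of the authors' code. It assumes nearest-neighbor splatting as a simplification, and the names `forward_warp_nearest`, `frame`, `flow`, `warped`, and `holes` are hypothetical.

```python
import numpy as np

def forward_warp_nearest(frame, flow):
    """Splat every source pixel to its flow-displaced target (nearest pixel).

    frame: (H, W) array; flow: (H, W, 2) array holding (dx, dy) per source pixel.
    Returns the warped frame and a boolean mask of targets that received no pixel.
    """
    h, w = frame.shape
    warped = np.zeros_like(frame)
    hit = np.zeros((h, w), dtype=bool)
    ys, xs = np.mgrid[0:h, 0:w]
    # Target coordinates of each source pixel, rounded to the nearest grid point.
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    # Discard pixels that move outside the frame.
    inside = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    warped[ty[inside], tx[inside]] = frame[ys[inside], xs[inside]]
    hit[ty[inside], tx[inside]] = True
    return warped, ~hit

# Toy example (assumed data): the right half of a 4x4 frame moves one pixel right.
frame = np.arange(16, dtype=float).reshape(4, 4)
flow = np.zeros((4, 4, 2))
flow[:, 2:, 0] = 1.0

warped, holes = forward_warp_nearest(frame, flow)
print(holes.astype(int))  # column 2 is vacated by the moving region and never filled
```

In this toy case the moving region leaves column 2 without any source pixel, which is the kind of unallocated area the description refers to. Backward warping avoids such holes by sampling the source at every target pixel, but it needs a flow defined on the target frame, which the description identifies as hard to estimate accurately.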