VG-CGARN: Video Generation Using Convolutional Generative Adversarial and Recurrent Networks

Bibliographic Details
Main Authors: Fatemeh Sobhani Manesh, Amin Nazari, Muharram Mansoorizadeh, MirHossein Dezfoulian
Format: Article
Language: English
Published: University of Science and Culture, 2025-04-01
Series: International Journal of Web Research
Online Access: https://ijwr.usc.ac.ir/article_221691_14280a8d79682e6da4ec6512fb2d9842.pdf
Description
Summary: Generating dynamic videos from static images and accurately modeling object motion within scenes are fundamental challenges in computer vision, with broad applications in video enhancement, photo animation, and visual scene understanding. This paper proposes a novel hybrid framework that combines convolutional neural networks (CNNs), recurrent neural networks (RNNs) with long short-term memory (LSTM) units, and generative adversarial networks (GANs) to synthesize temporally consistent and spatially realistic video sequences from still images. The architecture incorporates splicing techniques, the Lucas-Kanade motion estimation algorithm, and a loop feedback mechanism to address key limitations of existing approaches, including motion instability, temporal noise, and degraded video quality over time. CNNs extract spatial features, LSTMs model temporal dynamics, and GANs enhance visual realism through adversarial training. Experimental results on the KTH dataset, comprising 600 videos of fundamental human actions, demonstrate that the proposed method achieves substantial improvements over baseline models, reaching a peak PSNR of 35.8 and SSIM of 0.96, representing a 20% performance gain. The model successfully generates high-quality, 10-second videos at a resolution of 720×1280 pixels with significantly reduced noise, confirming the effectiveness of the integrated splicing and feedback strategy for stable and coherent video generation.
ISSN: 2645-4343
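
Note: The record above provides only the abstract, so the following is a minimal, illustrative sketch (not the authors' published code) of the CNN + LSTM + GAN pipeline it describes: a convolutional encoder extracts spatial features from the still image, an LSTM cell rolls those features forward in time with the generated frame fed back in (one reading of the loop-feedback mechanism), a deconvolutional decoder renders each frame, and a per-frame discriminator supplies the adversarial realism signal. All layer sizes, module names, and the 64×64 toy resolution are assumptions chosen for readability; splicing and Lucas-Kanade motion estimation are omitted.

```python
# Minimal PyTorch sketch of a CNN + LSTM + GAN image-to-video pipeline.
# Module sizes, names, and the feedback scheme are illustrative assumptions,
# not the implementation from the paper.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """CNN that extracts a spatial feature vector from a single image."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
        )
        self.fc = nn.Linear(128 * 8 * 8, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class FrameDecoder(nn.Module):
    """Deconvolutional head that turns an LSTM state back into an image frame."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 128, 8, 8))

class VideoGenerator(nn.Module):
    """Encodes a still image, unrolls an LSTM for T steps, decodes each step
    into a frame, and re-encodes the generated frame as the next input
    (loop feedback)."""
    def __init__(self, feat_dim=256, num_frames=16):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.decoder = FrameDecoder(feat_dim)
        self.lstm = nn.LSTMCell(feat_dim, feat_dim)
        self.num_frames = num_frames

    def forward(self, image):
        feat = self.encoder(image)
        h = torch.zeros_like(feat)
        c = torch.zeros_like(feat)
        frames = []
        for _ in range(self.num_frames):
            h, c = self.lstm(feat, (h, c))
            frame = self.decoder(h)
            frames.append(frame)
            feat = self.encoder(frame)   # feedback: re-encode the generated frame
        return torch.stack(frames, dim=1)  # (batch, T, 3, H, W)

class FrameDiscriminator(nn.Module):
    """Per-frame discriminator providing the adversarial (realism) score."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 1),
        )

    def forward(self, frame):
        return self.net(frame)

if __name__ == "__main__":
    # Toy forward pass on 64x64 inputs; the paper reports 720x1280 output.
    gen = VideoGenerator(num_frames=4)
    still = torch.randn(2, 3, 64, 64)
    video = gen(still)
    disc = FrameDiscriminator()
    realism = disc(video[:, 0])        # adversarial score for the first frame
    print(video.shape, realism.shape)  # (2, 4, 3, 64, 64) and (2, 1)
```

Feeding the re-encoded generated frame back into the LSTM input is only one plausible interpretation of the paper's loop-feedback strategy; the actual splicing step, Lucas-Kanade motion cues, and training objectives would need to follow the full article linked above.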