VG-CGARN: Video Generation Using Convolutional Generative Adversarial and Recurrent Networks
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | University of Science and Culture, 2025-04-01 |
| Series: | International Journal of Web Research |
| Subjects: | |
| Online Access: | https://ijwr.usc.ac.ir/article_221691_14280a8d79682e6da4ec6512fb2d9842.pdf |
| Summary: | Generating dynamic videos from static images and accurately modeling object motion within scenes are fundamental challenges in computer vision, with broad applications in video enhancement, photo animation, and visual scene understanding. This paper proposes a novel hybrid framework that combines convolutional neural networks (CNNs), recurrent neural networks (RNNs) with long short-term memory (LSTM) units, and generative adversarial networks (GANs) to synthesize temporally consistent and spatially realistic video sequences from still images. The architecture incorporates splicing techniques, the Lucas-Kanade motion estimation algorithm, and a loop feedback mechanism to address key limitations of existing approaches, including motion instability, temporal noise, and degraded video quality over time. CNNs extract spatial features, LSTMs model temporal dynamics, and GANs enhance visual realism through adversarial training (illustrative sketches of these components follow this record). Experimental results on the KTH dataset, comprising 600 videos of fundamental human actions, demonstrate that the proposed method achieves substantial improvements over baseline models, reaching a peak PSNR of 35.8 dB and an SSIM of 0.96, a 20% performance gain. The model successfully generates high-quality, 10-second videos at a resolution of 720×1280 pixels with significantly reduced noise, confirming the effectiveness of the integrated splicing and feedback strategy for stable and coherent video generation. |
| ISSN: | 2645-4343 |
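
The abstract assigns each network a role: CNNs extract spatial features, the LSTM models temporal dynamics, and a GAN discriminator drives realism. The PyTorch sketch below shows one minimal way to wire such a generator/discriminator pair; the 64×64 working resolution, layer sizes, 16-frame rollout, and class names are illustrative assumptions, not the authors' published architecture.

```python
# Minimal sketch of a CNN + LSTM + GAN video generator, assuming PyTorch.
# All sizes and names are illustrative, not the paper's implementation.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Still image -> CNN features -> LSTM rollout -> decoded frame sequence."""
    def __init__(self, frames=16, latent=256):
        super().__init__()
        self.frames = frames
        # CNN encoder: spatial features from the input image (assumes 64x64 RGB).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent),
        )
        # LSTM: unrolls one latent state per output frame (temporal dynamics).
        self.lstm = nn.LSTM(latent, latent, batch_first=True)
        # Decoder: latent -> 16x16 feature map -> upsampled 64x64 frame.
        self.fc = nn.Linear(latent, 128 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, img):                             # img: (B, 3, 64, 64)
        b = img.size(0)
        z = self.encoder(img)                           # (B, latent)
        seq = z.unsqueeze(1).repeat(1, self.frames, 1)  # one latent per frame
        h, _ = self.lstm(seq)                           # (B, T, latent)
        maps = self.fc(h).view(b * self.frames, 128, 16, 16)
        video = self.decoder(maps)                      # (B*T, 3, 64, 64)
        return video.view(b, self.frames, 3, 64, 64)

class Discriminator(nn.Module):
    """Per-frame real/fake critic supplying the adversarial loss."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 1),
        )

    def forward(self, frame):
        return self.net(frame)  # raw logit: real vs. generated

g = Generator()
print(g(torch.randn(2, 3, 64, 64)).shape)  # torch.Size([2, 16, 3, 64, 64])
```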
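The motion-estimation step named in the abstract is the Lucas-Kanade algorithm. A common off-the-shelf realization is OpenCV's pyramidal sparse tracker, sketched below; the frame file names and tracker parameters are placeholders, and the paper may integrate the algorithm differently.

```python
# Sparse optical flow with OpenCV's pyramidal Lucas-Kanade tracker.
# File names and parameters below are placeholder assumptions.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Pick corner-like points worth tracking in the first frame.
pts = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01,
                              minDistance=7)

# Track those points into the next frame (3-level pyramid, 15x15 window).
nxt, status, err = cv2.calcOpticalFlowPyrLK(
    prev, curr, pts, None, winSize=(15, 15), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01),
)

# Per-point motion vectors for the points that were tracked successfully.
good = status.ravel() == 1
flow = (nxt[good] - pts[good]).reshape(-1, 2)
print("mean displacement (px):", np.abs(flow).mean(axis=0))
```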
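The reported figures use two standard full-reference metrics: PSNR, defined as 10·log10(MAX²/MSE) in dB, and SSIM. The snippet below shows how they are typically computed with scikit-image (the paper does not specify its implementation; `channel_axis` requires scikit-image ≥ 0.19).

```python
# Standard PSNR/SSIM computation via scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_frame(reference: np.ndarray, generated: np.ndarray):
    """Both frames are HxWx3 uint8 arrays; returns (psnr_db, ssim)."""
    # PSNR = 10 * log10(255**2 / MSE) for 8-bit images.
    psnr = peak_signal_noise_ratio(reference, generated, data_range=255)
    ssim = structural_similarity(reference, generated,
                                 channel_axis=2, data_range=255)
    return psnr, ssim
```

For 8-bit frames, the reported peak of 35.8 dB corresponds to a mean squared error of about 255² / 10^3.58 ≈ 17, i.e. an average per-pixel deviation of roughly 4 intensity levels.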