Improving Image Quality and Controllability in Speed + Angular Velocity to Image Generation Through Synthetic Data for Driving Simulator Generation


Bibliographic Details
Main Authors: Yuto Imai, Tomoya Senda, Yusuke Kajiwara
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Synthetic dataset; speed + angular velocity to image generation; cross-modal generation; scene skip problem
Online Access: https://ieeexplore.ieee.org/document/10824770/
author Yuto Imai
Tomoya Senda
Yusuke Kajiwara
collection DOAJ
description With advancements in cross-modal techniques, methods for generating images and videos from text or speech have become increasingly practical. However, research on video generation from modalities other than text or speech remains limited. One major reason for this shortage is the lack of large-scale datasets, which leads to skewed data distributions and causes issues such as unresponsiveness to untrained patterns, image collapse, and scene skipping. To address these challenges, this study focuses on a driving simulation generation model, which produces driving scenarios from speed and steering angle information. We propose a synthetic dataset for speed + angular velocity to image (SAV2IMG) generation. Specifically, we create a half-synthetic dataset, in which only the query (control inputs) is synthetic and the response (images) is derived from real data, as well as a fully synthetic dataset, in which both the query and response are synthetic. By doing so, we construct a training environment that enables the model to handle diverse driving patterns and previously unseen control conditions. We conducted experiments comparing three training conditions for SAV2IMG: using real data only, using real plus half-synthetic data, and using real plus both half- and fully synthetic data. The results demonstrate improved image quality as measured by FID, enhanced controllability, and flexible adaptability to unknown control conditions. Moreover, employing fully synthetic data generated from 3D city models allowed for stable responses to unfamiliar scenarios. At the same time, we found that simple physical models failed to fully reproduce complex control patterns. These findings are not only valuable for improving SAV2IMG but also hold broader implications for vehicle-related generative tasks and cross-modal generation models in general. They provide a meaningful foundation for future model development and data augmentation strategies.
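The abstract above reports "improved image quality as measured by FID" (Fréchet Inception Distance). As a quick illustration of how that metric is computed — this sketch is not from the paper; the feature dimensions, sample counts, and the numpy-only eigenvalue trick for the matrix square root are illustrative assumptions — FID compares the mean and covariance of two sets of image features:

```python
import numpy as np

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet Inception Distance between two feature sets (rows = samples).

    FID = ||mu_a - mu_b||^2 + Tr(C_a) + Tr(C_b) - 2 * Tr((C_a C_b)^{1/2})
    The trace of the matrix square root equals the sum of square roots of the
    eigenvalues of C_a @ C_b, which keeps this sketch numpy-only.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c_a = np.cov(feats_a, rowvar=False)
    c_b = np.cov(feats_b, rowvar=False)
    eig = np.linalg.eigvals(c_a @ c_b)  # eigenvalues are >= 0 up to numerical noise
    tr_sqrt = np.sqrt(np.maximum(eig.real, 0.0)).sum()
    return float(((mu_a - mu_b) ** 2).sum()
                 + np.trace(c_a) + np.trace(c_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))     # stand-in for real-frame features
shifted = rng.normal(0.5, 1.0, size=(500, 8))  # features from a worse generator
print(fid(real, real))     # near zero: identical distributions
print(fid(real, shifted))  # clearly larger: lower image quality
```

In practice the feature vectors come from a pretrained Inception network rather than random draws; a lower FID means the generated driving frames are statistically closer to real footage, which is the sense in which the three training conditions are compared.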
format Article
id doaj-art-24a39bb8b7064867b7467893002d659f
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-24a39bb8b7064867b7467893002d659f (indexed 2025-01-24T00:01:56Z)
Language: English; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536
Published: 2025-01-01; Volume 13, pp. 12168-12177
DOI: 10.1109/ACCESS.2025.3525714; IEEE document: 10824770
Authors: Yuto Imai; Tomoya Senda; Yusuke Kajiwara (https://orcid.org/0000-0001-5895-3312)
Affiliation (all three authors): Division of Production System Science, Komatsu University, Komatsu, Ishikawa, Japan
URL: https://ieeexplore.ieee.org/document/10824770/
title Improving Image Quality and Controllability in Speed + Angular Velocity to Image Generation Through Synthetic Data for Driving Simulator Generation
topic Synthetic dataset
speed + angular velocity to image generation
cross-modal generation
scene skip problem
url https://ieeexplore.ieee.org/document/10824770/