Camera Pose Generation Based on Unity3D

Bibliographic Details
Main Authors: Hao Luo, Wenjie Luo, Wenzhu Yang
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Information
Subjects:
Online Access: https://www.mdpi.com/2078-2489/16/4/315
Description
Summary: Deep learning models performing complex tasks require the support of datasets. With the advancement of virtual reality technology, virtual datasets are increasingly used in deep learning models. Indoor scenes represent a significant area of interest for the application of machine vision technologies. Existing virtual indoor datasets exhibit deficiencies in their camera poses, leading to problems such as occlusion, object omission, and objects occupying too small a proportion of the image, and they perform poorly when used to train models for object detection and simultaneous localization and mapping (SLAM) tasks. To address the limited capacity of cameras to comprehensively capture scene objects, this study presents an enhanced algorithm based on rapidly exploring random tree star (RRT*) for generating camera poses in 3D indoor scenes. In addition, to generate multimodal data for various deep learning tasks, this study designs an automatic image acquisition module on the Unity3D platform. Experimental results on several mainstream virtual indoor datasets, such as 3D-FRONT and Hypersim, indicate that the image sequences generated in this study improve both object capture rate and efficiency. Even in cluttered environments such as those in SceneNet RGB-D, the object capture rate remains stable at around 75%. Compared with the image sequences from the original datasets, those generated in this study improve performance in the object detection and SLAM tasks, with gains of up to approximately 30% in mAP for the YOLOv10 object detection task and up to approximately 10% in SR for the ORB-SLAM algorithm.
ISSN: 2078-2489