SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network

Bibliographic Details
Main Authors: S. Raghavendra, Vijayalakshmi, Vainidhi, S. K. Abhilash, Venu Madhav Nookala, P. V. Arun Kumar, Ramyashree
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: Annotation generation, computer vision, depth estimation, diffusion model, edge deployment, instance segmentation
Online Access:https://ieeexplore.ieee.org/document/11079597/
_version_ 1849420476774875136
author S. Raghavendra
Vijayalakshmi
Vainidhi
S. K. Abhilash
Venu Madhav Nookala
P. V. Arun Kumar
Ramyashree
author_facet S. Raghavendra
Vijayalakshmi
Vainidhi
S. K. Abhilash
Venu Madhav Nookala
P. V. Arun Kumar
Ramyashree
author_sort S. Raghavendra
collection DOAJ
description The generation of high-quality annotated image datasets with low computational cost and automated labeling is essential for advancing computer vision systems. However, manual labeling of real images is labor intensive and expensive. To overcome these challenges, we propose SVPDSA, a generic dataset generation model that incorporates residual and attention block pruning, reduced sampling steps, and network quantization. The model comprises two main training phases: Refined LDM (Latent Diffusion Model) retraining and P-Decoder (Perception Decoder) training. A compressed U-Net-based diffusion model, pre-trained on the LAION-5B dataset, serves as the foundation for efficient text-to-image synthesis. The model is trained for approximately 50,000 iterations with a learning rate of 0.0001, keeping generation lightweight yet effective. SVPDSA efficiently generates diverse synthetic images with high-quality perception annotations: it extends text-guided image synthesis to perception data generation, preserving the quality of the generated datasets while offering a flexible solution for label generation. A decoder module expands latent code features and generates labeled annotations for tasks such as semantic segmentation, instance segmentation, and depth estimation. Training the decoder requires fewer than 100 manually labeled images, enabling the creation of an effectively unlimited annotated dataset. Evaluation on the Cityscapes dataset demonstrates that SVPDSA matches or surpasses existing methods such as Mask2Former and DatasetDM on key object classes, including cars, buses, and bicycles. It achieves a mean IoU of 42.7 with ResNet-50 and 41.4 with Swin-B using only 9 real images and 38k synthetic samples, showcasing its efficiency in generating high-quality annotations with minimal real data. Deploying the proposed models on edge devices yields inference times under 5 seconds. This research contributes toward building resource-efficient data generation systems suitable for constrained training environments.
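The description sketches a two-phase pipeline: a compressed latent diffusion model produces images from text prompts, and a small perception decoder (P-Decoder) maps the diffusion latents to dense labels. The Python sketch below illustrates only the decoder phase under stated assumptions; the names PerceptionDecoder and ldm.encode() are hypothetical, and only the hyperparameters quoted in the abstract (roughly 50,000 iterations, learning rate 0.0001, fewer than 100 labeled images) come from the source.

# Minimal sketch of the P-Decoder training phase described above, assuming a
# PyTorch-style API. PerceptionDecoder and ldm.encode() are hypothetical;
# only the quoted hyperparameters are taken from the abstract.
import torch
import torch.nn as nn

class PerceptionDecoder(nn.Module):
    """Expands latent code features into dense per-pixel labels.
    A semantic-segmentation head is shown; instance segmentation and
    depth estimation would add parallel heads."""
    def __init__(self, latent_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(latent_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # Diffusion latents are spatially downsampled (8x for a typical
            # LDM), so upsample back to image resolution before classifying.
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        return self.head(latents)

def train_p_decoder(ldm, decoder, labeled_loader, iters=50_000, lr=1e-4):
    """Phase 2: fit the decoder on the (< 100) manually labeled images,
    keeping the compressed diffusion backbone frozen."""
    opt = torch.optim.Adam(decoder.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    step = 0
    while step < iters:
        for image, mask in labeled_loader:
            with torch.no_grad():                # backbone stays frozen
                latents = ldm.encode(image)      # hypothetical encoder call
            loss = criterion(decoder(latents), mask)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= iters:
                break
    return decoder

Once trained, the same decoder can be applied to latents from any number of synthetic images, which is presumably how the abstract's 38k-sample annotated dataset is produced from only 9 real images.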
format Article
id doaj-art-d05d719fd2f341a199d8b3788d4c9dac
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-d05d719fd2f341a199d8b3788d4c9dac
2025-08-20T03:31:45Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
Volume 13, pp. 124051-124067
doi: 10.1109/ACCESS.2025.3588542
Article 11079597
SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
S. Raghavendra (https://orcid.org/0000-0003-2733-3916), Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
Vijayalakshmi (https://orcid.org/0009-0007-5619-3779), Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
Vainidhi, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
S. K. Abhilash (https://orcid.org/0000-0002-1119-4782), KPIT Technologies, Bengaluru, India
Venu Madhav Nookala (https://orcid.org/0000-0002-0078-5050), KPIT Technologies, Bengaluru, India
P. V. Arun Kumar, KPIT Technologies, Bengaluru, India
Ramyashree (https://orcid.org/0000-0002-0237-2444), Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, India
https://ieeexplore.ieee.org/document/11079597/
Annotation generation; computer vision; depth estimation; diffusion model; edge deployment; instance segmentation
spellingShingle S. Raghavendra
Vijayalakshmi
Vainidhi
S. K. Abhilash
Venu Madhav Nookala
P. V. Arun Kumar
Ramyashree
SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
IEEE Access
Annotation generation
computer vision
depth estimation
diffusion model
edge deployment
instance segmentation
title SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
title_full SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
title_fullStr SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
title_full_unstemmed SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
title_short SVPDSA: Selective View Perception Data Synthesis With Annotations Using Lightweight Diffusion Network
title_sort svpdsa selective view perception data synthesis with annotations using lightweight diffusion network
topic Annotation generation
computer vision
depth estimation
diffusion model
edge deployment
instance segmentation
url https://ieeexplore.ieee.org/document/11079597/
work_keys_str_mv AT sraghavendra svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT vijayalakshmi svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT vainidhi svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT skabhilash svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT venumadhavnookala svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT pvarunkumar svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork
AT ramyashree svpdsaselectiveviewperceptiondatasynthesiswithannotationsusinglightweightdiffusionnetwork