Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation

Abstract Cyber-physical systems (CPS) represent the seamless integration of physical and digital systems connected through a network of sensors, actuators, and controllers. Research in the field of CPS is crucial given its application in healthcare, smart cities, transportation, energy management and...

Bibliographic Details
Main Authors: Yaa Acquaah, Kaushik Roy
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Discover Applied Sciences
Online Access: https://doi.org/10.1007/s42452-025-07292-8
_version_ 1849334425723076608
author Yaa Acquaah
Kaushik Roy
collection DOAJ
description Abstract Cyber-physical systems (CPS) represent the seamless integration of physical and digital systems connected through a network of sensors, actuators, and controllers. Research in the field of CPS is crucial given its application in healthcare, smart cities, transportation, energy management, and autonomous vehicles. The development and validation of models, algorithms, and control systems for CPS relies heavily on the availability of quality CPS datasets. However, conducting data collection experiments on complex CPS is constrained by considerable cost, a lack of expertise, and the time involved. Consequently, the availability of genuine CPS datasets is limited. In response to this challenge, this paper leverages DoppelGANger, a Generative Adversarial Network (GAN)-based model, to generate synthetic CPS data from four real CPS datasets across various domains: Water Distribution Testbed (WDT), Hardware in the Loop Industrial Control System (HAI), Gas pipeline, and Power system datasets. The synthetic datasets are then compared to the real-world data through in-depth statistical analyses, visualization methods, and anomaly detection approaches. The difference in Silhouette scores between real and synthetic data is under 0.53 for all CPS datasets, with the smallest difference of 0.071 in the HAI dataset and the largest difference of 0.525 in the Power system dataset. The Power system dataset exhibited the smallest difference in validation Mean Absolute Error (MAE), with values of 0.0184 for the real data and 0.0196 for the synthetic data. However, the correlation analysis revealed differences in feature relationships between real and synthetic datasets. Furthermore, the t-SNE visualizations for WDT demonstrated improved structural alignment between synthetic and real data as training epochs increased. Nevertheless, the alignment remained inconsistent for the HAI, Gas pipeline, and Power system datasets as the number of epochs increased. These observations highlight the need for more advanced synthetic data generation models within the realm of CPS to improve the fidelity and reliability of synthetic datasets.
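The Silhouette-score comparison reported in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy 2-D points, the fixed cluster labels, the Euclidean distance, and the helper name `silhouette_score` are hypothetical, and the record does not state which clustering method or library the authors used.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is in the sum, not the count)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(point, q) for q in other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)


# Hypothetical toy data: the "synthetic" clusters sit closer together
# than the "real" ones, so their silhouette score is lower.
labels = [0, 0, 1, 1]
real_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
synthetic_points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]

real_score = silhouette_score(real_points, labels)
synthetic_score = silhouette_score(synthetic_points, labels)
gap = abs(real_score - synthetic_score)
print(f"real={real_score:.3f} synthetic={synthetic_score:.3f} gap={gap:.3f}")
```

A small gap suggests that the synthetic data reproduces the cluster separation of the real data. It says nothing about per-feature relationships, which is consistent with the abstract reporting correlation differences alongside the Silhouette results.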
format Article
id doaj-art-cf6ee4bc9ae846309227bd78f7eca715
institution Kabale University
issn 3004-9261
language English
publishDate 2025-07-01
publisher Springer
record_format Article
series Discover Applied Sciences
spelling doaj-art-cf6ee4bc9ae846309227bd78f7eca715
2025-08-20T03:45:34Z
eng
Springer
Discover Applied Sciences
3004-9261
2025-07-01
77130
10.1007/s42452-025-07292-8
Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation
Yaa Acquaah, Department of Computer Science, North Carolina Agricultural & Technical State University
Kaushik Roy, Department of Computer Science, North Carolina Agricultural & Technical State University
https://doi.org/10.1007/s42452-025-07292-8
title Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation
url https://doi.org/10.1007/s42452-025-07292-8