Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation
Abstract Cyber-physical systems (CPS) represent the seamless integration of physical and digital systems connected through a network of sensors, actuators, and controllers. Research in the field of CPS is crucial given its applications in healthcare, smart cities, transportation, energy management and...
| Main Authors: | Yaa Acquaah, Kaushik Roy |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Discover Applied Sciences |
| Online Access: | https://doi.org/10.1007/s42452-025-07292-8 |
| _version_ | 1849334425723076608 |
|---|---|
| author | Yaa Acquaah Kaushik Roy |
| author_facet | Yaa Acquaah Kaushik Roy |
| author_sort | Yaa Acquaah |
| collection | DOAJ |
| description | Abstract Cyber-physical systems (CPS) represent the seamless integration of physical and digital systems connected through a network of sensors, actuators, and controllers. Research in the field of CPS is crucial given its applications in healthcare, smart cities, transportation, energy management, and autonomous vehicles. The development and validation of models, algorithms, and control systems for CPS rely heavily on the availability of quality CPS datasets. However, data collection experiments on complex CPS are constrained by considerable cost, lack of expertise, and the time involved. Consequently, the availability of genuine CPS datasets is limited. In response to this challenge, this paper leverages DoppelGANger, a Generative Adversarial Network (GAN) based model, to generate synthetic CPS data using four different real CPS datasets across various domains: Water Distribution Testbed (WDT), Hardware in the Loop Industrial Control System (HAI), Gas pipeline, and Power system datasets. Furthermore, the synthetic datasets are compared to the real-world data through in-depth statistical analyses, visualization methods, and anomaly detection approaches. The difference in Silhouette scores across all CPS datasets is under 0.53, with the smallest difference of 0.071 in the HAI dataset and the largest difference of 0.525 in the Power system dataset. The Power system dataset exhibited the smallest difference in validation Mean Absolute Error (MAE), with values of 0.0184 for the real data and 0.0196 for the synthetic data. However, the correlation analysis revealed differences in feature relationships between real and synthetic datasets. Furthermore, the t-SNE visualizations for WDT demonstrated improved structural alignment between synthetic and real data as training epochs increased. Nevertheless, the alignment remained inconsistent for the HAI, Gas pipeline, and Power system datasets with an increasing number of epochs. These observations highlight the need for more advanced synthetic data generation models within the realm of CPS to improve the fidelity and reliability of synthetic datasets. |
| format | Article |
| id | doaj-art-cf6ee4bc9ae846309227bd78f7eca715 |
| institution | Kabale University |
| issn | 3004-9261 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Discover Applied Sciences |
| spelling | doaj-art-cf6ee4bc9ae846309227bd78f7eca715; 2025-08-20T03:45:34Z; eng; Springer; Discover Applied Sciences; 3004-9261; 2025-07-01; 77130; 10.1007/s42452-025-07292-8; Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation; Yaa Acquaah [0]; Kaushik Roy [1]; [0] Department of Computer Science, North Carolina Agricultural & Technical State University; [1] Department of Computer Science, North Carolina Agricultural & Technical State University; https://doi.org/10.1007/s42452-025-07292-8 |
| spellingShingle | Yaa Acquaah Kaushik Roy Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation Discover Applied Sciences |
| title | Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation |
| title_full | Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation |
| title_fullStr | Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation |
| title_full_unstemmed | Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation |
| title_short | Realistic synthetic dataset generation for cyber-physical systems: a performance evaluation |
| title_sort | realistic synthetic dataset generation for cyber physical systems a performance evaluation |
| url | https://doi.org/10.1007/s42452-025-07292-8 |
| work_keys_str_mv | AT yaaacquaah realisticsyntheticdatasetgenerationforcyberphysicalsystemsaperformanceevaluation AT kaushikroy realisticsyntheticdatasetgenerationforcyberphysicalsystemsaperformanceevaluation |