Text-Conditioned Diffusion-Based Synthetic Data Generation for Turbine Engine Sensor Analysis and RUL Estimation
This paper introduces a novel framework for generating synthetic time-series data from turbine engine sensor readings using a text-conditioned diffusion model. The approach begins with dataset preprocessing, including correlation analysis, feature selection, and normalization. Principal Component An...
| Main Authors: | Luis Pablo Mora-de-León, David Solís-Martín, Juan Galán-Páez, Joaquín Borrego-Díaz |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Machines |
| Subjects: | predictive maintenance; prognostics and health management (PHM); remaining useful life (RUL) |
| Online Access: | https://www.mdpi.com/2075-1702/13/5/374 |
| _version_ | 1849327071747112960 |
|---|---|
| author | Luis Pablo Mora-de-León David Solís-Martín Juan Galán-Páez Joaquín Borrego-Díaz |
| author_sort | Luis Pablo Mora-de-León |
| collection | DOAJ |
| description | This paper introduces a novel framework for generating synthetic time-series data from turbine engine sensor readings using a text-conditioned diffusion model. The approach begins with dataset preprocessing, including correlation analysis, feature selection, and normalization. Principal Component Analysis (PCA) transforms the normalized signals into three components, mapped to the RGB channels of an image. These components, combined with engine identifiers and cycle information, form compact 19 × 19 × 3 pixel images, later scaled to 512 × 512 × 3 pixels. A variational autoencoder (VAE)-based diffusion model, fine-tuned on these images, leverages text prompts describing engine characteristics to generate high-quality synthetic samples. A reverse transformation pipeline reconstructs synthetic images back into time-series signals, preserving the original engine-specific attributes while removing padding artifacts. The quality of the synthetic data is assessed by training Remaining Useful Life (RUL) estimation models and comparing performance across original, synthetic, and combined datasets. Results demonstrate that synthetic data can be beneficial for model training, particularly in the early epochs when working with limited datasets. Compared to existing approaches, which rely on generative adversarial networks (GANs) or deterministic transformations, the proposed framework offers enhanced data fidelity and adaptability. This study highlights the potential of text-conditioned diffusion models for augmenting time-series datasets in industrial Prognostics and Health Management (PHM) applications. |
| format | Article |
| id | doaj-art-4903c8f08e8649fe967dd01f31daf56f |
| institution | Kabale University |
| issn | 2075-1702 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Machines |
| spelling | doaj-art-4903c8f08e8649fe967dd01f31daf56f; 2025-08-20T03:47:58Z; eng; MDPI AG; Machines; ISSN 2075-1702; 2025-04-01; vol. 13, no. 5, art. 374; DOI 10.3390/machines13050374; Luis Pablo Mora-de-León, David Solís-Martín, Juan Galán-Páez, Joaquín Borrego-Díaz (all: Department of Computer Science and Artificial Intelligence, Universidad de Sevilla, 41012 Sevilla, Spain); https://www.mdpi.com/2075-1702/13/5/374; predictive maintenance; prognostics and health management (PHM); remaining useful life (RUL) |
| title | Text-Conditioned Diffusion-Based Synthetic Data Generation for Turbine Engine Sensor Analysis and RUL Estimation |
| topic | predictive maintenance prognostics and health management (PHM) remaining useful life (RUL) |
| url | https://www.mdpi.com/2075-1702/13/5/374 |
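
The encoding step summarized in the description (normalized signals reduced by PCA to three components, mapped to RGB, packed into a 19 × 19 × 3 image, scaled to 512 × 512 × 3, and later reversed) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the zero padding, the min-max scaling to 8-bit pixels, and the nearest-neighbour resizing are all assumptions made for illustration, and the sketch leaves out the engine identifier and cycle pixels as well as the diffusion model itself.

```python
import numpy as np


def signals_to_image(signals, side=19):
    """Pack a (cycles, sensors) array into a side x side x 3 uint8 image.

    Hypothetical helper: needs at least 3 cycles and 3 sensors.
    """
    n = signals.shape[0]
    if n > side * side:
        raise ValueError("too many cycles for one image")
    mean = signals.mean(axis=0)
    centered = signals - mean
    # PCA via SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:3]                      # (3, sensors)
    comps = centered @ basis.T          # (cycles, 3), one column per channel
    lo, hi = comps.min(axis=0), comps.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Unused pixels stay zero (padding assumption).
    pixels = np.zeros((side * side, 3), dtype=np.uint8)
    pixels[:n] = np.round(255 * (comps - lo) / span).astype(np.uint8)
    params = {"mean": mean, "basis": basis, "lo": lo, "span": span, "n": n}
    return pixels.reshape(side, side, 3), params


def upscale_nearest(image, size=512):
    """Nearest-neighbour upscaling of a square image to size x size."""
    idx = (np.arange(size) * image.shape[0]) // size
    return image[idx][:, idx]


def downscale_nearest(image, side=19):
    """Invert upscale_nearest by sampling the centre of each source block."""
    idx = ((np.arange(side) + 0.5) * image.shape[0] / side).astype(int)
    return image[idx][:, idx]


def image_to_signals(image, params):
    """Drop padding pixels and undo the scaling and the PCA projection."""
    pixels = image.reshape(-1, 3)[: params["n"]].astype(float)
    comps = pixels / 255 * params["span"] + params["lo"]
    return comps @ params["basis"] + params["mean"]
```

In this sketch the reconstruction is only as good as the three retained components and the 8-bit quantization allow; signals with more than three significant directions of variance lose information in the projection.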