Text-Conditioned Diffusion-Based Synthetic Data Generation for Turbine Engine Sensor Analysis and RUL Estimation

This paper introduces a novel framework for generating synthetic time-series data from turbine engine sensor readings using a text-conditioned diffusion model. The approach begins with dataset preprocessing, including correlation analysis, feature selection, and normalization. Principal Component An...

Bibliographic Details
Main Authors: Luis Pablo Mora-de-León, David Solís-Martín, Juan Galán-Páez, Joaquín Borrego-Díaz
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Machines
Subjects: predictive maintenance; prognostics and health management (PHM); remaining useful life (RUL)
Online Access: https://www.mdpi.com/2075-1702/13/5/374
_version_ 1849327071747112960
author Luis Pablo Mora-de-León
David Solís-Martín
Juan Galán-Páez
Joaquín Borrego-Díaz
collection DOAJ
description This paper introduces a novel framework for generating synthetic time-series data from turbine engine sensor readings using a text-conditioned diffusion model. The approach begins with dataset preprocessing, including correlation analysis, feature selection, and normalization. Principal Component Analysis (PCA) transforms the normalized signals into three components, mapped to the RGB channels of an image. These components, combined with engine identifiers and cycle information, form compact 19 × 19 × 3 pixel images, later scaled to 512 × 512 × 3 pixels. A variational autoencoder (VAE)-based diffusion model, fine-tuned on these images, leverages text prompts describing engine characteristics to generate high-quality synthetic samples. A reverse transformation pipeline reconstructs synthetic images back into time-series signals, preserving the original engine-specific attributes while removing padding artifacts. The quality of the synthetic data is assessed by training Remaining Useful Life (RUL) estimation models and comparing performance across original, synthetic, and combined datasets. Results demonstrate that synthetic data can be beneficial for model training, particularly in the early epochs when working with limited datasets. Compared to existing approaches, which rely on generative adversarial networks (GANs) or deterministic transformations, the proposed framework offers enhanced data fidelity and adaptability. This study highlights the potential of text-conditioned diffusion models for augmenting time-series datasets in industrial Prognostics and Health Management (PHM) applications.
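The encoding and reverse transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 14-sensor input, the row-major pixel layout, zero padding, per-channel min-max scaling, and nearest-neighbour resizing are all assumptions made for illustration, and the engine-identifier and cycle pixels mentioned in the abstract are omitted. PCA is computed with NumPy's SVD rather than a dedicated library.

```python
import numpy as np

SIDE = 19    # compact image side length, from the abstract
SIZE = 512   # upscaled image side length, from the abstract


def fit_pca(x, n_components=3):
    """Return the feature mean and the top principal axes of x (cycles x sensors)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode_image(x, mean, components, side=SIDE):
    """Project signals onto three components and pack them into a side x side RGB image."""
    scores = (x - mean) @ components.T                 # (cycles, 3)
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)             # avoid division by zero
    scaled = (scores - lo) / span                      # each channel in [0, 1]
    n = min(len(scaled), side * side)
    pixels = np.zeros((side * side, 3))                # zero padding (assumed layout)
    pixels[:n] = scaled[:n]
    return pixels.reshape(side, side, 3), (lo, span, n)


def upscale(image, size=SIZE):
    """Nearest-neighbour upscaling to size x size (assumed resizing method)."""
    idx = np.arange(size) * image.shape[0] // size
    return image[idx][:, idx]


def downscale(big, side=SIDE):
    """Invert upscale() by sampling the first pixel of every replicated block."""
    idx = np.arange(big.shape[0]) * side // big.shape[0]
    pos = np.searchsorted(idx, np.arange(side))
    return big[pos][:, pos]


def decode_image(image, meta, mean, components):
    """Drop the padding, undo the scaling, and map the components back to sensor space."""
    lo, span, n = meta
    scores = image.reshape(-1, 3)[:n] * span + lo
    return scores @ components + mean


# Hypothetical stand-in data: 200 cycles of 14 normalized sensor channels.
rng = np.random.default_rng(0)
signals = rng.random((200, 14))

mean, components = fit_pca(signals)
small, meta = encode_image(signals, mean, components)
large = upscale(small)
restored = decode_image(downscale(large), meta, mean, components)
```

Because only three components are kept, `restored` is the rank-3 PCA approximation of `signals`, not an exact copy; in the described framework the image fed to `decode_image` would be a diffusion-model output instead of a round-tripped original.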
format Article
id doaj-art-4903c8f08e8649fe967dd01f31daf56f
institution Kabale University
issn 2075-1702
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Machines
spelling doaj-art-4903c8f08e8649fe967dd01f31daf56f; 2025-08-20T03:47:58Z; eng; MDPI AG; Machines; ISSN 2075-1702; 2025-04-01; volume 13, issue 5, article 374; DOI 10.3390/machines13050374
affiliation Department of Computer Science and Artificial Intelligence, Universidad de Sevilla, 41012 Sevilla, Spain (all four authors)
title Text-Conditioned Diffusion-Based Synthetic Data Generation for Turbine Engine Sensor Analysis and RUL Estimation
topic predictive maintenance
prognostics and health management (PHM)
remaining useful life (RUL)
url https://www.mdpi.com/2075-1702/13/5/374