Generative Artificial Intelligence for Synthetic Spectral Data Augmentation in Sensor-Based Plastic Recycling

The reliance on deep learning models for sensor-based material classification amplifies the demand for labeled training data. However, acquiring large-scale, annotated spectral data for applications such as near-infrared (NIR) reflectance spectroscopy in plastic sorting remains a significant challen...

Bibliographic Details
Main Authors: Roman-David Kulko, Andreas Hanus, Benedikt Elser
Format: Article
Language: English
Published: MDPI AG 2025-07-01
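Series: Sensors
Subjects: generative artificial intelligence; synthetic data generation; spectral data augmentation; near-infrared spectroscopy; deep learning for recycling; plastic waste sorting
Online Access: https://www.mdpi.com/1424-8220/25/13/4114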
author Roman-David Kulko
Andreas Hanus
Benedikt Elser
collection DOAJ
description The reliance on deep learning models for sensor-based material classification amplifies the demand for labeled training data. However, acquiring large-scale, annotated spectral data for applications such as near-infrared (NIR) reflectance spectroscopy in plastic sorting remains a significant challenge due to high acquisition costs and environmental variability. This paper investigates the potential of large language models (LLMs) in synthetic spectral data generation. Specifically, it examines whether LLMs have acquired sufficient implicit knowledge to assist in generating spectral data and to introduce meaningful variations that enhance model performance when used for data augmentation. Classification accuracy is reported exclusively as a proxy for the structural plausibility of the augmented spectra; maximizing augmentation performance itself is not the study’s goal. From as little as one empirical mean spectrum per class, LLM-guided simulation produced data that enabled up to 86% accuracy, evidence that the generated variation preserves class-distinguishing information. While the approach performs best for spectrally distinct polymers, overlapping classes remain challenging. Additionally, the transfer of optimized augmentation parameters to unseen classes indicates potential for generalization across material types. While plastic sorting serves as a case study, the methodology may be applicable to other domains such as agriculture or food quality assessment, where spectral data are limited. The study outlines a novel path toward scalable, AI-supported data augmentation in spectroscopy-based classification systems.
format Article
id doaj-art-7a0c9dd2533a4a7cadcb91fb8c193060
institution Kabale University
issn 1424-8220
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-7a0c9dd2533a4a7cadcb91fb8c193060
2025-08-20T03:29:02Z
eng
MDPI AG
Sensors, ISSN 1424-8220, 2025-07-01, vol. 25, no. 13, article 4114
DOI: 10.3390/s25134114
Generative Artificial Intelligence for Synthetic Spectral Data Augmentation in Sensor-Based Plastic Recycling
Roman-David Kulko (Technologie Campus Grafenau, Technische Hochschule Deggendorf, 94481 Grafenau, Germany)
Andreas Hanus (Sesotec GmbH, Regener Straße 130, 94513 Schönberg, Germany)
Benedikt Elser (Technologie Campus Grafenau, Technische Hochschule Deggendorf, 94481 Grafenau, Germany)
https://www.mdpi.com/1424-8220/25/13/4114
generative artificial intelligence
synthetic data generation
spectral data augmentation
near-infrared spectroscopy
deep learning for recycling
plastic waste sorting
title Generative Artificial Intelligence for Synthetic Spectral Data Augmentation in Sensor-Based Plastic Recycling
topic generative artificial intelligence
synthetic data generation
spectral data augmentation
near-infrared spectroscopy
deep learning for recycling
plastic waste sorting
url https://www.mdpi.com/1424-8220/25/13/4114