Infrared spectrum analysis of organic molecules with neural networks using standard reference data sets in combination with real-world data

Bibliographic Details
Main Authors: Dev Punjabi, Yu-Chieh Huang, Laura Holzhauer, Pierre Tremouilhac, Pascal Friederich, Nicole Jung, Stefan Bräse
Format: Article
Language: English
Published: BMC 2025-02-01
Series: Journal of Cheminformatics
Online Access: https://doi.org/10.1186/s13321-025-00960-2
Collection: DOAJ
Abstract: In this study, we propose a neural network-based approach to analyze IR spectra and detect the presence of functional groups. Our neural network architecture is based on the concept of learning split representations. We demonstrate that our method achieves favorable validation performance on the NIST dataset. Furthermore, by incorporating additional data from the open-access research data repository Chemotion, we show that our model improves the classification performance for nitriles and amides. Scientific contribution: Our method uses only IR data as input to the neural network, so that its performance, unlike that of other well-performing models, does not depend on additional types of analytical measurement data. Furthermore, our proposed method leverages a deep learning model that outperforms previous approaches, achieving F1 scores above 0.7 for the identification of 17 functional groups. By incorporating real-world data from various laboratories, we demonstrate how open-access, specialized research data repositories can serve as valuable, as yet unexplored benchmark datasets for future machine learning research.
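The abstract reports detection quality as F1 scores per functional group. As a minimal sketch (not the authors' model or evaluation code), assuming the task is framed as multi-label classification in which each spectrum receives an independent presence probability for every functional group, and assuming a hypothetical decision threshold of 0.5, the per-group score F1 = 2TP / (2TP + FP + FN) can be computed as below. The function names `binarize` and `per_group_f1`, the threshold, and the toy data are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration only; not taken from the paper.


def binarize(probabilities: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-group presence probabilities into 0/1 flags (assumed threshold)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


def per_group_f1(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    groups: Sequence[str],
) -> Dict[str, float]:
    """Compute F1 = 2TP / (2TP + FP + FN) separately for each functional group."""
    scores: Dict[str, float] = {}
    for j, name in enumerate(groups):
        # Count outcomes for column j (one functional group) across all spectra.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Chosen convention: a group with no positives in labels or predictions scores 0.0.
        scores[name] = 2 * tp / denom if denom else 0.0
    return scores


# Toy example: four spectra, two groups; all values are invented for illustration.
groups = ["nitrile", "amide"]
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
probs = [[0.9, 0.2], [0.1, 0.4], [0.8, 0.7], [0.6, 0.1]]
scores = per_group_f1(y_true, binarize(probs), groups)
above = [g for g, f in scores.items() if f > 0.7]
print(scores, above)
```

Scoring each group separately, rather than pooling all labels into one accuracy value, keeps rarely occurring groups from being masked by the many negative labels; how the paper aggregates or thresholds its scores is not stated in this record.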
ID: doaj-art-285a00e3c4b140fa878fbd5773ff7fbd
Institution: OA Journals
ISSN: 1758-2946
Affiliations:
Institute of Biological and Chemical Systems, Karlsruhe Institute of Technology (KIT): Dev Punjabi, Yu-Chieh Huang, Laura Holzhauer, Pierre Tremouilhac, Nicole Jung, Stefan Bräse
Institute of Theoretical Informatics, Karlsruhe Institute of Technology (KIT): Pascal Friederich
Topics: Infrared spectra; Machine learning; Data analysis; Open databases