Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability

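Raman spectroscopy has become an indispensable analytical technique in pharmaceutical research, offering non-invasive, rapid, and chemically specific insights into pharmaceutical compounds. In this study, we present a comprehensive benchmark of machine learning models for classifying 32 pharmaceutical compounds based on their Raman spectral signatures. A diverse array of algorithms, including Support Vector Machines (SVMs), Random Forests, k-Nearest Neighbors (k-NN), Gradient Boosting (XGBoost, LightGBM), and 1D Convolutional Neural Networks (CNNs), was evaluated on a publicly available dataset. The results demonstrate outstanding classification performance across models, with the linear SVM achieving the highest accuracy (99.88%), followed closely by the CNN (99.26%). Ensemble methods such as Random Forest and XGBoost also yielded accuracies above 98.3%. In addition to strong predictive performance, SHAP (SHapley Additive exPlanations) analysis was employed to interpret model decisions. CNN models, in particular, revealed well-localized and chemically meaningful spectral regions critical to classification. This combination of high accuracy and interpretability highlights the promise of explainable AI in pharmaceutical analysis and quality control, offering robust, transparent, and scalable solutions for real-world applications.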

Bibliographic Details
Main Authors: Dimitris Kalatzis, Alkmini Nega, Yiannis Kiouvrekis
Format: Article
Language: English
Published: MDPI AG 2025-07-01
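Series: Eng
Subjects: Raman spectroscopy; explainable artificial intelligence; pharmaceutical analysis; Active Pharmaceutical Ingredients (APIs); machine learning; SHAP values
Online Access: https://www.mdpi.com/2673-4117/6/7/145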
author Dimitris Kalatzis
Alkmini Nega
Yiannis Kiouvrekis
collection DOAJ
description Raman spectroscopy has become an indispensable analytical technique in pharmaceutical research, offering non-invasive, rapid, and chemically specific insights into pharmaceutical compounds. In this study, we present a comprehensive benchmark of machine learning models for classifying 32 pharmaceutical compounds based on their Raman spectral signatures. A diverse array of algorithms, including Support Vector Machines (SVMs), Random Forests, k-Nearest Neighbors (k-NN), Gradient Boosting (XGBoost, LightGBM), and 1D Convolutional Neural Networks (CNNs), was evaluated on a publicly available dataset. The results demonstrate outstanding classification performance across models, with the linear SVM achieving the highest accuracy (99.88%), followed closely by the CNN (99.26%). Ensemble methods such as Random Forest and XGBoost also yielded accuracies above 98.3%. In addition to strong predictive performance, SHAP (SHapley Additive exPlanations) analysis was employed to interpret model decisions. CNN models, in particular, revealed well-localized and chemically meaningful spectral regions critical to classification. This combination of high accuracy and interpretability highlights the promise of explainable AI in pharmaceutical analysis and quality control, offering robust, transparent, and scalable solutions for real-world applications.
format Article
id doaj-art-17145c4f7ed346ceaff1d2b8ea3b9098
institution Kabale University
issn 2673-4117
language English
publishDate 2025-07-01
publisher MDPI AG
record_format Article
series Eng
spelling doaj-art-17145c4f7ed346ceaff1d2b8ea3b9098 2025-08-20T03:35:37Z eng MDPI AG Eng 2673-4117 2025-07-01 6 7 145 10.3390/eng6070145
Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability
Dimitris Kalatzis [0]
Alkmini Nega [1]
Yiannis Kiouvrekis [2]
[0] Mathematics, Computer Science and Artificial Intelligence Laboratory, Faculty of Public and One Health, University of Thessaly, 43100 Karditsa, Greece
[1] National Hellenic Research Foundation, Institute of Chemical Biology, 48 Vassileos Constantinou Avenue, 11635 Athens, Greece
[2] Mathematics, Computer Science and Artificial Intelligence Laboratory, Faculty of Public and One Health, University of Thessaly, 43100 Karditsa, Greece
https://www.mdpi.com/2673-4117/6/7/145
Raman spectroscopy
explainable artificial intelligence
pharmaceutical analysis
Active Pharmaceutical Ingredients (APIs)
machine learning
SHAP values
title Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability
topic Raman spectroscopy
explainable artificial intelligence
pharmaceutical analysis
Active Pharmaceutical Ingredients (APIs)
machine learning
SHAP values
url https://www.mdpi.com/2673-4117/6/7/145