Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability

Bibliographic Details
Main Authors: Dimitris Kalatzis, Alkmini Nega, Yiannis Kiouvrekis
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Eng
Subjects:
Online Access: https://www.mdpi.com/2673-4117/6/7/145
Description
Summary: Raman spectroscopy has become an indispensable analytical technique in pharmaceutical research, offering non-invasive, rapid, and chemically specific insights into pharmaceutical compounds. In this study, we present a comprehensive benchmark of machine learning models for classifying 32 pharmaceutical compounds based on their Raman spectral signatures. We evaluated a diverse array of algorithms on a publicly available dataset, including Support Vector Machines (SVMs), Random Forests, k-Nearest Neighbors (k-NN), Gradient Boosting (XGBoost, LightGBM), and 1D Convolutional Neural Networks (CNNs). The results demonstrate outstanding classification performance across models, with the linear SVM achieving the highest accuracy of 99.88%, followed closely by the CNN (99.26%). Ensemble methods such as Random Forest and XGBoost also yielded high accuracies above 98.3%. In addition to strong predictive performance, SHAP (SHapley Additive exPlanations) analysis was employed to interpret model decisions. CNN models, in particular, revealed well-localized and chemically meaningful spectral regions critical to classification. This combination of high accuracy and interpretability highlights the promise of explainable AI in pharmaceutical analysis and quality control, offering robust, transparent, and scalable solutions for real-world applications.
ISSN: 2673-4117
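
Illustrative sketch: the summary describes classifying compounds from their Raman spectral signatures with, among other models, k-NN. The snippet below is a minimal, hedged illustration of that general idea and not the authors' pipeline. It uses only the Python standard library, and everything in it is an assumption made for the example: the synthetic spectra built from Gaussian peaks, the three hypothetical compounds "A", "B" and "C" with made-up peak positions, and the helper names `make_spectrum`, `knn_predict` and `run_demo`. It does not use the paper's dataset and does not reproduce the reported accuracies.

```python
import math
import random


def make_spectrum(peak_centers, n_points=200, noise=0.05, rng=random):
    """Synthetic 'spectrum': a sum of Gaussian peaks plus Gaussian noise."""
    width = 4.0  # assumed peak width, in index units
    return [
        sum(math.exp(-((i - c) ** 2) / (2 * width ** 2)) for c in peak_centers)
        + rng.gauss(0.0, noise)
        for i in range(n_points)
    ]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def run_demo(seed=0):
    """Train/test split on synthetic data; returns the test accuracy."""
    rng = random.Random(seed)
    # Hypothetical compounds, each defined by made-up peak positions.
    compounds = {"A": [40, 120], "B": [60, 150], "C": [90, 170]}
    train_X, train_y, test_X, test_y = [], [], [], []
    for label, peaks in compounds.items():
        for _ in range(20):
            train_X.append(make_spectrum(peaks, rng=rng))
            train_y.append(label)
        for _ in range(10):
            test_X.append(make_spectrum(peaks, rng=rng))
            test_y.append(label)
    preds = [knn_predict(train_X, train_y, x) for x in test_X]
    correct = sum(p == t for p, t in zip(preds, test_y))
    return correct / len(test_y)


accuracy = run_demo()
print(f"k-NN accuracy on synthetic spectra: {accuracy:.2%}")
```

Real Raman data would typically need preprocessing (for example baseline correction and normalization) before any such classifier is applied; those steps are omitted here because the record does not describe them.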