Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability
Raman spectroscopy has become an indispensable analytical technique in pharmaceutical research, offering non-invasive, rapid, and chemically specific insights into pharmaceutical compounds. In this study, we present a comprehensive benchmark of machine learning models for classifying 32 pharmaceutic...
| Main Authors: | Dimitris Kalatzis, Alkmini Nega, Yiannis Kiouvrekis |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Eng |
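| Subjects: | Raman spectroscopy; explainable artificial intelligence; pharmaceutical analysis; Active Pharmaceutical Ingredients (APIs); machine learning; SHAP values |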
| Online Access: | https://www.mdpi.com/2673-4117/6/7/145 |
| _version_ | 1849409088361857024 |
|---|---|
| author | Dimitris Kalatzis, Alkmini Nega, Yiannis Kiouvrekis |
| author_facet | Dimitris Kalatzis, Alkmini Nega, Yiannis Kiouvrekis |
| author_sort | Dimitris Kalatzis |
| collection | DOAJ |
| description | Raman spectroscopy has become an indispensable analytical technique in pharmaceutical research, offering non-invasive, rapid, and chemically specific insights into pharmaceutical compounds. In this study, we present a comprehensive benchmark of machine learning models for classifying 32 pharmaceutical compounds based on their Raman spectral signatures. A diverse array of algorithms—including Support Vector Machines (SVMs), Random Forests, k-Nearest Neighbors (k-NN), Gradient Boosting (XGBoost, LightGBM), and 1D Convolutional Neural Networks (CNNs)—were evaluated on a publicly available dataset. The results demonstrate outstanding classification performance across models, with linear SVM achieving the highest accuracy of 99.88%, followed closely by CNN (99.26%). Ensemble methods such as Random Forest and XGBoost also yielded high accuracies above 98.3%. In addition to strong predictive performance, SHAP (SHapley Additive exPlanations) analysis was employed to interpret model decisions. CNN models, in particular, revealed well-localized and chemically meaningful spectral regions critical to classification. This combination of high accuracy and interpretability highlights the promise of explainable AI in pharmaceutical analysis and quality control, offering robust, transparent, and scalable solutions for real-world applications. |
| format | Article |
| id | doaj-art-17145c4f7ed346ceaff1d2b8ea3b9098 |
| institution | Kabale University |
| issn | 2673-4117 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Eng |
| spelling | doaj-art-17145c4f7ed346ceaff1d2b8ea3b9098; 2025-08-20T03:35:37Z; eng; MDPI AG; Eng; 2673-4117; 2025-07-01; volume 6, issue 7, article 145; 10.3390/eng6070145; Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability; Dimitris Kalatzis (0), Alkmini Nega (1), Yiannis Kiouvrekis (2); (0) Mathematics, Computer Science and Artificial Intelligence Laboratory, Faculty of Public and One Health, University of Thessaly, 43100 Karditsa, Greece; (1) National Hellenic Research Foundation, Institute of Chemical Biology, 48 Vassileos Constantinou Avenue, 11635 Athens, Greece; (2) Mathematics, Computer Science and Artificial Intelligence Laboratory, Faculty of Public and One Health, University of Thessaly, 43100 Karditsa, Greece; https://www.mdpi.com/2673-4117/6/7/145; Raman spectroscopy; explainable artificial intelligence; pharmaceutical analysis; Active Pharmaceutical Ingredients (APIs); machine learning; SHAP values |
| spellingShingle | Dimitris Kalatzis; Alkmini Nega; Yiannis Kiouvrekis; Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability; Eng; Raman spectroscopy; explainable artificial intelligence; pharmaceutical analysis; Active Pharmaceutical Ingredients (APIs); machine learning; SHAP values |
| title | Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability |
| title_full | Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability |
| title_fullStr | Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability |
| title_full_unstemmed | Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability |
| title_short | Raman Spectra Classification of Pharmaceutical Compounds: A Benchmark of Machine Learning Models with SHAP-Based Explainability |
| title_sort | raman spectra classification of pharmaceutical compounds a benchmark of machine learning models with shap based explainability |
| topic | Raman spectroscopy; explainable artificial intelligence; pharmaceutical analysis; Active Pharmaceutical Ingredients (APIs); machine learning; SHAP values |
| url | https://www.mdpi.com/2673-4117/6/7/145 |
| work_keys_str_mv | AT dimitriskalatzis ramanspectraclassificationofpharmaceuticalcompoundsabenchmarkofmachinelearningmodelswithshapbasedexplainability AT alkmininega ramanspectraclassificationofpharmaceuticalcompoundsabenchmarkofmachinelearningmodelswithshapbasedexplainability AT yianniskiouvrekis ramanspectraclassificationofpharmaceuticalcompoundsabenchmarkofmachinelearningmodelswithshapbasedexplainability |