Explainable extreme gradient boosting as a machine learning tool for discrimination of the geographical origin of chili peppers using laser ablation-inductively coupled plasma mass spectrometry, X-ray fluorescence, and near-infrared spectroscopy

Bibliographic Details
Main Authors: Seongsoo Jeong, Yong-kyoung Kim, Suel Hye Hur, Hyojoo Bang, HoJin Kim, Hoeil Chung
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Journal of Agriculture and Food Research
Online Access: http://www.sciencedirect.com/science/article/pii/S2666154324004836
Description
Summary: The spectroscopic discrimination of chili pepper samples according to geographical origin was performed using analytical techniques coupled with machine learning. First, laser ablation-inductively coupled plasma mass spectrometry (LA-ICP-MS), X-ray fluorescence (XRF), and near-infrared (NIR) spectroscopy were chosen for simple and rapid sample measurements. Second, to secure discrimination accuracy, eXtreme Gradient Boosting (XGBoost), a tree-based ensemble technique, was adopted as a potential classifier. In addition, for explainable machine learning modeling, SHapley Additive exPlanations (SHAP) values of the employed variables were calculated to assess how they contribute to the discrimination. The use of XGBoost improved discrimination accuracies in all three measurements compared to k-nearest neighbor (k-NN), support vector machine (SVM), and partial least squares-discriminant analysis (PLS-DA). The accuracy was 96.2 % using the LA-ICP-MS data. When the XRF and NIR data were combined, the accuracy improved to 97.5 %. The accuracy improvement was attributed to the combination of complementary atomic and molecular spectroscopic signatures of the samples.
ISSN: 2666-1543
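
Illustrative note: the summary mentions that SHAP values were used to assess how each variable contributes to the discrimination. The sketch below is not the authors' code and does not use their data. It only illustrates the underlying idea by computing exact Shapley values for a small, hypothetical scoring function (`toy_model`) with three made-up features, where features outside a coalition are replaced by an assumed all-zero baseline. For tree ensembles such as XGBoost, the SHAP library uses a faster tree-specific algorithm; the record does not say how the values were computed in the article.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`.

    Features outside a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features, so this
    is only practical for small toy examples.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical score: one additive feature plus an interaction
    # between two others (purely illustrative, not from the article).
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]          # made-up sample
baseline = [0.0, 0.0, 0.0]   # assumed reference point
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split equally between the two features involved, which is the additivity property that makes such values convenient for explaining a classifier's output.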