Using preprocessed datasets to construct and interpret multiclass identification models

Bibliographic Details
Main Authors: Cong Wang, Yufeng Fu, Ran Wan, Le Zhao, Hongbo Wang, Junwei Guo, Qiang Liu, Shan Li, Shengtao Ma, Zhicai Wang, Wei Huang, Huimin Liu, Song Yang, Cong Nie
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-08-01
Series: Frontiers in Plant Science
Online Access: https://www.frontiersin.org/articles/10.3389/fpls.2025.1597673/full
Description
Summary:
Introduction: Image and near-infrared (NIR) spectroscopic data are widely used for constructing analytical models in precision agriculture. While model interpretation can provide valuable insights for quality control and improvement, the inherent ambiguity of individual image pixels or spectral data points often hinders practical interpretability when raw data are used directly. Furthermore, imbalanced datasets can lead to model overfitting and, consequently, poor robustness. Developing alternative approaches for constructing interpretable and robust models from these data types is therefore crucial.
Methods: This study proposes using preprocessed data (specifically, morphological features extracted from images and chemical component concentrations predicted from NIR spectra) to build multiclass identification models. Combined-kernel SVM models were proposed to identify rice variety and the cultivation region of tobacco. The kernel parameters and the proportions of the different kernel function types were determined by particle swarm optimization (PSO), which makes the approach self-adaptive. Feature importance and contribution analyses were conducted using Shapley additive explanations (SHAP).
Results: The resulting models demonstrated high robustness and accuracy, achieving classification success rates of 97.9% and 97.4% via n-fold cross validation on the rice and tobacco datasets, respectively, and 97.7% on an independent test set (tobacco dataset 2). The SHAP analysis identified key variables and elucidated their specific contributions to the model predictions.
Discussion: This study expands the applicability of image and NIR spectroscopic data, offering researchers an effective methodology for investigating factors crucial to the quality control and improvement of agricultural products.
ISSN: 1664-462X
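
Illustrative sketch: The summary describes a combined-kernel SVM whose kernel parameters and kernel proportions are tuned by PSO. The Python sketch below shows one way such a setup could look; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the choice of an RBF and a polynomial kernel, the search ranges, the PSO constants, the hypothetical helper names (combined_kernel, cv_accuracy, pso), and the use of scikit-learn's built-in wine dataset as a stand-in for the rice and tobacco data, which are not part of this record.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def combined_kernel(weight, gamma, degree=2):
    """Return a kernel callable: weight * RBF + (1 - weight) * polynomial.

    The two kernel types are an assumption; the record does not name them.
    """
    def kernel(A, B):
        return (weight * rbf_kernel(A, B, gamma=gamma)
                + (1.0 - weight) * polynomial_kernel(A, B, degree=degree))
    return kernel


def cv_accuracy(params, X, y, folds=5):
    """Mean cross-validated accuracy for params = (weight, log10(gamma))."""
    weight, log_gamma = params
    model = make_pipeline(
        StandardScaler(),
        SVC(kernel=combined_kernel(weight, 10.0 ** log_gamma)),
    )
    return cross_val_score(model, X, y, cv=folds).mean()


def pso(objective, lower, upper, n_particles=8, n_iter=10, seed=0):
    """Minimal particle swarm optimizer that maximizes `objective` in a box."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()                       # per-particle best positions
    best_val = np.array([objective(p) for p in pos])
    g = best_val.argmax()
    gbest_pos, gbest_val = best_pos[g].copy(), best_val[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia plus pulls toward the personal best and the swarm best.
        vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the box
        vals = np.array([objective(p) for p in pos])
        improved = vals > best_val
        best_pos[improved], best_val[improved] = pos[improved], vals[improved]
        g = best_val.argmax()
        if best_val[g] > gbest_val:
            gbest_pos, gbest_val = best_pos[g].copy(), best_val[g]
    return gbest_pos, gbest_val


# Stand-in data: tabular chemical measurements with three classes.
X, y = load_wine(return_X_y=True)
best_params, best_score = pso(
    lambda p: cv_accuracy(p, X, y),
    lower=[0.0, -3.0],   # weight in [0, 1], log10(gamma) in [-3, 1]
    upper=[1.0, 1.0],
)
print("RBF weight:", best_params[0])
print("gamma:", 10.0 ** best_params[1])
print("cross-validated accuracy:", best_score)
```

A convex combination of two valid kernels is itself a valid kernel, which is why a single weight in [0, 1] can stand for the kernel proportions here. The scaler sits inside the pipeline so that each cross-validation fold is standardized using its own training portion only.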