Impact of calibration set size for predicting soil fertility attributes using local pXRF spectral libraries

Bibliographic Details
Main Authors: José Vinícius Ribeiro, Tiago Rodrigues Tavares, José Francirlei de Oliveira, Graziela M.C. Barbosa, Fábio Luiz Melquiades
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Soil Advances
Online Access: http://www.sciencedirect.com/science/article/pii/S2950289624000319
Description
Summary: Employing portable Energy-Dispersive X-Ray Fluorescence (pXRF) sensors and Machine Learning (ML) with reduced calibration set sizes can save resources and time in measuring soil fertility attributes. The calibration of local models must include enough samples to represent the spectral variation of the area related to the properties of interest, avoiding both excess, which would be redundant, and shortage, which would cause a loss of performance. From this perspective, this study investigated the impact of different calibration set sizes on the performance of Partial Least Squares Regression (PLS) models using a pXRF sensor (0–15 kV) to predict exchangeable potassium (K⁺) and calcium (Ca²⁺), Cation Exchange Capacity (CEC), and Soil Organic Carbon (SOC). Two Brazilian tropical fields composed the pXRF spectral libraries. The Cambé and Toledo datasets comprised 386 and 205 soil samples, respectively. Using the Kennard-Stone sample selection algorithm, calibration set sizes ranged from 36 to 276 samples, in increments of 20, for the Cambé field, and from 20 to 140 samples, also in increments of 20, for the Toledo field. Results evaluated on validation sets indicate that calibration sample size can be optimized without compromising model accuracy. The optimal number of calibration samples, established by quantitatively comparing the models' RMSEP, was attribute- and field-specific. For the Cambé library, the optimal calibration set sizes were 116, 136, 156, and 116 for SOC, CEC, K⁺, and Ca²⁺, respectively. For the Toledo library, SOC, CEC, K⁺, and Ca²⁺ had optimal calibration set sizes of 60, 100, 80, and 80, respectively. All these models presented satisfactory RPIQ values (≥ 1.7). Therefore, to encompass all the evaluated attributes, 156 calibration samples are suggested as a reference for developing local libraries applying pXRF sensors in tropical soils with oxide-rich mineralogy.
ISSN: 2950-2896
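
The summary's workflow (Kennard-Stone selection of calibration subsets of increasing size, then comparison by RMSEP and RPIQ) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes Euclidean distance on whatever feature vectors are supplied, defines RPIQ as the interquartile range of the observed values divided by RMSEP, and uses the "inclusive" quartile convention of Python's `statistics` module; the function names are hypothetical. Fitting the PLS models themselves is left out.

```python
import math
import statistics


def kennard_stone(X, n_select):
    """Return indices of n_select rows of X chosen by Kennard-Stone.

    Assumption: Euclidean distance on the rows of X as given.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(X)")

    def dist(i, j):
        return math.dist(X[i], X[j])

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]

    # For each remaining sample, track the distance to its nearest selected sample.
    min_dist = {
        i: min(dist(i, first), dist(i, second))
        for i in range(n)
        if i not in (first, second)
    }
    while len(selected) < n_select:
        # Add the sample that is farthest from everything selected so far.
        nxt = max(min_dist, key=min_dist.get)
        selected.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            min_dist[i] = min(min_dist[i], dist(i, nxt))
    return selected


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a validation set."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rpiq(y_true, y_pred):
    """Interquartile range of the observed values divided by RMSEP.

    Assumption: quartiles follow the 'inclusive' convention.
    """
    q1, _, q3 = statistics.quantiles(y_true, n=4, method="inclusive")
    return (q3 - q1) / rmsep(y_true, y_pred)


def calibration_sizes(start, stop, step=20):
    """Calibration set sizes from start to stop (inclusive) in fixed steps."""
    return list(range(start, stop + 1, step))


# Size grids as described in the summary.
cambe_sizes = calibration_sizes(36, 276)
toledo_sizes = calibration_sizes(20, 140)
```

In use, one would call `kennard_stone` on the spectral features for each size in the grid, fit a PLS model on the selected samples, and compare `rmsep` and `rpiq` on the held-out validation set across sizes.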