Post-hoc Evaluation of Sample Size in a Regional Digital Soil Mapping Project
| Main Authors: | Daniel D. Saurette, Richard J. Heck, Adam W. Gillespie, Aaron A. Berg, Asim Biswas |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Land |
| Online Access: | https://www.mdpi.com/2073-445X/14/3/545 |
| author | Daniel D. Saurette, Richard J. Heck, Adam W. Gillespie, Aaron A. Berg, Asim Biswas |
|---|---|
| collection | DOAJ |
| description | The transition from conventional soil mapping (CSM) to digital soil mapping (DSM) not only affects the final map products, but it also affects the concepts of scale, resolution, and sampling intensity. This is critical because in the CSM approach, sampling intensity is intricately linked to the desired scale of soil map publication, which provided standardization of sampling. This is not the case for DSM where sample size varies widely by project, and sampling design studies have largely focused on where to sample without due consideration for sample size. Using a regional soil survey dataset with 1791 sampled and described soil profiles, we first extracted an external validation dataset using the conditioned Latin hypercube sampling (cLHS) algorithm and then created repeated (*n* = 10) sample plans of increasing size from the remaining calibration sites using the cLHS, feature space coverage sampling (FSCS), and simple random sampling (SRS). We then trained random forest (RF) models for four soil properties: pH, CEC, clay content, and SOC at five different depths. We identified the effective sample size based on the model learning curves and compared it to the optimal sample size determined from the Jensen–Shannon divergence (D_JS) applied to the environmental covariates. Maps were then generated from models that used all the calibration points (reference maps) and from models that used the optimal sample size (optimal maps) for comparison. Our findings revealed that the optimal sample sizes based on the D_JS analysis were closely aligned with the effective sample sizes from the model learning curves (815 for cLHS, 832 for FSCS, and 847 for SRS). Furthermore, the comparison of the optimal maps to the reference maps showed little difference in the global statistics (concordance correlation coefficient and root mean square error) and spatial trends of the data, confirming that the optimal sample size was sufficient for creating predictions of similar accuracy to the full calibration dataset. Finally, we conclude that the Ottawa soil survey project could have saved between CAD 330,500 and CAD 374,000 (CAD = Canadian dollars) if the determination of optimal sample size tools presented herein existed during the project planning phase. This clearly illustrates the need for additional research in determining an optimal sample size for DSM and demonstrates that operationalization of DSM in public institutions requires a sound scientific basis for determining sample size. |
| format | Article |
| id | doaj-art-ddc762e0ade640c8b388afda3821b43f |
| institution | Kabale University |
| issn | 2073-445X |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Land |
| doi | 10.3390/land14030545 |
| citation | Land 2025, 14(3), 545 |
| affiliations | Daniel D. Saurette, Richard J. Heck, Adam W. Gillespie, Asim Biswas: School of Environmental Sciences, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada; Aaron A. Berg: Department of Geography, Environment & Geomatics, University of Guelph, 50 Stone Rd East, Guelph, ON N1G 2W1, Canada |
| title | Post-hoc Evaluation of Sample Size in a Regional Digital Soil Mapping Project |
| topic | sampling design; sample size; digital soil mapping; conventional soil mapping; divergence metrics; operational soil survey |
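The abstract describes choosing an optimal sample size by applying the Jensen–Shannon divergence (D_JS) to the environmental covariates. The record does not state how the covariates were binned, how divergences were combined, or what stopping criterion was used, so the following is only a minimal sketch under stated assumptions: equal-width histograms (25 bins, hypothetical), D_JS in base 2 averaged across covariates, repeated simple random samples only (no cLHS or FSCS), and a hypothetical threshold rule for picking the smallest adequate size. All function names are illustrative, not the authors' code.

```python
import bisect
import math
import random
from typing import Dict, List, Optional, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Equal-width bin edges spanning the full range of one covariate (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def histogram_probs(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Relative frequencies of `values` in the bins defined by `edges`.

    Values outside the edge range are clamped into the first or last bin.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in base 2 (bounded in [0, 1]) of two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing; m_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_divergence(covariates: Sequence[Sequence[float]],
                    sample_idx: Sequence[int], n_bins: int = 25) -> float:
    """Average D_JS over all covariates between a sample and the full set of sites."""
    scores = []
    for column in covariates:
        edges = equal_width_edges(column, n_bins)
        p = histogram_probs(column, edges)
        q = histogram_probs([column[i] for i in sample_idx], edges)
        scores.append(js_divergence(p, q))
    return sum(scores) / len(scores)


def divergence_curve(covariates: Sequence[Sequence[float]], sizes: Sequence[int],
                     repeats: int = 10, seed: int = 0) -> Dict[int, float]:
    """Mean D_JS over repeated simple random samples for each candidate size."""
    rng = random.Random(seed)
    n_sites = len(covariates[0])
    curve = {}
    for n in sizes:
        runs = [mean_divergence(covariates, rng.sample(range(n_sites), n))
                for _ in range(repeats)]
        curve[n] = sum(runs) / repeats
    return curve


def smallest_adequate_size(curve: Dict[int, float], threshold: float) -> Optional[int]:
    """Hypothetical stopping rule: smallest size whose mean D_JS is at or below `threshold`."""
    for n in sorted(curve):
        if curve[n] <= threshold:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(42)
    # Two synthetic covariates for 1000 hypothetical candidate sites.
    sites = [[rng.gauss(0, 1) for _ in range(1000)],
             [rng.uniform(0, 100) for _ in range(1000)]]
    curve = divergence_curve(sites, sizes=[25, 50, 100, 200, 400, 800])
    for n, d in curve.items():
        print(n, round(d, 4))
    print("smallest n with mean D_JS <= 0.02:", smallest_adequate_size(curve, 0.02))
```

The divergence falls as the sample grows because the sample histogram approaches the population histogram; a fixed threshold is just one simple way to read a size off that curve, and other rules (for example, fitting a decay curve) are equally plausible.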
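The abstract compares the optimal maps with the reference maps using the concordance correlation coefficient (CCC) and root mean square error (RMSE). A minimal sketch of those two statistics on paired pixel values follows, assuming the rasters have already been flattened into aligned lists and that NaN marks no-data cells; the helper names and the sample values are hypothetical, and the 1/n (population) moment form of Lin's coefficient is assumed.

```python
import math
from typing import List, Sequence, Tuple


def drop_nodata(ref: Sequence[float], other: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Keep only cells where both maps have a value (NaN marks no-data, an assumption)."""
    pairs = [(a, b) for a, b in zip(ref, other) if not (math.isnan(a) or math.isnan(b))]
    return [a for a, _ in pairs], [b for _, b in pairs]


def rmse(ref: Sequence[float], other: Sequence[float]) -> float:
    """Root mean square error between paired values."""
    if len(ref) != len(other) or not ref:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, other)) / len(ref))


def lins_ccc(ref: Sequence[float], other: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient using 1/n (population) moments."""
    n = len(ref)
    if n != len(other) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mx, my = sum(ref) / n, sum(other) / n
    vx = sum((a - mx) ** 2 for a in ref) / n
    vy = sum((b - my) ** 2 for b in other) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(ref, other)) / n
    denom = vx + vy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant series")
    return 2 * sxy / denom


if __name__ == "__main__":
    nan = float("nan")
    # Hypothetical flattened pixel values from a reference map and an optimal-size map.
    reference = [5.1, 5.8, 6.4, nan, 7.0, 6.1]
    optimal = [5.3, 5.7, 6.2, 6.0, nan, 6.3]
    ref, opt = drop_nodata(reference, optimal)
    print("CCC:", round(lins_ccc(ref, opt), 3), "RMSE:", round(rmse(ref, opt), 3))
```

Unlike Pearson's r, the CCC penalizes systematic offsets: two series that differ only by a constant shift of 1 (such as [1, 2, 3] and [2, 3, 4]) have r = 1 but CCC = 4/7, which is why it pairs naturally with RMSE when checking agreement rather than mere correlation.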