A Comparative Analysis of Machine Learning and Pedotransfer Functions Under Varying Data Availability in Two Greek Regions

Bibliographic Details
Main Authors: Panagiotis Tziachris, Panagiota Louka, Eirini Metaxa, Miltiadis Iatrou, Konstantinos Tsiouplakis
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Agriculture
Online Access: https://www.mdpi.com/2077-0472/15/11/1134
Description
Summary: The current study evaluates the performance of pedotransfer functions (PTFs) and machine learning (ML) algorithms in predicting the soil bulk density (BD) across two distinct regions in Greece, Kozani and Veroia, using both limited and extended sets of soil parameters. The results reveal significant regional differences in prediction accuracy. In the full dataset scenario, Veroia consistently exhibits superior predictive performance across all models (PTF RMSE: 0.104, ML RMSE: 0.095) compared to Kozani (PTF RMSE: 0.133, ML RMSE: 0.122). Generally, ML models outperform PTFs in terms of the RMSE and MAE in both regions with the full dataset. However, PTFs occasionally demonstrate higher R² values (Veroia PTF R²: 0.35 vs. ML R²: 0.28), suggesting a better explanation of the overall variance despite larger errors. Notably, the effectiveness of ML appears to be affected by the availability of data. In Kozani, when restricted to basic soil properties, ML’s performance (RMSE: 0.129, R²: 0.16) becomes similar to that of PTFs (RMSE: 0.133, R²: 0.16). However, incorporating the full dataset substantially enhances ML’s predictive power (RMSE: 0.122, R²: 0.26). Conversely, in Veroia, the inclusion of more variables paradoxically results in a slight decline in ML performance (ML_min RMSE: 0.093, R²: 0.31 vs. ML RMSE: 0.095, R²: 0.28). These contrasting results emphasize the need for context-specific modeling strategies, careful feature selection, and caution against the assumption that more data or complexity inherently improves the predictive performance.
ISSN: 2077-0472
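
The summary compares models by RMSE, MAE, and R², and notes a case where a higher R² coincides with larger errors. This record does not state how R² was computed, so the Python sketch below is only an illustration of how these metrics can disagree, not a reconstruction of the authors' method. It assumes two common definitions of R² (the coefficient of determination and the squared Pearson correlation). The bulk density values and the function names are hypothetical and are not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_determination(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def r2_pearson(y_true, y_pred):
    """Squared Pearson correlation between observations and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)

# Hypothetical bulk density observations (g/cm^3); NOT data from the study.
y_true = [1.20, 1.30, 1.40, 1.50, 1.60]

# Predictor A: follows the trend perfectly but carries a constant bias.
pred_a = [1.35, 1.45, 1.55, 1.65, 1.75]

# Predictor B: unbiased on average, with a noisier trend.
pred_b = [1.30, 1.25, 1.45, 1.45, 1.55]

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(
        f"{name}: RMSE={rmse(y_true, pred):.3f} "
        f"MAE={mae(y_true, pred):.3f} "
        f"R2_det={r2_determination(y_true, pred):.3f} "
        f"R2_pearson={r2_pearson(y_true, pred):.3f}"
    )
```

In this toy case, predictor A has the larger RMSE and MAE yet the higher squared Pearson correlation, while the coefficient of determination ranks it below predictor B. When comparing reported R² values across models, it therefore helps to check which definition was used.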