A Comparative Analysis of Machine Learning and Pedotransfer Functions Under Varying Data Availability in Two Greek Regions
| Format: | Article |
|---|---|
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Agriculture |
| Online Access: | https://www.mdpi.com/2077-0472/15/11/1134 |

| Summary: | The current study evaluates the performance of pedotransfer functions (PTFs) and machine learning (ML) algorithms in predicting soil bulk density (BD) across two distinct regions in Greece, Kozani and Veroia, using both limited and extended sets of soil parameters. The results reveal significant regional differences in prediction accuracy. In the full dataset scenario, Veroia consistently exhibits superior predictive performance across all models (PTF RMSE: 0.104, ML RMSE: 0.095) compared to Kozani (PTF RMSE: 0.133, ML RMSE: 0.122). Generally, ML models outperform PTFs in terms of the RMSE and MAE in both regions with the full dataset. However, PTFs occasionally demonstrate higher R² values (Veroia PTF R²: 0.35 vs. ML R²: 0.28), suggesting a better explanation of the overall variance despite larger errors. Notably, the effectiveness of ML appears to be affected by the availability of data. In Kozani, when restricted to basic soil properties, ML's performance (RMSE: 0.129, R²: 0.16) becomes similar to that of PTFs (RMSE: 0.133, R²: 0.16). However, incorporating the full dataset substantially enhances ML's predictive power (RMSE: 0.122, R²: 0.26). Conversely, in Veroia, the inclusion of more variables paradoxically results in a slight decline in ML performance (ML_min RMSE: 0.093, R²: 0.31 vs. ML RMSE: 0.095, R²: 0.28). These contrasting results emphasize the need for context-specific modeling strategies, careful feature selection, and caution against the assumption that more data or complexity inherently improves the predictive performance. |
|---|---|
| ISSN: | 2077-0472 |
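
The summary reports a case where one model has a higher R² and also a larger RMSE than another. The record does not say how R² was computed, so the following is only an illustrative sketch under an assumption: if R² is taken as the squared Pearson correlation between observed and predicted values, a biased but well-correlated model can score a higher R² while having larger errors. The bulk density values and both sets of predictions below are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_determination(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def r2_correlation(y_true, y_pred):
    """Squared Pearson correlation between observations and predictions."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


# Hypothetical observed bulk densities (g/cm^3); not data from the study.
observed = [1.20, 1.30, 1.40, 1.50, 1.60]

# Hypothetical model A: tracks the trend exactly but overestimates by 0.2.
pred_a = [t + 0.2 for t in observed]

# Hypothetical model B: no systematic bias, small scattered errors.
pred_b = [1.25, 1.25, 1.45, 1.45, 1.65]

for name, pred in (("A (biased)", pred_a), ("B (noisy)", pred_b)):
    print(
        f"{name}: RMSE={rmse(observed, pred):.3f} "
        f"MAE={mae(observed, pred):.3f} "
        f"R2_det={r2_determination(observed, pred):.3f} "
        f"R2_corr={r2_correlation(observed, pred):.3f}"
    )
```

In this toy case, model A has the larger RMSE and MAE but the higher squared correlation, while the coefficient of determination ranks the two models the same way RMSE does. When comparing R² figures across models, it therefore helps to know which definition was used.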