Interpretable Machine Learning for Population Spatialization and Optimal Grid Scale Selection in Shanghai
| Main Authors: | Yuan Cao, Hefeng Wang, Lanxuan Guo, Anbing Zhang, Xiaohu Wu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Applied Sciences |
| Online Access: | https://www.mdpi.com/2076-3417/15/9/4755 |
| author | Yuan Cao, Hefeng Wang, Lanxuan Guo, Anbing Zhang, Xiaohu Wu |
|---|---|
| collection | DOAJ |
| description | Fine-scale population distribution information is crucial for urban public safety, planning, and management. However, machine learning methods for population spatialization must address data overfitting and limited interpretability. This study combined eXtreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP) to estimate the spatial distribution of population at various grid scales and to interpret the key influencing factors; accuracy evaluation metrics and landscape ecology indices were then applied to identify the optimal grid scale. The results showed that the XGBoost model outperformed the WorldPop dataset in accuracy at all grid scales, with coefficients of determination (R²) consistently exceeding 0.83. The SHAP analysis revealed that the primary influencing factors were the address, access, and dwelling characteristics of points of interest (POIs). The influence of these factors varied regionally: it was strongly positive in urban centers and became increasingly negative with distance toward the suburbs. The population density estimates at every grid scale consistently showed a gradient of decreasing density from the urban center toward the suburbs. Based on a comprehensive evaluation of accuracy and spatial heterogeneity, the 100 m grid was identified as the optimal scale for Shanghai’s population spatialization. The proposed XGBoost-SHAP population spatialization method demonstrates high reliability and generalizability and effectively explains the heterogeneity of the population distribution. The approach provides decision-making support for urban planning and serves as a methodological reference for high-resolution population spatialization studies in other cities. |
| format | Article |
| id | doaj-art-ff334b69ac344bf99ca533bbc197e393 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| series | Applied Sciences |
| doi | 10.3390/app15094755 |
| volume | 15 |
| issue | 9 |
| article number | 4755 |
| affiliation | School of Mining and Geomatics Engineering, Hebei University of Engineering, Handan 056038, China (all five authors) |
| title | Interpretable Machine Learning for Population Spatialization and Optimal Grid Scale Selection in Shanghai |
| topic | population spatialization; XGBoost; SHAP; grid scale; feature variable |
| url | https://www.mdpi.com/2076-3417/15/9/4755 |