Interpretable Machine Learning for Population Spatialization and Optimal Grid Scale Selection in Shanghai

Bibliographic Details
Main Authors: Yuan Cao, Hefeng Wang, Lanxuan Guo, Anbing Zhang, Xiaohu Wu
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/9/4755
Description
Summary: Fine-scale population distribution information is crucial for applications in urban public safety, planning, and management. However, when machine learning methods are used for population spatialization, issues such as overfitting and limited interpretability need to be addressed. This study introduced a combined approach using eXtreme Gradient Boosting (XGBoost) and SHapley Additive exPlanations (SHAP) to spatialize population at various grid scales and interpret the key influencing factors, and then applied accuracy evaluation metrics and landscape ecology indices to identify the optimal grid scale. The results showed that the XGBoost model outperformed the WorldPop dataset in accuracy across all grid scales, with coefficients of determination (R²) consistently exceeding 0.83. The SHAP analysis revealed that the primary influencing factors were the address, access, and dwelling characteristics of points of interest (POIs). The influence of these factors varied regionally: it was strongly positive in urban centers, and the negative influence increased with distance toward suburban areas. The population density estimates at all grid scales exhibited a consistent spatial gradient, with density decreasing from the urban center toward suburban areas. Based on comprehensive evaluations of accuracy and spatial heterogeneity, the 100 m grid was identified as the optimal scale for Shanghai’s population spatialization. The proposed XGBoost-SHAP population spatialization method demonstrates high reliability and generalizability, effectively explaining the heterogeneity of population distribution. This approach not only provides critical decision-making support for urban planning but also serves as a methodological reference for high-resolution population spatialization studies in other cities.
ISSN: 2076-3417
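
Note on the accuracy metric: the summary reports coefficients of determination (R²) above 0.83 but does not say how the authors computed them. The sketch below uses the standard definition, R² = 1 - SS_res / SS_tot, and is only an illustration. The function name `r_squared` and the sample census and estimate values are hypothetical and do not come from the article.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (standard definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared error between observations and estimates.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical values: census counts and model estimates for five zones.
census = [1200.0, 800.0, 1500.0, 400.0, 950.0]
estimate = [1100.0, 850.0, 1400.0, 500.0, 1000.0]
score = r_squared(census, estimate)
print(f"R^2 = {score:.3f}")
```

A value near 1 means the estimates reproduce most of the variance in the reference counts. Under this definition the value can also be negative when the estimates fit worse than the mean of the observations.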