Interpretable Machine Learning for Population Spatialization and Optimal Grid Scale Selection in Shanghai

Bibliographic Details
Main Authors: Yuan Cao, Hefeng Wang, Lanxuan Guo, Anbing Zhang, Xiaohu Wu
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Subjects: population spatialization; XGBoost; SHAP; grid scale; feature variable
Online Access: https://www.mdpi.com/2076-3417/15/9/4755
collection DOAJ
description Fine-scale population distribution information is crucial for applications in urban public safety, planning, and management. However, when machine learning methods are used for population spatialization, issues such as overfitting and limited interpretability need to be addressed. This study combined eXtreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP) to estimate the spatial distribution of population at various grid scales and to interpret the key influencing factors, and then applied accuracy evaluation metrics and landscape ecology indices to identify the optimal grid scale. The results showed that the XGBoost model was more accurate than the WorldPop dataset at all grid scales, with coefficients of determination (R²) consistently exceeding 0.83. The SHAP analysis revealed that the primary influencing factors were the address, access, and dwelling characteristics of points of interest (POIs). The influence of these factors varied regionally, with a strong positive effect in urban centers and a negative effect that grew stronger toward suburban areas. The population density estimates at all grid scales consistently exhibited a spatial gradient of decreasing density from the urban center toward suburban areas. Based on comprehensive evaluations of accuracy and spatial heterogeneity, the 100 m grid was identified as the optimal scale for Shanghai’s population spatialization. The proposed XGBoost-SHAP population spatialization method demonstrates high reliability and generalizability and effectively explains the heterogeneity of population distribution. This approach not only provides critical decision-making support for urban planning but also serves as a methodological reference for high-resolution population spatialization studies in other cities.
format Article
id doaj-art-ff334b69ac344bf99ca533bbc197e393
institution Kabale University
issn 2076-3417
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
doi 10.3390/app15094755
volume 15
issue 9
article_number 4755
affiliation School of Mining and Geomatics Engineering, Hebei University of Engineering, Handan 056038, China (all five authors)
topic population spatialization
XGBoost
SHAP
grid scale
feature variable