A unified multimodal learning method for urban functional zone identification by fusing inner-street visual–textual information from street-view and satellite images

Bibliographic Details
Main Authors: Jiajun Chen, Runyu Fan, Hongyang Niu, Zijian Xu, Jining Yan, Weijing Song, Ruyi Feng
Format: Article
Language: English
Published: Elsevier 2025-08-01
Series: International Journal of Applied Earth Observation and Geoinformation
Online Access: http://www.sciencedirect.com/science/article/pii/S1569843225003322
Description
Summary: Urban functional zones (UFZ) are areas into which urban space is divided according to specific uses, based on the distribution of human activities and infrastructure. UFZ mapping analyzes geographic information data of urban space, combining remote sensing images (RSI), point of interest (POI) data, and other data sources with advanced spatial analysis techniques to delineate and visualize UFZ. Intelligent interpretation of UFZ can support urban management and planning. Previous UFZ studies have mainly used remote sensing images and POI data, which capture the city's macroscopic visual features and the distribution of land use. However, because these methods do not use inner-street perspective data, they often miss inner-street details and cannot capture the complex spatial relations between objects in urban scenes, which leads to unsatisfactory UFZ results. To address this, we propose a unified multimodal learning method that interprets UFZ by combining remote sensing images, POI data, and street view data, whose inner-street details provide a more comprehensive perspective for UFZ interpretation. To make full use of the inner-street perspective of street view images (SVI), we use not only their visual features but also textual features, extracted through image captioning, that reflect the human activities visible in street views. This lets the method better capture subtle socio-economic activity information in urban space. We conduct extensive experiments in Wuhan, Changsha, and Nanchang. The overall accuracy (OA) of the method on the test set reached 91.80%, and the experimental results show a significant improvement in the model's performance in interpreting UFZ.
ISSN: 1569-8432
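The summary does not specify how the modalities are fused. The sketch below is a minimal illustration under that uncertainty and is not the authors' architecture. It assumes simple feature-level fusion by concatenation: each modality (RSI, POI, SVI visual features, SVI caption text features) is taken to be already encoded into a fixed-length vector by some upstream encoder. The modality keys are hypothetical names. The sketch also computes overall accuracy (OA), the metric reported above, defined as the number of correctly classified samples (the diagonal of the confusion matrix) divided by the total number of samples.

```python
from typing import Dict, List, Sequence

# Hypothetical modality keys; the fusion order is an assumption for illustration.
MODALITIES = ("rsi", "poi", "svi_visual", "svi_text")


def fuse_features(features: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed order.

    This is plain feature-level (early) fusion, assumed for illustration only.
    Each value is taken to be an already-encoded fixed-length vector.
    """
    fused: List[float] = []
    for name in MODALITIES:
        if name not in features:
            raise KeyError(f"missing modality: {name}")
        fused.extend(float(x) for x in features[name])
    return fused


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """OA = sum of diagonal entries / sum of all entries of a square confusion matrix."""
    n = len(confusion)
    if any(len(row) != n for row in confusion):
        raise ValueError("confusion matrix must be square")
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(n))
    return correct / total


if __name__ == "__main__":
    sample = {"rsi": [0.1, 0.2], "poi": [3], "svi_visual": [0.5], "svi_text": [0.7, 0.8]}
    print(fuse_features(sample))                # [0.1, 0.2, 3.0, 0.5, 0.7, 0.8]
    print(overall_accuracy([[8, 2], [1, 9]]))   # 17 correct out of 20 -> 0.85
```

Concatenation is only the simplest fusion baseline. A learned classifier would then map the fused vector to a UFZ category, and OA would be computed from that classifier's confusion matrix on the test set.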