Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery


Bibliographic Details
Main Authors: Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He, Xiaolin Chen
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Sensors
Subjects: street-view imagery; self-supervised learning; redundancy reduction; urban scene classification; urban functional zone identification
Online Access: https://www.mdpi.com/1424-8220/25/5/1504
author Kun Zhao
Juan Li
Shuai Xie
Lijian Zhou
Wenbin He
Xiaolin Chen
collection DOAJ
description In recent years, the use of street-view images for urban analysis has received much attention. Despite the abundance of raw data, existing supervised learning methods rely heavily on large-scale, high-quality labels. Faced with the challenge of label scarcity in urban scene classification tasks, an innovative self-supervised learning framework, Trilateral Redundancy Reduction (Tri-ReD), is proposed. Within this framework, a more restrictive loss, the "trilateral loss", is proposed. By compelling the embeddings of positive samples to be highly correlated, it guides the pre-trained model to learn more essential representations without semantic labels. Furthermore, a novel data augmentation strategy, tri-branch mutually exclusive augmentation (Tri-MExA), is proposed to reduce the uncertainty introduced by traditional random augmentation methods. As a model pre-training method, the Tri-ReD framework is architecture-agnostic, performing effectively on both CNNs and ViTs, which makes it adaptable to a wide variety of downstream tasks. In this paper, 116,491 unlabeled street-view images were used to pre-train models with Tri-ReD to obtain a general representation of urban scenes at ground level. These pre-trained models were then fine-tuned on labeled data (17,600 images from BIC_GSV and 12,871 from BEAUTY) for the final classification task. Experimental results demonstrate that the proposed self-supervised pre-training method outperformed direct supervised learning for urban functional zone identification by 19% on average and surpassed models pre-trained on ImageNet by around 11%, achieving state-of-the-art (SOTA) results among self-supervised pre-training methods.
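The description characterizes the trilateral loss only at a high level: embeddings of the three positive views should be highly correlated, with redundancy between embedding dimensions reduced. The sketch below is one plausible reading under the assumption that the loss extends Barlow Twins-style redundancy reduction (driving the cross-correlation matrix toward the identity) to all three branch pairs; the function names, the batch normalization of embeddings, and the off-diagonal weight `lam` are illustrative choices, not taken from the paper.

```python
import numpy as np

def cross_correlation(za, zb):
    # Standardize each embedding dimension over the batch, then form the
    # D x D cross-correlation matrix between two branches.
    za = (za - za.mean(0)) / (za.std(0) + 1e-8)
    zb = (zb - zb.mean(0)) / (zb.std(0) + 1e-8)
    return za.T @ zb / za.shape[0]

def trilateral_loss(z1, z2, z3, lam=0.005):
    # Sum a redundancy-reduction term over all three branch pairs:
    # pull diagonal entries toward 1 (invariance across views) and
    # push off-diagonal entries toward 0 (decorrelated dimensions).
    loss = 0.0
    for za, zb in [(z1, z2), (z1, z3), (z2, z3)]:
        c = cross_correlation(za, zb)
        on_diag = ((np.diag(c) - 1.0) ** 2).sum()
        off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
        loss += on_diag + lam * off_diag
    return loss
```

With identical views the diagonal terms vanish and the loss is near zero, while independent embeddings incur a large invariance penalty on every pair, which is the behavior the description attributes to the "more restrictive" three-way constraint.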
format Article
id doaj-art-2aa4fe1b00ac483aa2e64ef2e84130a4
institution OA Journals
issn 1424-8220
language English
publishDate 2025-02-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-2aa4fe1b00ac483aa2e64ef2e84130a4 (indexed 2025-08-20T02:06:15Z)
MDPI AG, Sensors, ISSN 1424-8220, 2025-02-01, vol. 25, no. 5, art. 1504, DOI 10.3390/s25051504
Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery
Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He, Xiaolin Chen (all: School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China)
https://www.mdpi.com/1424-8220/25/5/1504
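The abstract says only that Tri-MExA reduces the uncertainty of traditional random augmentation across the three branches; it does not specify the operations. A minimal sketch of one reading, "mutually exclusive" meaning each branch samples from its own disjoint pool of transforms so no two views can receive the same augmentation: the pools and the toy array transforms below are hypothetical stand-ins, not the paper's.

```python
import numpy as np

# Toy augmentations on float images in [0, 1]; stand-ins for real ones.
def hflip(img):
    return img[:, ::-1].copy()            # horizontal flip

def identity(img):
    return img.copy()                     # no-op view

def brighten(img):
    return np.clip(img * 1.2, 0.0, 1.0)  # brightness up

def darken(img):
    return np.clip(img * 0.8, 0.0, 1.0)  # brightness down

def vflip(img):
    return img[::-1, :].copy()            # vertical flip

def add_noise(img):
    noise = np.random.default_rng(0).normal(0.0, 0.05, img.shape)
    return np.clip(img + noise, 0.0, 1.0)

# Pairwise-disjoint pools, one per branch: the "mutually exclusive" part.
POOLS = [
    [hflip, identity],
    [brighten, darken],
    [vflip, add_noise],
]

def tri_mexa(img, rng):
    """Return three views of `img`, each drawn from its branch's own pool."""
    return [pool[rng.integers(len(pool))](img) for pool in POOLS]
```

Because the pools share no transform, two branches can never collapse onto identically augmented views, which is one way random-augmentation uncertainty could be reduced as the abstract claims.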
title Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery
topic street-view imagery
self-supervised learning
redundancy reduction
urban scene classification
urban functional zone identification
url https://www.mdpi.com/1424-8220/25/5/1504