Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery
In recent years, the use of street-view images for urban analysis has received much attention. Despite the abundance of raw data, existing supervised learning methods heavily rely on large-scale and high-quality labels. Faced with the challenge of label scarcity in urban scene classification tasks,...
| Main Authors: | Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He, Xiaolin Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Sensors |
| Subjects: | street-view imagery; self-supervised learning; redundancy reduction; urban scene classification; urban functional zone identification |
| Online Access: | https://www.mdpi.com/1424-8220/25/5/1504 |
| _version_ | 1850222687417270272 |
|---|---|
| author | Kun Zhao; Juan Li; Shuai Xie; Lijian Zhou; Wenbin He; Xiaolin Chen |
| author_facet | Kun Zhao; Juan Li; Shuai Xie; Lijian Zhou; Wenbin He; Xiaolin Chen |
| author_sort | Kun Zhao |
| collection | DOAJ |
| description | In recent years, the use of street-view images for urban analysis has received much attention. Despite the abundance of raw data, existing supervised learning methods rely heavily on large-scale, high-quality labels. Faced with the challenge of label scarcity in urban scene classification tasks, an innovative self-supervised learning framework, Trilateral Redundancy Reduction (Tri-ReD), is proposed. Within this framework, a more restrictive loss, the "trilateral loss", is proposed. By compelling the embeddings of positive samples to be highly correlated, it guides the pre-trained model to learn more essential representations without semantic labels. Furthermore, a novel data augmentation strategy, tri-branch mutually exclusive augmentation (Tri-MExA), is proposed to reduce the uncertainty introduced by traditional random augmentation. As a pre-training method, the Tri-ReD framework is architecture-agnostic, performing effectively on both CNNs and ViTs, which makes it adaptable to a wide variety of downstream tasks. In this paper, 116,491 unlabeled street-view images were used to pre-train models with Tri-ReD to obtain a general ground-level representation of urban scenes. These pre-trained models were then fine-tuned on labeled data (17,600 images from BIC_GSV and 12,871 from BEAUTY) for the final classification task. Experimental results demonstrate that the proposed self-supervised pre-training outperformed direct supervised learning for urban functional zone identification by 19% on average, and surpassed models pre-trained on ImageNet by around 11%, achieving state-of-the-art (SOTA) results in self-supervised pre-training. |
| format | Article |
| id | doaj-art-2aa4fe1b00ac483aa2e64ef2e84130a4 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | Record ID: doaj-art-2aa4fe1b00ac483aa2e64ef2e84130a4 (indexed 2025-08-20T02:06:15Z); Language: eng; Publisher: MDPI AG; Series: Sensors, ISSN 1424-8220; Published: 2025-02-01, vol. 25, no. 5, art. 1504; DOI: 10.3390/s25051504; Title: Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery; Authors: Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He, Xiaolin Chen (all: School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China); Abstract: see the description field above; URL: https://www.mdpi.com/1424-8220/25/5/1504; Topics: street-view imagery; self-supervised learning; redundancy reduction; urban scene classification; urban functional zone identification |
| spellingShingle | Kun Zhao; Juan Li; Shuai Xie; Lijian Zhou; Wenbin He; Xiaolin Chen; Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery; Sensors; street-view imagery; self-supervised learning; redundancy reduction; urban scene classification; urban functional zone identification |
| title | Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery |
| title_full | Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery |
| title_fullStr | Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery |
| title_full_unstemmed | Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery |
| title_short | Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery |
| title_sort | self supervised learning with trilateral redundancy reduction for urban functional zone identification using street view imagery |
| topic | street-view imagery; self-supervised learning; redundancy reduction; urban scene classification; urban functional zone identification |
| url | https://www.mdpi.com/1424-8220/25/5/1504 |
| work_keys_str_mv | AT kunzhao selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery AT juanli selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery AT shuaixie selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery AT lijianzhou selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery AT wenbinhe selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery AT xiaolinchen selfsupervisedlearningwithtrilateralredundancyreductionforurbanfunctionalzoneidentificationusingstreetviewimagery |
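The abstract's core idea, compelling the embeddings of positive samples from multiple augmented branches to be highly correlated while reducing redundancy between embedding dimensions, can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes a Barlow Twins-style cross-correlation objective applied pairwise over three augmented views, and the function names (`cross_correlation_loss`, `trilateral_loss`) and the off-diagonal weight `lam` are hypothetical.

```python
import numpy as np

def cross_correlation_loss(za, zb, lam=5e-3):
    """Redundancy-reduction loss between two batches of embeddings
    (batch_size x dim): drive the cross-correlation matrix toward
    the identity -- diagonal to 1 (correlated views), off-diagonal
    to 0 (decorrelated, non-redundant dimensions)."""
    n, _ = za.shape
    # standardize each embedding dimension over the batch
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + 1e-8)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + 1e-8)
    c = za.T @ zb / n                                  # dim x dim cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()          # pull diagonal toward 1
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # push off-diagonal toward 0
    return on_diag + lam * off_diag

def trilateral_loss(z1, z2, z3, lam=5e-3):
    """Sum the pairwise objective over all three pairs of branches,
    so every pair of augmented views must agree."""
    return (cross_correlation_loss(z1, z2, lam)
            + cross_correlation_loss(z2, z3, lam)
            + cross_correlation_loss(z1, z3, lam))

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 16))
# identical embeddings across the three branches yield a small loss;
# independent random batches yield a much larger one
print(trilateral_loss(z, z, z))
```

In a real pre-training loop, `z1`, `z2`, and `z3` would be the projected outputs of a shared backbone (CNN or ViT) fed three differently augmented crops of the same street-view image, matching the abstract's claim that the objective is architecture-agnostic.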