Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management
| Main Authors: | Hanyu Ding, Yawei Du, Zhengyu Xia |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-02-01 |
| Series: | Applied Sciences |
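| Subjects: | urban road safety; vision large language model; road anomaly; real-time monitoring; safety management |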
| Online Access: | https://www.mdpi.com/2076-3417/15/5/2517 |
| Field | Value |
|---|---|
| author | Hanyu Ding, Yawei Du, Zhengyu Xia |
| collection | DOAJ |
| description | Abnormal phenomena on urban roads, including uneven surfaces, garbage, traffic congestion, floods, fallen trees, fires, and traffic accidents, present significant risks to public safety and infrastructure, necessitating real-time monitoring and early warning systems. This study develops Urban Road Anomaly Visual Large Language Models (URA-VLMs), a generative AI-based framework designed for the monitoring of diverse urban road anomalies. InternVL was selected as the foundation model because of its adaptability to this monitoring task. The URA-VLMs framework features dedicated modules for anomaly detection, flood depth estimation, and safety level assessment, utilizing multi-step prompting and retrieval-augmented generation (RAG) for precise and adaptive analysis. A comprehensive dataset of 3034 annotated images depicting various urban road scenarios was developed to evaluate the models. Experimental results demonstrate the system’s effectiveness, achieving an overall anomaly detection accuracy of 93.20%, outperforming state-of-the-art models such as InternVL2.5 and ResNet34. By facilitating early detection and real-time decision-making, this generative AI approach offers a scalable and robust solution that contributes to a smarter, safer road environment. |
| id | doaj-art-e6b85beb5b3b49609fe1005811819963 |
| institution | DOAJ |
| issn | 2076-3417 |
| doi | 10.3390/app15052517 |
| citation | Applied Sciences, vol. 15, no. 5, article 2517 (2025-02-01) |
| affiliations | Hanyu Ding and Zhengyu Xia: State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China. Yawei Du: Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Hong Kong 999077, China |
| title | Urban Road Anomaly Monitoring Using Vision–Language Models for Enhanced Safety Management |
| topic | urban road safety; vision large language model; road anomaly; real-time monitoring; safety management |
| url | https://www.mdpi.com/2076-3417/15/5/2517 |