Harnessing Semantic and Trajectory Analysis for Real-Time Pedestrian Panic Detection in Crowded Micro-Road Networks

Bibliographic Details
Main Authors: Rongyong Zhao, Lingchen Han, Yuxin Cai, Bingyu Wei, Arifur Rahman, Cuiling Li, Yunlong Ma
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/15/10/5394
Description
Summary: Pedestrian panic behavior is a primary cause of overcrowding and stampede accidents in public micro-road network areas with high pedestrian density. However, reliably detecting such behavior remains challenging because of its inherent complexity, variability, and stochastic nature. Current detection models often rely on single-modality features, which limits their effectiveness in complex and dynamic crowd scenarios. To overcome these limitations, this study proposes a contour-driven multimodal framework. First, a CNN (CDNet) estimates crowd density maps, and steep gradients in the density contours are analyzed to automatically delineate candidate panic zones. Within these candidate zones, LSTM networks analyze pedestrian trajectories to capture irregular movements such as counterflow and nonlinear wandering. Concurrently, Transformer-based semantic recognition identifies verbal distress cues in speech transcribed by Baidu AI's real-time speech-to-text service. The three resulting embeddings are fused by a lightweight attention-enhanced MLP, enabling end-to-end inference at 40 FPS on a single GPU. To evaluate the robustness of the density branch under streaming conditions, the UCF Crowd dataset (150 videos without panic labels) is processed frame by frame at 25 FPS solely for density assessment, whereas full panic detection is validated on 30 real Itaewon-Stampede videos and 160 SUMO/Unity simulated emergencies that include explicit panic annotations. The proposed system achieves 91.7% accuracy and an 88.2% F1 score on the Itaewon set. It outperforms all single- and dual-modality baselines and offers a deployable solution for proactive crowd safety monitoring in transport hubs, festivals, and other high-risk venues.
ISSN: 2076-3417
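The summary says candidate panic zones are delineated by analyzing steep gradients in CDNet's density contours, but it does not give the exact procedure. The following is a minimal Python sketch of that idea, not the authors' implementation. It rests on several assumptions. The density map is a plain 2-D grid. The gradient is approximated with central finite differences, using one-sided differences at the borders. Every cell whose gradient magnitude reaches a threshold is enclosed in one axis-aligned bounding box. The names gradient_magnitude, candidate_zone, threshold, and toy_density are hypothetical, and so are the toy values.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]


def gradient_magnitude(density: Grid) -> Grid:
    """Approximate |grad D| per cell: central differences inside the grid,
    one-sided differences at the borders."""
    rows, cols = len(density), len(density[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Horizontal derivative (neighbours clamped to the grid edge).
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            gx = (density[r][c1] - density[r][c0]) / max(c1 - c0, 1)
            # Vertical derivative, same clamping.
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            gy = (density[r1][c] - density[r0][c]) / max(r1 - r0, 1)
            mag[r][c] = (gx * gx + gy * gy) ** 0.5
    return mag


def candidate_zone(density: Grid, threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of all cells whose gradient
    magnitude reaches `threshold` (hypothetical parameter); None if no cell does."""
    mag = gradient_magnitude(density)
    steep = [(r, c) for r, row in enumerate(mag)
             for c, g in enumerate(row) if g >= threshold]
    if not steep:
        return None
    rs = [r for r, _ in steep]
    cs = [c for _, c in steep]
    return (min(rs), min(cs), max(rs), max(cs))


# Hypothetical toy density map (people per cell) with a congested peak in the middle.
toy_density = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 4, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(candidate_zone(toy_density, threshold=1.5))  # (1, 1, 3, 3)
```

In the toy grid, the steepest gradients (magnitude 2.0) sit on the four cells adjacent to the central peak. The sketch therefore returns rows 1 to 3 and columns 1 to 3 as the candidate zone. The peak cell itself has zero gradient under central differences. A real pipeline would more plausibly trace contour polygons and could return several separate zones rather than a single box.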