Lightweight multi-stage temporal inference network for video crowd counting

Bibliographic Details
Main Authors: Wei Gao, Rui Feng, Xiaochun Sheng
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-11-01
Series: Frontiers in Physics
Online Access: https://www.frontiersin.org/articles/10.3389/fphy.2024.1489245/full
Description
Summary: Crowd density is an important metric for preventing overcrowding in a given area, but estimating it remains challenging because of perspective distortion, scale variation, and pedestrian occlusion. Existing studies have attempted to model the spatio-temporal dependencies in videos using LSTMs and 3D CNNs. However, these methods suffer from high computational cost, excessive parameter redundancy, and loss of temporal information, which makes the models difficult to converge and limits recognition performance. To address these issues, we propose a lightweight multi-stage temporal inference network (LMSTIN) for video crowd counting. LMSTIN models the spatio-temporal dependencies in video sequences at a fine-grained level, enabling real-time and accurate video crowd counting. Our proposed method achieves significant performance improvements on three public crowd counting datasets.
ISSN: 2296-424X