Convolutional spatio-temporal sequential inference model for human interaction behavior recognition

Bibliographic Details
Main Authors:	Lizhong Jin, Rulong Fan, Xiaoling Han, Xueying Cui
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2025-07-01
Series:	Frontiers in Computer Science
Subjects:	human behavior recognition; deep learning; multimodal learning; skeleton point sequence information; time series recognition; inference mode
Online Access:	https://www.frontiersin.org/articles/10.3389/fcomp.2025.1576775/full

Description
Summary:	Introduction: Human action recognition is a critical task with broad applications and remains a challenging problem due to the complexity of modeling dynamic interactions between individuals. Existing methods, including skeleton sequence-based and RGB video-based models, have achieved impressive accuracy but often suffer from high computational costs and limited effectiveness in modeling human interaction behaviors. Methods: To address these limitations, we propose a lightweight Convolutional Spatiotemporal Sequence Inference Model (CSSIModel) for recognizing human interaction behaviors. The model extracts features from skeleton sequences using DINet and from RGB video frames using ResNet-18. These multi-modal features are fused and processed using a novel multiscale two-dimensional convolutional peak-valley inference module to classify interaction behaviors. Results: CSSIModel achieves competitive results across several benchmark datasets: 87.4% accuracy on NTU RGB+D 60 (XSub), 94.1% on NTU RGB+D 60 (XView), 80.5% on NTU RGB+D 120 (XSub), and 84.9% on NTU RGB+D 120 (XSet). These results are comparable to or exceed those of state-of-the-art methods. Discussion: The proposed method effectively balances accuracy and computational efficiency. By significantly reducing model complexity while maintaining high performance, CSSIModel is well-suited for real-time applications and provides a valuable reference for future research in multi-modal human behavior recognition.
ISSN:	2624-9898
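
Illustrative note: the summary describes fusing skeleton and RGB features and passing them through a multiscale two-dimensional convolutional module, but this record does not give the fusion rule, kernel sizes, or the peak-valley inference step. The sketch below is therefore only a generic illustration under stated assumptions: fusion is taken to be per-frame concatenation, the learned kernels are replaced by fixed mean filters, and all function names (`fuse`, `conv2d_same`, `multiscale`) are hypothetical. It does not implement DINet, ResNet-18, or the authors' peak-valley module.

```python
# Hypothetical sketch; names and design choices are assumptions, not the paper's method.
from typing import List, Sequence

Matrix = List[List[float]]  # rows = frames (time), columns = feature dimensions


def fuse(skeleton: Matrix, rgb: Matrix) -> Matrix:
    """Assumed fusion rule: concatenate the two feature vectors of each frame."""
    if len(skeleton) != len(rgb):
        raise ValueError("both modalities must cover the same number of frames")
    return [s + r for s, r in zip(skeleton, rgb)]


def conv2d_same(x: Matrix, k: int) -> Matrix:
    """k x k mean filter with zero padding; a stand-in for a learned 2D kernel."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    rows, cols = len(x), len(x[0])
    pad = k // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0.0
            for di in range(-pad, pad + 1):
                for dj in range(-pad, pad + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += x[ii][jj]
            out[i][j] = total / (k * k)
    return out


def multiscale(x: Matrix, kernel_sizes: Sequence[int] = (1, 3, 5)) -> Matrix:
    """Average the responses of several kernel sizes over the time-by-feature map."""
    responses = [conv2d_same(x, k) for k in kernel_sizes]
    n = len(responses)
    return [
        [sum(r[i][j] for r in responses) / n for j in range(len(x[0]))]
        for i in range(len(x))
    ]


if __name__ == "__main__":
    skeleton_feats = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2 dims (toy values)
    rgb_feats = [[1.0], [2.0], [3.0]]                      # 3 frames, 1 dim (toy values)
    fused = fuse(skeleton_feats, rgb_feats)
    print(multiscale(fused))
```

The output keeps the time-by-feature shape of the fused map, so a classifier head (not shown) could be applied to it; in a real model the mean filters would be replaced by trained convolution layers.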

Similar Items