Advancing skeleton-based human behavior recognition: multi-stream fusion spatiotemporal graph convolutional networks

Abstract In the realm of daily human interactions, a rich tapestry of behaviors and actions is observed, encompassing a wealth of informative cues. In the era of burgeoning big data, extensive repositories of images and videos have risen to prominence as the primary conduits for disseminating inform...

Bibliographic Details
Main Authors:	Fenglin Liu, Chenyu Wang, Zhiqiang Tian, Shaoyi Du, Wei Zeng
Format:	Article
Language:	English
Published:	Springer 2024-12-01
Series:	Complex & Intelligent Systems
Subjects:	Behavior recognition; Skeleton-based spatiotemporal graph convolutional network; Multi-stream fusion; Long-range dependencies
Online Access:	https://doi.org/10.1007/s40747-024-01743-2

collection	DOAJ
description	Abstract In the realm of daily human interactions, a rich tapestry of behaviors and actions is observed, encompassing a wealth of informative cues. In the era of burgeoning big data, extensive repositories of images and videos have risen to prominence as the primary conduits for disseminating information. Grasping the intricacies of human behaviors depicted within these multimedia contexts has evolved into a pivotal quandary within the domain of computer vision. The technology of behavior recognition finds its practical application across domains such as human-computer interaction, intelligent surveillance, and anomaly detection, exhibiting a robust blend of pragmatic utility and scholarly significance. The present study introduces an innovative human body behavior recognition framework anchored in skeleton sequences and multi-stream fused spatiotemporal graph convolutional networks. Developed upon the foundation of graph convolutional networks, this method encompasses three pivotal refinements tailored to ameliorate extant challenges. First and foremost, in response to the complex task of capturing distant interdependencies among nodes within graph convolutional networks, we incorporate a spatial attention module. This module adeptly encapsulates long-term node interdependencies via precision-laden positional information, thus engendering interconnections that span diverse temporal and spatial contexts. Subsequently, to elevate the discernment of channel information within the network and to optimize the allocation of attention across distinct channels, we introduce a channel attention mechanism. This augmentation fortifies the discernment of motion-related features. Lastly, confronting the lacuna of information gaps prevalent within single-stream data, we deploy a multi-stream fusion methodology to fortify model outputs, ultimately fostering more precise prognostications concerning action classifications. 
Empirical results bear testament to the efficacy of the proposed multi-stream fused spatiotemporal graph convolutional network paradigm for skeleton-centric behavior recognition, evincing a pinnacle recognition accuracy of 96.0% on the expansive NTU-RGB+D skeleton dataset, alongside a zenithal accuracy of 37.3% on the Kinetics-Skeleton dataset—emanating from RGB data and furthered through pose estimation.
id	doaj-art-91cc9da8c3af413b908f2ebfe8cd11d3
institution	Kabale University
issn	2199-4536 2198-6053
spelling	doaj-art-91cc9da8c3af413b908f2ebfe8cd11d3; 2025-02-02T12:48:58Z; eng; Springer; Complex & Intelligent Systems; 2199-4536; 2198-6053; 2024-12-01; 111121; 10.1007/s40747-024-01743-2; Fenglin Liu (School of Physics and Mechanical and Electrical Engineering, Longyan University); Chenyu Wang (School of Software Engineering, Xi’an Jiaotong University); Zhiqiang Tian (School of Software Engineering, Xi’an Jiaotong University); Shaoyi Du (Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University); Wei Zeng (School of Physics and Mechanical and Electrical Engineering, Longyan University)


Similar Items