Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Using skeletal information to model and recognize human actions is currently a hot research subject in the realm of Human Action Recognition (HAR). Graph Convolutional Networks (GCN) have gained popularity in this discipline due to their capacity to efficiently process graph-structured data. However...

Bibliographic Details
Main Authors: Zhiyun Zheng, Qilong Yuan, Huaizhu Zhang, Yizhou Wang, Junfeng Wang
Format: Article
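Language: English
Published: Tsinghua University Press 2025-04-01
Series: Big Data Mining and Analytics
Subjects: human action recognition (HAR); skeleton data; graph convolutional network (GCN); attention mechanism
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2024.9020095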
author Zhiyun Zheng
Qilong Yuan
Huaizhu Zhang
Yizhou Wang
Junfeng Wang
collection DOAJ
description Using skeletal information to model and recognize human actions is an active research topic in Human Action Recognition (HAR). Graph Convolutional Networks (GCNs) have gained popularity in this field because they process graph-structured data efficiently. However, current models struggle to handle the long-range dependencies that commonly exist between human skeleton joints, which hinders the development of algorithms in related fields. To address this problem, the Lightweight Multiscale Spatio-Temporal Graph Convolutional Network (LMSTGCN) is proposed. First, the Lightweight Multiscale Spatial Graph Convolutional Network (LMSGCN) is constructed to capture information at multiple hierarchical levels; it captures multiple inner connections between skeleton joints by dividing the input features into several subsets along the channel dimension. Second, dilated convolution is incorporated into the temporal convolution to construct the Lightweight Multiscale Temporal Convolutional Network (LMTCN), which obtains a wider receptive field while keeping the convolution kernel size unchanged. Third, the Spatio-Temporal Location Attention (STLAtt) module is used to identify the most informative joints at a specific frame of the skeleton sequence, improving the model’s ability to extract features and recognize actions. Finally, a multi-stream data fusion input structure is used to enhance the input data and expand the feature information. Experiments on three public datasets demonstrate the effectiveness of the proposed network.
format Article
id doaj-art-f8a4eb0e6df34a72904c3f68ed2cdca3
institution DOAJ
issn 2096-0654
2097-406X
language English
publishDate 2025-04-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-f8a4eb0e6df34a72904c3f68ed2cdca3
2025-08-20T02:56:44Z
eng
Tsinghua University Press
Big Data Mining and Analytics
2096-0654
2097-406X
2025-04-01
Volume 8, Issue 2, Pages 310-325
10.26599/BDMA.2024.9020095
Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
Zhiyun Zheng, Qilong Yuan, Huaizhu Zhang, Yizhou Wang, Junfeng Wang
Lab of Cloud Computing and Big Data Processing, School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China (affiliation of all five authors)
https://www.sciopen.com/article/10.26599/BDMA.2024.9020095
human action recognition (har)
skeleton data
graph convolutional network (gcn)
attention mechanism
title Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
topic human action recognition (har)
skeleton data
graph convolutional network (gcn)
attention mechanism
url https://www.sciopen.com/article/10.26599/BDMA.2024.9020095