Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
Using skeletal information to model and recognize human actions is an active research topic in Human Action Recognition (HAR). Graph Convolutional Networks (GCNs) have gained popularity in this field because they process graph-structured data efficiently. However...
| Main Authors: | Zhiyun Zheng, Qilong Yuan, Huaizhu Zhang, Yizhou Wang, Junfeng Wang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Tsinghua University Press, 2025-04-01 |
| Series: | Big Data Mining and Analytics |
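| Subjects: | human action recognition (HAR); skeleton data; graph convolutional network (GCN); attention mechanism |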
| Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2024.9020095 |
| _version_ | 1850037869127663616 |
|---|---|
| author | Zhiyun Zheng Qilong Yuan Huaizhu Zhang Yizhou Wang Junfeng Wang |
| author_sort | Zhiyun Zheng |
| collection | DOAJ |
| description | Using skeletal information to model and recognize human actions is an active research topic in Human Action Recognition (HAR). Graph Convolutional Networks (GCNs) have gained popularity in this field because they process graph-structured data efficiently. However, current models struggle to handle the long-range dependencies that commonly exist between human skeleton nodes, which hinders the development of algorithms in related fields. To address this problem, the Lightweight Multiscale Spatio-Temporal Graph Convolutional Network (LMSTGCN) is proposed. First, the Lightweight Multiscale Spatial Graph Convolutional Network (LMSGCN) is constructed to capture information at various hierarchies; multiple inner connections between skeleton joints are captured by dividing the input features into several subsets along the channel direction. Second, dilated convolution is incorporated into the temporal convolution to construct the Lightweight Multiscale Temporal Convolutional Network (LMTCN), which obtains a wider receptive field while keeping the convolution kernel size unchanged. Third, the Spatio-Temporal Location Attention (STLAtt) module is used to identify the most informative joints in the skeleton sequence at a specific frame, improving the model’s ability to extract features and recognize actions. Finally, a multi-stream data fusion input structure is used to enhance the input data and expand the feature information. Experiments on three public datasets illustrate the effectiveness of the proposed network. |
| format | Article |
| id | doaj-art-f8a4eb0e6df34a72904c3f68ed2cdca3 |
| institution | DOAJ |
| issn | 2096-0654 2097-406X |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Tsinghua University Press |
| record_format | Article |
| series | Big Data Mining and Analytics |
| spelling | doaj-art-f8a4eb0e6df34a72904c3f68ed2cdca3; 2025-08-20T02:56:44Z; eng; Tsinghua University Press; Big Data Mining and Analytics; ISSN 2096-0654, 2097-406X; 2025-04-01; vol. 8, no. 2, pp. 310-325; DOI 10.26599/BDMA.2024.9020095; Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition; Zhiyun Zheng, Qilong Yuan, Huaizhu Zhang, Yizhou Wang, Junfeng Wang (all: Lab of Cloud Computing and Big Data Processing, School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China); https://www.sciopen.com/article/10.26599/BDMA.2024.9020095; human action recognition (har); skeleton data; graph convolutional network (gcn); attention mechanism |
| title | Lightweight Multiscale Spatio-Temporal Graph Convolutional Network for Skeleton-Based Action Recognition |
| topic | human action recognition (har) skeleton data graph convolutional network (gcn) attention mechanism |
| url | https://www.sciopen.com/article/10.26599/BDMA.2024.9020095 |
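
The description notes that dilated convolution widens the temporal receptive field while the kernel size stays unchanged. The sketch below illustrates that general property only; it is not the authors' LMTCN. The kernel size of 3 and the dilation rates used here are assumptions chosen for illustration, and the function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in frames, of stacked stride-1 1D convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dilated_conv1d(signal, kernel, dilation=1):
    """Dilated 1D cross-correlation without padding over a list of numbers."""
    # Number of frames between the first and last kernel tap.
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[t + i * dilation] for i, w in enumerate(kernel))
        for t in range(len(signal) - span)
    ]


if __name__ == "__main__":
    # Same 3-tap kernel, different dilation rates (illustrative values).
    for d in (1, 2, 4):
        print(f"kernel=3 dilation={d} receptive field={receptive_field(3, [d])}")

    frames = [1, 2, 3, 4, 5]
    print(dilated_conv1d(frames, [1, 1, 1], dilation=1))
    print(dilated_conv1d(frames, [1, 1, 1], dilation=2))
```

With a fixed 3-tap kernel, a dilation of 2 makes each output depend on frames spread across a 5-frame window, so the number of weights is unchanged while the temporal context grows.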