Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information

Bibliographic Details
Main Authors: Meng Li, Yaqi Wu, Qiumei Sun, Weifeng Yang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10795464/
Collection: DOAJ
Description: Recognizing person-person interactions has practical value in many fields, such as video understanding and video surveillance. Compared with RGB data, skeletal data can depict articulated human movements more accurately because it records joint locations in detail. Following the recent success of the Transformer in computer vision, many researchers have begun applying it to person-person interaction recognition. However, these Transformer-based models do not fully account for the dynamic spatiotemporal relationship between interacting people, which remains a challenge. To address this challenge, we propose a novel Transformer-based model, the Two-Stream Proximity Graph Transformer (2s-PGT), for skeletal person-person interaction recognition. First, we design three types of proximity graphs based on skeletal data (frame-based, sample-based and type-based) to encode the dynamic proximity relationship between interacting people. Second, we embed the proximity graphs into our Transformer-based model to jointly learn the relationship between interacting people from spatiotemporal and semantic perspectives. Third, we investigate a two-stream framework that integrates the information of interactive joints and interactive bones to improve recognition accuracy. Experimental results on three public datasets, SBU (99.07%), NTU-RGB+D (Cross-Subject 95.72%, Cross-View 97.87%) and NTU-RGB+D120 (Cross-Subject 92.01%, Cross-View 91.65%), demonstrate that our approach outperforms state-of-the-art methods.
Record ID: doaj-art-4b01a8a1a0794dd7bd627f4e8bf73052
Institution: OA Journals
ISSN: 2169-3536
Volume: 12
Pages: 193091-193100
DOI: 10.1109/ACCESS.2024.3516511
IEEE Document Number: 10795464
Author Affiliations:
Meng Li (https://orcid.org/0000-0003-3497-4391), College of Mathematics and Statistic, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
Yaqi Wu (https://orcid.org/0009-0009-1126-366X), College of Mathematics and Statistic, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
Qiumei Sun, Yiban Development Center, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
Weifeng Yang, Vipshop, Shanghai, China
Subjects: Interaction recognition; proximity graphs; transformer; two-stream networks
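
Illustrative sketch (not from the article): the description mentions a joint stream, a bone stream, and a frame-based proximity graph between two interacting people, but this record does not give their definitions. The Python sketch below shows one common way such inputs could be computed from skeleton arrays. Everything in it is an assumption made for illustration: the 5-joint chain skeleton in PARENTS, the function names bone_stream and frame_proximity, the Gaussian kernel, and the sigma parameter. It should not be read as the 2s-PGT implementation.

```python
import numpy as np

# Hypothetical 5-joint chain skeleton (assumption, not from the article).
# PARENTS[j] is the parent index of joint j; the root (joint 0) is its own parent.
PARENTS = np.array([0, 0, 1, 2, 3])


def bone_stream(joints):
    """Turn joint coordinates (T, J, C) into bone vectors (T, J, C).

    Each bone is the joint position minus its parent's position,
    so the root bone is the zero vector.
    """
    return joints - joints[:, PARENTS, :]


def frame_proximity(person_a, person_b, sigma=1.0):
    """Per-frame proximity between the joints of two people.

    Inputs have shape (T, J, C); the output has shape (T, J, J).
    Entry [t, i, j] is a Gaussian kernel of the Euclidean distance between
    joint i of person A and joint j of person B in frame t (assumed form).
    """
    diff = person_a[:, :, None, :] - person_b[:, None, :, :]  # (T, J, J, C)
    dist = np.linalg.norm(diff, axis=-1)                      # (T, J, J)
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))


# Toy data: 4 frames, 5 joints, 3D coordinates for each of two people.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 5, 3))
b = rng.normal(size=(4, 5, 3))

bones_a = bone_stream(a)           # input for a bone stream
prox = frame_proximity(a, b)       # one (J, J) proximity matrix per frame
print(bones_a.shape, prox.shape)
```

In a two-stream setup of this kind, the joint arrays and the bone arrays would each feed a separate model whose scores are fused afterwards; how the article fuses them is not stated in this record.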