Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information
Recognizing person-person interactions is practically significant and this type of interactive recognition is applied in many fields, such as video understanding and video surveillance. Compared with RGB data, skeletal data can more accurately depict articulated human movements due to its detailed recording of joint locations.
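The abstract describes two mechanisms: proximity graphs that encode how close the joints of the two interacting people are (frame-based, sample-based and type-based), and a two-stream framework that fuses a joint stream with a bone stream. The paper's own implementation is not part of this record; the snippet below is only a minimal sketch of the frame-based proximity idea and the late-fusion step, assuming a Gaussian-kernel distance weighting and a simple weighted average of class scores. The function names (`frame_proximity_graph`, `fuse_two_streams`) and parameters (`sigma`, `alpha`) are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative sketch only (not the authors' 2s-PGT code): a frame-based
# proximity graph over the concatenated joints of two people, plus a simple
# two-stream (joint/bone) score fusion.

def frame_proximity_graph(joints_a: np.ndarray,
                          joints_b: np.ndarray,
                          sigma: float = 1.0) -> np.ndarray:
    """joints_a, joints_b: (J, 3) 3D joint coordinates of person A and B in
    one frame. Returns a (2J, 2J) soft adjacency matrix whose entries grow
    as joint pairs get spatially closer (assumed Gaussian kernel)."""
    joints = np.concatenate([joints_a, joints_b], axis=0)   # (2J, 3)
    diff = joints[:, None, :] - joints[None, :, :]          # pairwise offsets, (2J, 2J, 3)
    dist = np.linalg.norm(diff, axis=-1)                    # Euclidean distances, (2J, 2J)
    prox = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))        # closer joints -> larger weight
    np.fill_diagonal(prox, 1.0)                             # full self-connection for each joint
    return prox

def fuse_two_streams(joint_logits: np.ndarray,
                     bone_logits: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    """Late fusion of class scores from a joint stream and a bone stream
    (bones are typically coordinate differences along the skeleton tree)."""
    return alpha * joint_logits + (1.0 - alpha) * bone_logits

if __name__ == "__main__":
    # Example with random data: NTU-RGB+D skeletons have 25 joints per person.
    person_a = np.random.rand(25, 3)
    person_b = np.random.rand(25, 3)
    graph = frame_proximity_graph(person_a, person_b)        # (50, 50) proximity matrix

    num_classes = 11                                         # placeholder number of interaction classes
    scores = fuse_two_streams(np.random.rand(num_classes),
                              np.random.rand(num_classes))   # fused class scores
```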
| Main Authors: | Meng Li, Yaqi Wu, Qiumei Sun, Weifeng Yang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Interaction recognition, proximity graphs, transformer, two-stream networks |
| Online Access: | https://ieeexplore.ieee.org/document/10795464/ |
| author | Meng Li, Yaqi Wu, Qiumei Sun, Weifeng Yang |
|---|---|
| collection | DOAJ |
| description | Recognizing person-person interactions is practically significant and this type of interactive recognition is applied in many fields, such as video understanding and video surveillance. Compared with RGB data, skeletal data can more accurately depict articulated human movements due to its detailed recording of joint locations. With the recent success of Transformer in computer vision, numerous scholars have begun to apply Transformer to recognize person-person interaction. However, these Transformer-based models do not fully take into account the dynamic spatiotemporal relationship between interacting people, which remains a challenge. To handle this challenge, we propose a novel Transformer-based model called Two-Stream Proximity Graph Transformer (2s-PGT) to recognize skeletal person-person interaction. Specifically, we first design three types of proximity graphs based on skeletal data to encode the dynamic proximity relationship between interacting people, including frame-based, sample-based and type-based proximity graphs. Secondly, we embed proximity graphs into our Transformer-based model to jointly learn the relationship between interacting people from spatiotemporal and semantic perspectives. We thirdly investigate a two-stream framework to integrate the information of interactive joints and interactive bones together to improve the accuracy of interaction recognition. Experimental results on the three public datasets, the SBU dataset (99.07%), the NTU-RGB+D dataset (Cross-Subject (95.72%), Cross-View (97.87%)) and the NTU-RGB+D120 dataset (Cross-Subject (92.01%), Cross-View (91.65%)), demonstrate that our approach outperforms the state-of-the-art methods. |
| format | Article |
| id | doaj-art-4b01a8a1a0794dd7bd627f4e8bf73052 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record doaj-art-4b01a8a1a0794dd7bd627f4e8bf73052, indexed 2025-08-20T02:35:15Z. IEEE Access (ISSN 2169-3536), vol. 12, pp. 193091-193100, 2024-01-01. DOI: 10.1109/ACCESS.2024.3516511; IEEE document 10795464. Title: Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information. Authors: Meng Li (https://orcid.org/0000-0003-3497-4391), Yaqi Wu (https://orcid.org/0009-0009-1126-366X), Qiumei Sun, Weifeng Yang. Affiliations: College of Mathematics and Statistic, Hebei University of Economics and Business, Shijiazhuang, Hebei, China (Li, Wu); Yiban Development Center, Hebei University of Economics and Business, Shijiazhuang, Hebei, China (Sun); Vipshop, Shanghai, China (Yang). Keywords: Interaction recognition; proximity graphs; transformer; two-stream networks. Online access: https://ieeexplore.ieee.org/document/10795464/ |
| title | Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information |
| topic | Interaction recognition, proximity graphs, transformer, two-stream networks |
| url | https://ieeexplore.ieee.org/document/10795464/ |