Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
Main Authors: | Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Sensors |
Subjects: | |
Online Access: | https://www.mdpi.com/1424-8220/25/2/447 |
_version_ | 1832587502563622912 |
---|---|
author | Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama |
collection | DOAJ |
description | In sports training, personalized skill assessment and feedback are crucial for athletes to master complex movements and improve performance. However, existing research on skill transfer predominantly focuses on skill evaluation through video analysis, addressing only a single facet of the multifaceted process required for skill acquisition. Furthermore, in the few studies that generate expert comments, the learner’s skill level is predetermined, and the spatial-temporal information of human movement is often overlooked. To address these issues, we propose a novel approach to generate skill-level-aware expert comments by leveraging a Large Multimodal Model (LMM) and spatial-temporal motion features. Our method employs a Spatial-Temporal Attention Graph Convolutional Network (STA-GCN) to extract motion features that encapsulate the spatial-temporal dynamics of human movement. The STA-GCN classifies skill levels based on these motion features. The classified skill levels, along with the extracted motion features (intermediate features from the STA-GCN) and the original sports video, are then fed into the LMM. This integration enables the generation of detailed, context-specific expert comments that offer actionable insights for performance improvement. Our contributions are twofold: (1) we incorporate skill level classification results as inputs to the LMM, ensuring that feedback is appropriately tailored to the learner’s skill level; and (2) we integrate motion features that capture spatial-temporal information into the LMM, enhancing its ability to generate feedback based on the learner’s specific actions. Experimental results demonstrate that the proposed method effectively generates expert comments, overcoming the limitations of existing methods and offering valuable guidance for athletes across various skill levels. |
format | Article |
id | doaj-art-7598a78ea8724ab0ae02d494979355ea |
institution | Kabale University |
issn | 1424-8220 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Sensors |
spelling | doaj-art-7598a78ea8724ab0ae02d494979355ea; 2025-01-24T13:48:58Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-01-01; vol. 25, no. 2, article 447; DOI 10.3390/s25020447; Tatsuki Seino (Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan); Naoki Saito (Office of Institutional Research, Hokkaido University, N-8, W-5, Kita-ku, Sapporo 060-0808, Hokkaido, Japan); Takahiro Ogawa (Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan); Satoshi Asamizu (National Institute of Technology, Kushiro College, 2 Chome-32-1 Otanoshikenishi, Kushiro 084-0916, Hokkaido, Japan); Miki Haseyama (Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan) |
title | Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features |
topic | expert comment generation; sports skill level; spatial-temporal attention graph convolutional network; large multimodal model |
url | https://www.mdpi.com/1424-8220/25/2/447 |
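The description field outlines a three-stage pipeline: a spatial-temporal graph network turns a skeleton sequence into motion features, a classifier on those features predicts the skill level, and the level, the features, and the video are passed together to the LMM. The sketch below illustrates that data flow only under stated assumptions. It is a toy stand-in, not the authors' STA-GCN: it uses one symmetric-normalized graph convolution per frame, mean pooling over joints, and softmax attention over frames. The joint graph, feature sizes, random untrained weights, the skill labels in `SKILL_LEVELS`, the prompt wording, and the dictionary returned by `build_lmm_inputs` are all hypothetical; how the real method projects features into the LMM is not described in this record.

```python
import math
import random

# Hypothetical skill labels; the record does not name the actual classes.
SKILL_LEVELS = ["beginner", "intermediate", "expert"]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges, num_joints):
    """Standard GCN normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    deg = [sum(row) for row in adj]  # self-loops guarantee deg > 0
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sta_gcn_sketch(skeleton, edges, w_spatial, w_att, w_cls):
    """Toy spatial-temporal attention GCN (NOT the paper's architecture).

    skeleton: T frames, each a J x C list of joint coordinates.
    Returns (motion_feature, skill_probabilities, temporal_attention).
    """
    a_hat = normalized_adjacency(edges, len(skeleton[0]))
    per_frame = []
    for frame in skeleton:
        # Spatial step: graph convolution over joints, then ReLU -> J x F.
        h = matmul(matmul(a_hat, frame), w_spatial)
        h = [[max(0.0, v) for v in row] for row in h]
        # Mean over joints gives one F-dimensional vector per frame.
        per_frame.append([sum(col) / len(h) for col in zip(*h)])
    # Temporal attention: score each frame, softmax over time, weighted sum.
    att = softmax([sum(v * w for v, w in zip(f, w_att)) for f in per_frame])
    feature = [sum(a * f[k] for a, f in zip(att, per_frame))
               for k in range(len(per_frame[0]))]
    # Linear skill-level classifier on the pooled motion feature.
    logits = [sum(v * w for v, w in zip(feature, col)) for col in zip(*w_cls)]
    return feature, softmax(logits), att


def build_lmm_inputs(video_path, feature, skill_probs):
    """Hypothetical packaging of video, motion feature, and skill level for an LMM."""
    level = SKILL_LEVELS[max(range(len(skill_probs)), key=skill_probs.__getitem__)]
    prompt = (f"The learner's estimated skill level is {level}. "
              "Using the video and motion features, give expert feedback.")
    return {"video": video_path, "motion_features": feature,
            "skill_level": level, "prompt": prompt}


# Usage with random, untrained weights on a 5-joint toy skeleton.
random.seed(0)
T, J, C, F = 8, 5, 2, 4
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
skeleton = [[[random.uniform(-1, 1) for _ in range(C)] for _ in range(J)] for _ in range(T)]
w_spatial = [[random.gauss(0, 1) for _ in range(F)] for _ in range(C)]
w_att = [random.gauss(0, 1) for _ in range(F)]
w_cls = [[random.gauss(0, 1) for _ in range(len(SKILL_LEVELS))] for _ in range(F)]
feature, probs, att = sta_gcn_sketch(skeleton, edges, w_spatial, w_att, w_cls)
inputs = build_lmm_inputs("clip.mp4", feature, probs)
```

Returning the pooled feature alongside the class probabilities mirrors the record's point that the intermediate features, not only the predicted level, are handed to the LMM; in a real system those features would typically be projected into the LMM's embedding space rather than passed as raw numbers.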