Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features

In sports training, personalized skill assessment and feedback are crucial for athletes to master complex movements and improve performance. However, existing research on skill transfer predominantly focuses on skill evaluation through video analysis, addressing only a single facet of the multifacet...

Bibliographic Details
Main Authors:	Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Sensors
Subjects:	expert comment generation; sports skill level; spatial-temporal attention graph convolutional network; large multimodal model
Online Access:	https://www.mdpi.com/1424-8220/25/2/447

_version_	1832587502563622912
author	Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama
author_facet	Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama
author_sort	Tatsuki Seino
collection	DOAJ
description	In sports training, personalized skill assessment and feedback are crucial for athletes to master complex movements and improve performance. However, existing research on skill transfer predominantly focuses on skill evaluation through video analysis, addressing only a single facet of the multifaceted process required for skill acquisition. Furthermore, in the limited studies that generate expert comments, the learner’s skill level is predetermined, and the spatial-temporal information of human movement is often overlooked. To address this issue, we propose a novel approach to generate skill-level-aware expert comments by leveraging a Large Multimodal Model (LMM) and spatial-temporal motion features. Our method employs a Spatial-Temporal Attention Graph Convolutional Network (STA-GCN) to extract motion features that encapsulate the spatial-temporal dynamics of human movement. The STA-GCN classifies skill levels based on these motion features. The classified skill levels, along with the extracted motion features (intermediate features from the STA-GCN) and the original sports video, are then fed into the LMM. This integration enables the generation of detailed, context-specific expert comments that offer actionable insights for performance improvement. Our contributions are twofold: (1) We incorporate skill level classification results as inputs to the LMM, ensuring that feedback is appropriately tailored to the learner’s skill level; and (2) We integrate motion features that capture spatial-temporal information into the LMM, enhancing its ability to generate feedback based on the learner’s specific actions. Experimental results demonstrate that the proposed method effectively generates expert comments, overcoming the limitations of existing methods and offering valuable guidance for athletes across various skill levels.
format	Article
id	doaj-art-7598a78ea8724ab0ae02d494979355ea
institution	Kabale University
issn	1424-8220
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj-art-7598a78ea8724ab0ae02d494979355ea; 2025-01-24T13:48:58Z; eng; MDPI AG; Sensors; 1424-8220; 2025-01-01; vol. 25, no. 2, art. 447; 10.3390/s25020447; Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features; Tatsuki Seino (Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan); Naoki Saito (Office of Institutional Research, Hokkaido University, N-8, W-5, Kita-ku, Sapporo 060-0808, Hokkaido, Japan); Takahiro Ogawa (Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan); Satoshi Asamizu (National Institute of Technology, Kushiro College, 2 Chome-32-1 Otanoshikenishi, Kushiro 084-0916, Hokkaido, Japan); Miki Haseyama (Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan); https://www.mdpi.com/1424-8220/25/2/447; expert comment generation; sports skill level; spatial-temporal attention graph convolutional network; large multimodal model
spellingShingle	Tatsuki Seino Naoki Saito Takahiro Ogawa Satoshi Asamizu Miki Haseyama Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features Sensors expert comment generation sports skill level spatial-temporal attention graph convolutional network large multimodal model
title	Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
title_full	Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
title_fullStr	Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
title_full_unstemmed	Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
title_short	Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features
title_sort	expert comment generation considering sports skill level using a large multimodal model with video and spatial temporal motion features
topic	expert comment generation; sports skill level; spatial-temporal attention graph convolutional network; large multimodal model
url	https://www.mdpi.com/1424-8220/25/2/447
work_keys_str_mv	AT tatsukiseino expertcommentgenerationconsideringsportsskilllevelusingalargemultimodalmodelwithvideoandspatialtemporalmotionfeatures AT naokisaito expertcommentgenerationconsideringsportsskilllevelusingalargemultimodalmodelwithvideoandspatialtemporalmotionfeatures AT takahiroogawa expertcommentgenerationconsideringsportsskilllevelusingalargemultimodalmodelwithvideoandspatialtemporalmotionfeatures AT satoshiasamizu expertcommentgenerationconsideringsportsskilllevelusingalargemultimodalmodelwithvideoandspatialtemporalmotionfeatures AT mikihaseyama expertcommentgenerationconsideringsportsskilllevelusingalargemultimodalmodelwithvideoandspatialtemporalmotionfeatures

Similar Items