Expert Comment Generation Considering Sports Skill Level Using a Large Multimodal Model with Video and Spatial-Temporal Motion Features

Bibliographic Details
Main Authors: Tatsuki Seino, Naoki Saito, Takahiro Ogawa, Satoshi Asamizu, Miki Haseyama
Format: Article
Language: English
Published: MDPI AG 2025-01-01
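Series: Sensors
Subjects: expert comment generation; sports skill level; spatial-temporal attention graph convolutional network; large multimodal model
Online Access: https://www.mdpi.com/1424-8220/25/2/447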
Collection: DOAJ

Abstract
In sports training, personalized skill assessment and feedback are crucial for athletes to master complex movements and improve performance. However, existing research on skill transfer predominantly focuses on skill evaluation through video analysis, addressing only a single facet of the multifaceted process required for skill acquisition. Furthermore, in the limited studies that generate expert comments, the learner’s skill level is predetermined, and the spatial-temporal information of human movement is often overlooked. To address this issue, we propose a novel approach to generate skill-level-aware expert comments by leveraging a Large Multimodal Model (LMM) and spatial-temporal motion features. Our method employs a Spatial-Temporal Attention Graph Convolutional Network (STA-GCN) to extract motion features that encapsulate the spatial-temporal dynamics of human movement. The STA-GCN classifies skill levels based on these motion features. The classified skill levels, along with the extracted motion features (intermediate features from the STA-GCN) and the original sports video, are then fed into the LMM. This integration enables the generation of detailed, context-specific expert comments that offer actionable insights for performance improvement. Our contributions are twofold: (1) We incorporate skill level classification results as inputs to the LMM, ensuring that feedback is appropriately tailored to the learner’s skill level; and (2) We integrate motion features that capture spatial-temporal information into the LMM, enhancing its ability to generate feedback based on the learner’s specific actions. Experimental results demonstrate that the proposed method effectively generates expert comments, overcoming the limitations of existing methods and offering valuable guidance for athletes across various skill levels.
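
The abstract describes a three-stage data flow: a spatial-temporal graph network encodes skeleton motion, its pooled intermediate features drive a skill-level classifier, and the video, the features, and the predicted level together form the LMM input. The NumPy sketch below illustrates that flow under stated assumptions. It uses a generic graph convolution (normalized adjacency plus a temporal averaging window) with softmax attention pooling as a stand-in for the authors' STA-GCN. The five-joint chain skeleton, the three skill labels, the hidden size, the random untrained weights, the `run_pipeline` helper, and the dictionary handed to the LMM are all hypothetical. The record does not specify the real layer design, label set, or how features are projected into the LMM.

```python
"""Illustrative sketch only: simplified spatial-temporal GCN -> skill level ->
LMM input bundle. Skeleton, sizes, labels and weights are hypothetical and are
not the authors' STA-GCN configuration."""
import numpy as np

# Hypothetical 5-joint chain skeleton; real pose estimators output more joints.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4)]
SKILL_LEVELS = ["beginner", "intermediate", "expert"]  # hypothetical label set


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def st_graph_conv(x, a_hat, w_spatial, kernel_t=3):
    """One spatial-temporal block on x with shape (T frames, V joints, C_in).

    Spatial step: aggregate each joint's neighbours with a_hat, project to
    C_out channels, apply ReLU. Temporal step: average each joint over a
    centred window of kernel_t frames (edge-padded), keeping T frames.
    """
    spatial = np.einsum("uv,tvc->tuc", a_hat, x) @ w_spatial
    spatial = np.maximum(spatial, 0.0)
    pad = kernel_t // 2
    padded = np.pad(spatial, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    return np.stack([padded[t:t + kernel_t].mean(axis=0) for t in range(x.shape[0])])


def attention_pool(h, w_att):
    """Softmax attention over every (frame, joint) position -> (feature, weights)."""
    scores = (h @ w_att).reshape(-1)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    feature = weights @ h.reshape(-1, h.shape[-1])
    return feature, weights.reshape(h.shape[:2])


def run_pipeline(skeleton, video_path, rng, hidden=8):
    """Skeleton sequence -> motion feature + skill level -> LMM input bundle.

    Weights are random and untrained, so the predicted level is meaningless;
    only the data flow is illustrated.
    """
    _, num_joints, c_in = skeleton.shape
    a_hat = normalized_adjacency(num_joints, EDGES)
    w_spatial = rng.standard_normal((c_in, hidden)) * 0.5
    w_att = rng.standard_normal(hidden)
    w_cls = rng.standard_normal((hidden, len(SKILL_LEVELS)))

    h = st_graph_conv(skeleton, a_hat, w_spatial)   # (T, V, hidden)
    feature, attention = attention_pool(h, w_att)   # (hidden,), (T, V)
    level = SKILL_LEVELS[int(np.argmax(feature @ w_cls))]

    # Hypothetical LMM input: the record says the video, motion features and
    # classified level are fed to the LMM, but not how they are encoded.
    return {
        "video": video_path,
        "motion_features": feature,
        "skill_level": level,
        "attention": attention,
        "instruction": f"Give expert coaching comments for a {level} athlete.",
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    skeleton = rng.standard_normal((16, 5, 2))  # 16 frames, 5 joints, (x, y)
    bundle = run_pipeline(skeleton, "example_clip.mp4", rng)
    print(bundle["skill_level"], bundle["motion_features"].shape)
```

Because the weights are untrained, the predicted level here carries no meaning; in a working system the feature extractor and classifier would presumably be trained on labelled skill data before their outputs are passed to the LMM.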

ISSN: 1424-8220
DOI: 10.3390/s25020447
Citation: Sensors 2025, 25(2), 447

Author Affiliations
Tatsuki Seino: Graduate School of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
Naoki Saito: Office of Institutional Research, Hokkaido University, N-8, W-5, Kita-ku, Sapporo 060-0808, Hokkaido, Japan
Takahiro Ogawa: Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan
Satoshi Asamizu: National Institute of Technology, Kushiro College, 2 Chome-32-1 Otanoshikenishi, Kushiro 084-0916, Hokkaido, Japan
Miki Haseyama: Faculty of Information Science and Technology, Hokkaido University, N-14, W-9, Kita-ku, Sapporo 060-0814, Hokkaido, Japan