DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models


Bibliographic Details
Main Authors: Seohyun Kim, Kyogu Lee
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Applied Sciences
Subjects:
Online Access: https://www.mdpi.com/2076-3417/14/22/10116
Description
Summary: In recent years, the dance field has leveraged technical advances such as deep learning models to create diverse content, going beyond the unique artistic creations that only humans can produce. However, dance data remain scarce: there is still a lack of paired video and label datasets, and few datasets provide multiple tags per video. To address this gap, this paper explores the feasibility of generating dance captions from tags using a pseudo-captioning approach, inspired by the significant improvements large language models (LLMs) have shown in other domains. Various tags are generated from features extracted from video and audio, and LLMs are then instructed to produce dance captions based on these tags. Captions were generated for both an open dance dataset and Internet dance videos, followed by user evaluations of randomly sampled captions. Participants found the captions effective in describing dance movements, of expert quality, and consistent with the video content. Positive feedback was also received on the evaluation of the image extraction interval and the inclusion of tag data. This paper introduces and validates a novel pseudo-captioning method for generating dance captions from predefined tags, expanding the data available for dance research and offering a practical solution to the current lack of datasets in this field.
ISSN:2076-3417