DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models

In recent years, the dance field has leveraged technical advances such as deep learning models to create diverse content, moving beyond the unique artistic creations that only humans could produce. In terms of dance data, however, there is still a lack of paired video-label datasets, and of datasets that contain multiple tags per video. To address this gap, this paper explores the feasibility of generating dance captions from tags using a pseudo-captioning approach, inspired by the significant improvements large language models (LLMs) have shown in other domains. Various tags are generated from features extracted from video and audio, and LLMs are then instructed to produce dance captions based on these tags. Captions were generated for both an open dance dataset and Internet dance videos, followed by user evaluations of randomly sampled captions. Participants found the captions effective in describing dance movements, of expert quality, and consistent with the video content. Positive feedback was also received on the interval used for image extraction and on the inclusion of tag data. This paper introduces and validates a novel pseudo-captioning method for generating dance captions from predefined tags, expanding the data available for dance research and offering a practical solution to the current lack of datasets in this field.
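The abstract describes a tag-to-caption pipeline: multimodal tags are extracted from video and audio, then an LLM is instructed to turn them into a dance caption. The record does not include the authors' prompts or code, so the following is a minimal sketch of that idea; the tag categories, field names, and the call_llm placeholder are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of the tag-to-caption (pseudo-captioning) idea described
    # in the abstract. Tag categories and call_llm are illustrative
    # assumptions, not the authors' actual pipeline.

    def build_prompt(tags: dict[str, str]) -> str:
        """Format multimodal tags into a captioning instruction for an LLM."""
        tag_lines = "\n".join(f"- {name}: {value}" for name, value in tags.items())
        return (
            "You are an expert dance critic. Using only the tags below, "
            "write one fluent caption describing the dance in the video.\n"
            f"{tag_lines}\nCaption:"
        )

    def call_llm(prompt: str) -> str:
        """Placeholder for an LLM completion call (e.g., a hosted API client)."""
        raise NotImplementedError("plug in an actual LLM client here")

    if __name__ == "__main__":
        # Hypothetical tags derived from video (pose/motion) and audio features.
        tags = {
            "genre": "hip-hop",
            "music tempo": "104 BPM",
            "dominant movement": "sharp isolations and fast footwork",
            "energy": "high",
        }
        print(build_prompt(tags))

Captions produced this way could then be paired with the source videos to form the pseudo-labeled dance dataset the abstract describes.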

Bibliographic Details
Main Authors: Seohyun Kim, Kyogu Lee (Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul 08826, Republic of Korea)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Applied Sciences, Vol. 14, No. 22, Article 10116
ISSN: 2076-3417
DOI: 10.3390/app142210116
Subjects: pseudo-captioning; dance caption; large language model; multimodal tags
Online Access: https://www.mdpi.com/2076-3417/14/22/10116