DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models

In recent years, the dance field has leveraged technical advances such as deep learning models to create diverse content, moving beyond the unique artistic creations that only humans could produce. In terms of dance data, however, there is still a lack of paired video-label datasets, and of datasets that contain multiple tags per video. To address this gap, this paper explores the feasibility of generating dance captions from tags using a pseudo-captioning approach, inspired by the significant improvements large language models (LLMs) have shown in other domains. Various tags are generated from features extracted from video and audio, and LLMs are then instructed to produce dance captions based on these tags. Captions were generated for both an open dance dataset and Internet dance videos, followed by user evaluations of randomly sampled captions. Participants found the captions effective in describing dance movements, of expert quality, and consistent with the video content. Positive feedback was also received on the interval used for image extraction and on the inclusion of tag data. This paper introduces and validates a novel pseudo-captioning method for generating dance captions from predefined tags, expanding the data available for dance research and offering a practical solution to the current lack of datasets in this field.
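The abstract describes a tag-to-caption pipeline: multimodal tags are extracted from video and audio, then an LLM is instructed to turn them into a dance caption. The record does not include the authors' prompts or code, so the following is a minimal sketch of that idea; the tag categories, field names, and the call_llm placeholder are illustrative assumptions, not the paper's actual implementation.

    # Minimal sketch of the tag-to-caption (pseudo-captioning) idea described
    # in the abstract. Tag categories and call_llm are illustrative
    # assumptions, not the authors' actual pipeline.

    def build_prompt(tags: dict[str, str]) -> str:
        """Format multimodal tags into a captioning instruction for an LLM."""
        tag_lines = "\n".join(f"- {name}: {value}" for name, value in tags.items())
        return (
            "You are an expert dance critic. Using only the tags below, "
            "write one fluent caption describing the dance in the video.\n"
            f"{tag_lines}\nCaption:"
        )

    def call_llm(prompt: str) -> str:
        """Placeholder for an LLM completion call (e.g., a hosted API client)."""
        raise NotImplementedError("plug in an actual LLM client here")

    if __name__ == "__main__":
        # Hypothetical tags derived from video (pose/motion) and audio features.
        tags = {
            "genre": "hip-hop",
            "music tempo": "104 BPM",
            "dominant movement": "sharp isolations and fast footwork",
            "energy": "high",
        }
        print(build_prompt(tags))

Captions produced this way could then be paired with the source videos to form the pseudo-labeled dance dataset the abstract describes.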

Bibliographic Details
Main Authors: Seohyun Kim, Kyogu Lee (Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul 08826, Republic of Korea)
Format: Article
Language: English
Published: MDPI AG, 2024-11-01
Series: Applied Sciences, Vol. 14, No. 22, Article 10116
ISSN: 2076-3417
DOI: 10.3390/app142210116
Subjects: pseudo-captioning; dance caption; large language model; multimodal tags
Online Access: https://www.mdpi.com/2076-3417/14/22/10116