DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models
In recent years, the dance field has been able to create diverse content by leveraging technical advancements such as deep learning models, generating content beyond the unique artistic creations that only humans can produce. However, in terms of dance data, there is still a lack of video and label...
| Main Authors: | Seohyun Kim, Kyogu Lee |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Applied Sciences |
| Subjects: | pseudo-captioning; dance caption; large language model; multimodal tags |
| Online Access: | https://www.mdpi.com/2076-3417/14/22/10116 |
| _version_ | 1850149499398258688 |
|---|---|
| author | Seohyun Kim; Kyogu Lee |
| author_facet | Seohyun Kim; Kyogu Lee |
| author_sort | Seohyun Kim |
| collection | DOAJ |
| description | In recent years, the dance field has been able to create diverse content by leveraging technical advancements such as deep learning models, generating content beyond the unique artistic creations that only humans can produce. However, in terms of dance data, there is still a lack of video and label datasets, or of datasets that contain multiple tags for videos. To address this gap, this paper explores the feasibility of generating dance captions from tags using a pseudo-captioning approach, inspired by the significant improvements large language models (LLMs) have shown in other domains. Various tags are generated from features extracted from videos and audio, and LLMs are then instructed to produce dance captions based on these tags. Captions were generated using both an open dance dataset and Internet dance videos, followed by user evaluations of randomly sampled captions. Participants found the captions effective in describing dance movements, of expert quality, and consistent with the video content. Positive feedback was also received on the image-extraction interval and the inclusion of tag data. This paper introduces and validates a novel pseudo-captioning method for generating dance captions using predefined tags, contributing to the expansion of data available for dance research and offering a practical solution to the current lack of datasets in this field. |
| format | Article |
| id | doaj-art-fd06eef734ff456d9e92f2f853369f10 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-fd06eef734ff456d9e92f2f853369f10; 2025-08-20T02:26:54Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2024-11-01; 14(22): 10116; doi: 10.3390/app142210116; DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models; Seohyun Kim, Kyogu Lee (Music and Audio Research Group, Department of Intelligence and Information, Seoul National University, Seoul 08826, Republic of Korea); https://www.mdpi.com/2076-3417/14/22/10116; keywords: pseudo-captioning, dance caption, large language model, multimodal tags |
| spellingShingle | Seohyun Kim; Kyogu Lee; DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models; Applied Sciences; pseudo-captioning; dance caption; large language model; multimodal tags |
| title | DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models |
| title_full | DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models |
| title_fullStr | DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models |
| title_full_unstemmed | DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models |
| title_short | DanceCaps: Pseudo-Captioning for Dance Videos Using Large Language Models |
| title_sort | dancecaps pseudo captioning for dance videos using large language models |
| topic | pseudo-captioning; dance caption; large language model; multimodal tags |
| url | https://www.mdpi.com/2076-3417/14/22/10116 |
| work_keys_str_mv | AT seohyunkim dancecapspseudocaptioningfordancevideosusinglargelanguagemodels AT kyogulee dancecapspseudocaptioningfordancevideosusinglargelanguagemodels |
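The abstract describes a pipeline in which tags extracted from a video and its audio are assembled into an instruction for an LLM, which then writes the caption. A minimal sketch of that tag-to-prompt step is below; the tag categories, tag values, and function name are illustrative assumptions, not the authors' actual schema, and the LLM call itself is omitted.

```python
# Hypothetical sketch of the tag-to-prompt step described in the abstract.
# Tag categories and values are illustrative, not the paper's actual schema.

def build_caption_prompt(tags):
    """Assemble an LLM instruction from multimodal tags extracted
    from a dance video and its audio track."""
    # One bullet per tag category, e.g. "- genre: hip-hop".
    lines = [f"- {category}: {', '.join(values)}"
             for category, values in tags.items()]
    return (
        "You are an expert dance critic. Using only the tags below, "
        "write a one-paragraph caption describing the dance movements.\n"
        + "\n".join(lines)
    )

# Example tags, standing in for features extracted from video and audio.
example_tags = {
    "genre": ["hip-hop"],
    "tempo": ["fast"],
    "movement": ["popping", "footwork"],
}
prompt = build_caption_prompt(example_tags)
```

In the paper's setting, the resulting prompt would be sent to an LLM and the returned text stored as the pseudo-caption for that video.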