UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model
Diffusion models have demonstrated substantial success in controllable generation for continuous modalities, positioning them as highly suitable for tasks such as human motion generation. However, existing approaches are typically limited to single-task applications, such as text-to-motion generation, and often lack versatility and editing capabilities.
Main Authors: | Song Lin, Wenjun Hou |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | Diffusion-based multimodal generation; multiple tasks; contrastive learning |
Online Access: | https://ieeexplore.ieee.org/document/10802885/ |
author | Song Lin, Wenjun Hou |
collection | DOAJ |
description | Diffusion models have demonstrated substantial success in controllable generation for continuous modalities, positioning them as highly suitable for tasks such as human motion generation. However, existing approaches are typically limited to single-task applications, such as text-to-motion generation, and often lack versatility and editing capabilities. To overcome these limitations, we propose UniMotion-DM, a unified framework for both text-motion generation and editing based on diffusion models. UniMotion-DM integrates three core components: 1) a Contrastive Text-Motion Variational Autoencoder (CTMV), which aligns text and motion in a shared latent space using contrastive learning; 2) a controllable diffusion model tailored to the CTMV representation for generating and editing multimodal content; and 3) a Multimodal Conditional Representation and Editing (MCRE) module that leverages CLIP embeddings to enable precise and flexible control across various tasks. The ability of UniMotion-DM to seamlessly handle text-to-motion generation, motion captioning, motion completion, and multimodal editing results in significant improvements in both quantitative and qualitative evaluations. Beyond conventional domains such as gaming and virtual reality, we emphasize UniMotion-DM’s potential in underexplored fields such as healthcare and creative industries. For example, UniMotion-DM could be used to generate personalized physical therapy routines or assist designers in rapidly prototyping motion-based narratives. By addressing these emerging applications, UniMotion-DM paves the way for utilizing multimodal generative models in interdisciplinary and socially impactful areas. |
format | Article |
id | doaj-art-01ba0cbcac2644de91f55eaa0f59bbdb |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model. Song Lin (https://orcid.org/0009-0008-3355-599X), School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing, China; Wenjun Hou, Beijing Key Laboratory of Network Systems and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China. IEEE Access, vol. 12, pp. 196984-196999, 2024-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3518300. https://ieeexplore.ieee.org/document/10802885/ |
title | UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model |
topic | Diffusion-based multimodal generation; multiple tasks; contrastive learning |
url | https://ieeexplore.ieee.org/document/10802885/ |
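The description above names three components: a Contrastive Text-Motion Variational Autoencoder (CTMV), a controllable latent diffusion model, and a CLIP-based Multimodal Conditional Representation and Editing (MCRE) module. As a rough illustration of the first idea only, the sketch below shows a symmetric, CLIP-style contrastive objective for aligning paired text and motion latents in a shared space; the function name, the InfoNCE formulation, and the temperature default are assumptions made for illustration, not details taken from the record.

```python
# Minimal sketch of a symmetric (CLIP-style) contrastive objective for aligning
# text and motion latents in a shared space, in the spirit of the CTMV component
# described in the abstract. All names and the InfoNCE formulation are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def text_motion_contrastive_loss(text_latents: torch.Tensor,
                                 motion_latents: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired text/motion latents.

    Args:
        text_latents:   (B, D) latent codes from the text encoder.
        motion_latents: (B, D) latent codes from the motion encoder.
    """
    # Project both modalities onto the unit sphere so the dot product
    # is a cosine similarity.
    text = F.normalize(text_latents, dim=-1)
    motion = F.normalize(motion_latents, dim=-1)

    # (B, B) similarity matrix; entry (i, j) compares text i with motion j.
    logits = text @ motion.t() / temperature

    # Matched text-motion pairs lie on the diagonal.
    targets = torch.arange(text.size(0), device=text.device)

    # Cross-entropy in both directions (text -> motion and motion -> text).
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_t2m + loss_m2t)
```

Minimizing this loss pulls matched text-motion pairs together and pushes mismatched pairs apart, which is the sense in which the record says CTMV "aligns text and motion in a shared latent space using contrastive learning."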
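The second and third components describe conditioning a diffusion model in this latent space on CLIP embeddings. One common way to realize such conditioning is classifier-free guidance at each reverse-diffusion step; the sketch below illustrates that generic mechanism only. The denoiser interface, the learned null embedding, and the guidance scale are assumptions, and the actual sampler used by UniMotion-DM is not specified in this record.

```python
# Minimal sketch of classifier-free guidance at a single reverse-diffusion step
# in a motion latent space, with a CLIP text embedding as the condition.
# All names and defaults here are illustrative assumptions.
from typing import Callable
import torch

def guided_denoise_step(
    denoiser: Callable[[torch.Tensor, torch.Tensor, torch.Tensor], torch.Tensor],
    z_t: torch.Tensor,            # (B, D) noisy motion latent at timestep t
    t: torch.Tensor,              # (B,)   diffusion timestep indices
    clip_text_emb: torch.Tensor,  # (B, C) CLIP embedding of the text condition
    null_emb: torch.Tensor,       # (B, C) learned "unconditional" embedding
    guidance_scale: float = 7.5,  # assumed value; tuned per task in practice
) -> torch.Tensor:
    """Predict the noise with and without the condition and blend the two."""
    eps_cond = denoiser(z_t, t, clip_text_emb)   # condition-aware prediction
    eps_uncond = denoiser(z_t, t, null_emb)      # condition-free prediction
    # Classifier-free guidance: push the estimate toward the conditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```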