UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model

Diffusion models have demonstrated substantial success in controllable generation for continuous modalities, positioning them as highly suitable for tasks such as human motion generation. However, existing approaches are typically limited to single-task applications, such as text-to-motion generation, and often lack versatility and editing capabilities. To overcome these limitations, we propose UniMotion-DM, a unified framework for both text-motion generation and editing based on diffusion models. UniMotion-DM integrates three core components: 1) a Contrastive Text-Motion Variational Autoencoder (CTMV), which aligns text and motion in a shared latent space using contrastive learning; 2) a controllable diffusion model tailored to the CTMV representation for generating and editing multimodal content; and 3) a Multimodal Conditional Representation and Editing (MCRE) module that leverages CLIP embeddings to enable precise and flexible control across various tasks. The ability of UniMotion-DM to seamlessly handle text-to-motion generation, motion captioning, motion completion, and multimodal editing results in significant improvements in both quantitative and qualitative evaluations. Beyond conventional domains such as gaming and virtual reality, we emphasize UniMotion-DM’s potential in underexplored fields such as healthcare and creative industries. For example, UniMotion-DM could be used to generate personalized physical therapy routines or assist designers in rapidly prototyping motion-based narratives. By addressing these emerging applications, UniMotion-DM paves the way for utilizing multimodal generative models in interdisciplinary and socially impactful areas.
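The record contains no implementation details, but the contrastive text-motion alignment that the abstract attributes to CTMV can be illustrated with a minimal sketch. Everything below (encoder outputs, latent dimension, temperature, function name) is an illustrative assumption, not the paper's actual design.

    # Minimal sketch of contrastive alignment of text and motion in a shared
    # latent space, as described at a high level in the abstract. Shapes, the
    # temperature value, and where the latents come from (e.g. VAE posterior
    # means) are assumptions for illustration only.
    import torch
    import torch.nn.functional as F

    def contrastive_alignment_loss(text_latents, motion_latents, temperature=0.07):
        """Symmetric InfoNCE loss over a batch of paired text/motion latents.

        text_latents, motion_latents: (batch, dim) tensors produced by
        hypothetical text and motion encoders.
        """
        text = F.normalize(text_latents, dim=-1)
        motion = F.normalize(motion_latents, dim=-1)
        logits = text @ motion.t() / temperature  # (batch, batch) similarities
        targets = torch.arange(text.size(0), device=text.device)
        # Matched text-motion pairs lie on the diagonal: pull them together,
        # push mismatched pairs apart, in both retrieval directions.
        loss_t2m = F.cross_entropy(logits, targets)
        loss_m2t = F.cross_entropy(logits.t(), targets)
        return 0.5 * (loss_t2m + loss_m2t)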

Bibliographic Details
Main Authors: Song Lin, Wenjun Hou
Format: Article
Language: English
Published: IEEE 2024-01-01
Series:IEEE Access
Subjects: Diffusion-based multimodal generation; multiple tasks; contrastive learning
Online Access: https://ieeexplore.ieee.org/document/10802885/
Full Citation: Song Lin (ORCID: 0009-0008-3355-599X), School of Intelligent Engineering and Automation, Beijing University of Posts and Telecommunications, Beijing, China, and Wenjun Hou, Beijing Key Laboratory of Network Systems and Network Culture, Beijing University of Posts and Telecommunications, Beijing, China, "UniMotion-DM: Uniform Text-Motion Generation and Editing via Diffusion Model," IEEE Access (ISSN 2169-3536), vol. 12, pp. 196984-196999, 2024, doi: 10.1109/ACCESS.2024.3518300.