Lipsynthesis incorporating audio-visual synchronisation

With the flourishing development of video-based information dissemination, audio and video synchronization is gradually becoming an important standard for measuring video quality. Deep synthesis technology has been entering the public's view in the international communication field, and lip-sync...

Bibliographic Details
Main Authors:	Cong JIN, Jie WANG, Zichun GUO, Jing WANG
Format:	Article
Language:	Chinese (zho)
Published:	POSTS&TELECOM PRESS Co., LTD 2023-09-01
Series:	智能科学与技术学报
Subjects:	lip generation;deep learning;artificial intelligence;computer visualization;synchronization of audio and video
Online Access:	http://www.cjist.com.cn/thesisDetails#10.11959/j.issn.2096-6652.202335

author	Cong JIN, Jie WANG, Zichun GUO, Jing WANG
collection	DOAJ
description	With the flourishing development of video-based information dissemination, audio-video synchronization is gradually becoming an important standard for measuring video quality. Deep synthesis technology has entered public view in the field of international communication, and lip-sync technology that integrates audio and video synchronization has attracted growing attention. Existing lip-synthesis models are mainly based on static images and are not effective for dynamic videos, and most are trained on English datasets, which results in poor synthesis for Mandarin Chinese. To address these problems, this paper conducted optimization experiments on the Wav2Lip lip-synthesis model in a Chinese-language context and tested the effect of different training routes through multiple sets of experiments, providing a reference for subsequent research on the Wav2Lip series. The study extended lip synthesis from speech-driven to text-driven, discussed applications of lip synthesis in fields such as virtual digital humans, and laid a foundation for the broader application and development of lip-synthesis technology.
format	Article
id	doaj-art-564abe129b414c3fa466044de70ab33b
institution	OA Journals
issn	2096-6652
language	zho
publishDate	2023-09-01
publisher	POSTS&TELECOM PRESS Co., LTD
record_format	Article
series	智能科学与技术学报
title	Lipsynthesis incorporating audio-visual synchronisation
topic	lip generation;deep learning;artificial intelligence;computer visualization;synchronization of audio and video
url	http://www.cjist.com.cn/thesisDetails#10.11959/j.issn.2096-6652.202335

Similar Items