Wav2Lip Bridges Communication Gap: Automating Lip Sync and Language Translation for Indian Languages

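Translating the speech in a video from one language to another is known as audio-visual translation (AVT). This paper describes the implementation of an automated AVT and lip-synced dubbing application. It addresses the difficulty of synchronizing mouth movements with translated speech by building a web application that synthesizes the speaker’s lip movements to match the translated audio. Using ASR models, the speech in the source video is converted to text, translated into several languages, and then automatically synthesized into speech in the target language. A lip synchronization model, Wav2Lip, is used to alter the mouth movements in the video to correspond to the target language. We compare two well-known ASR systems, Wav2vec 2.0 and Google Speech Recognition (GSR). Wav2vec 2.0 performs better, with a lower average word error rate (WER) of 15.38%, and is used in our final web application. The performance of the video dubbing component is evaluated with generated speech in Tamil, Telugu, Hindi, and English, and we find that our generated videos outperform existing ones. Our proposed AVT application is user-friendly for a wide variety of speakers because it uses readily available TTS systems instead of training on an individual speaker’s voice.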
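The description outlines a four-stage dubbing pipeline: speech recognition, machine translation, speech synthesis, and Wav2Lip lip synchronization. The sketch below only illustrates how such stages might be chained; it is not the authors' web application. The class name `DubbingPipeline`, the stage signatures, and the assumption that the audio track has already been extracted from the source video into its own file are all hypothetical. Each stage is injected as a plain callable, so a real ASR model (for example Wav2vec 2.0), a translator, a TTS engine, or a Wav2Lip wrapper could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical chaining of the ASR -> translation -> TTS -> lip-sync stages.

    None of these names come from the paper; every stage is an injected callable.
    """
    transcribe: Callable[[str], str]       # ASR: source audio path -> source-language text
    translate: Callable[[str, str], str]   # MT: (text, target language code) -> translated text
    synthesize: Callable[[str, str], str]  # TTS: (text, target language code) -> generated audio path
    lip_sync: Callable[[str, str], str]    # lip sync (e.g. a Wav2Lip wrapper): (video path, audio path) -> output video path

    def run(self, video_path: str, audio_path: str, target_lang: str) -> str:
        # Assumption: audio_path is the audio track already extracted from video_path.
        source_text = self.transcribe(audio_path)
        translated_text = self.translate(source_text, target_lang)
        dubbed_audio = self.synthesize(translated_text, target_lang)
        # The original video frames are re-rendered so the lips follow the new audio.
        return self.lip_sync(video_path, dubbed_audio)
```

Keeping each stage as an injected function makes it straightforward to swap ASR back-ends, which mirrors the comparison of Wav2vec 2.0 and Google Speech Recognition described in the record. Target languages could be passed as ISO 639-1 codes such as `"ta"`, `"te"`, or `"hi"`.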

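The description reports that Wav2vec 2.0 achieved the lower average word error rate (WER), 15.38%. WER is conventionally the word-level edit distance between a reference transcript and the ASR hypothesis, divided by the reference length: (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions, and N is the number of reference words. The following is a minimal sketch of that standard definition. The record does not say how the authors averaged their scores, so averaging per-utterance WERs in `average_wer_percent` is an assumption, and both function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def average_wer_percent(pairs):
    """Assumption: 'average WER' is the mean of per-utterance WERs, in percent."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return 100.0 * sum(scores) / len(scores)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` returns 2/6 (one substitution and one deletion over six reference words). Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.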


Bibliographic Details
Main Authors:	Vaishnavi Venkataraghavan, Shoba Sivapatham, Asutosh Kar
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Wav2Lip; automatic speech recognition (ASR); audio-visual translation (AVT); lip synchronization; Google speech recognition (GSR); Wav2vec 2.0
Online Access:	https://ieeexplore.ieee.org/document/10971971/

author	Vaishnavi Venkataraghavan, Shoba Sivapatham, Asutosh Kar
collection	DOAJ
description	Translating spoken speech in videos from one language to another is known as audio-visual translation (AVT). This paper describes the implementation of an automated AVT and lip-synced dubbing application. It addresses the difficulty of synchronizing mouth movements with translated speech by building a web application that synthesizes the speaker’s lip movements to match translated audio. Using ASR models, the speech from the source video is converted to text, translated into several languages, and then automatically synthesized into speech in the target language. A lip synchronization model, Wav2Lip, is used to alter the mouth movements in the video to correspond to the target language. We compare our work with two well-known ASR systems: Wav2vec 2.0 and Google Speech Recognition. Wav2vec 2.0 performs better with the lesser average WER% of 15.38 and is used in our final web application. The performance of the video dubbing component is discussed with the generated speech in Tamil, Telugu, Hindi, and English, and we determine that our generated videos outperform the existing ones. Our proposed AVT application is user-friendly for a wide variety of speakers, utilizing readily available TTS systems instead of training on an individual speaker’s voice.
format	Article
id	doaj-art-a656c873774f45eb8a96b02ecad422d9
institution	OA Journals
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-a656c873774f45eb8a96b02ecad422d9; 2025-08-20T01:48:12Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 72906-72917; DOI 10.1109/ACCESS.2025.3562883; IEEE document 10971971; Vaishnavi Venkataraghavan (School of Electronics Engineering, Vellore Institute of Technology, Chennai, India); Shoba Sivapatham (Centre for Advanced Data Science, Vellore Institute of Technology, Chennai, India; https://orcid.org/0000-0001-8036-2420); Asutosh Kar (Department of Electronics and Communication Engineering, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, Punjab, India; https://orcid.org/0000-0003-0011-0069); https://ieeexplore.ieee.org/document/10971971/
title	Wav2Lip Bridges Communication Gap: Automating Lip Sync and Language Translation for Indian Languages
topic	Wav2Lip; automatic speech recognition (ASR); audio-visual translation (AVT); lip synchronization; Google speech recognition (GSR); Wav2vec 2.0
url	https://ieeexplore.ieee.org/document/10971971/


Similar Items