Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs

This study proposes an innovative speech translation method based on Pix2PixGAN, which maps the Mel spectrograms of speech produced by deaf individuals to those of normal-hearing individuals and generates semantically coherent speech output. The objective is to translate speech produced by the deaf...

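The method summarized above and detailed in the description field below treats translation as an image-to-image mapping between paired Mel spectrograms: deaf speech as the input image and TTS-synthesized hearing speech as the target. The sketch below is a rough illustration of how one such training pair could be extracted; it is not the authors' code, and the file names, sample rate, and Mel parameters are assumptions rather than values reported in the paper.

```python
# Minimal sketch (not the authors' implementation): build one paired training
# example for a Pix2Pix-style spectrogram mapper. File names, sample rate and
# Mel parameters are illustrative assumptions, not values from the paper.
import numpy as np
import librosa

def log_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    """Load a clip and return its log-Mel spectrogram in dB (the 2-D 'image')."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)

# Input: deaf speech extracted from a video segment (hypothetical path).
deaf_mel = log_mel("deaf_clip_001.wav")
# Target: the same sentence synthesized by a TTS system (hypothetical path).
target_mel = log_mel("tts_clip_001.wav")

# Crop both to a common number of frames so they form an aligned image pair.
frames = min(deaf_mel.shape[1], target_mel.shape[1])
pair = (deaf_mel[:, :frames], target_mel[:, :frames])
print(pair[0].shape, pair[1].shape)  # e.g. (80, T) and (80, T)
```

A Pix2Pix-style generator (typically a U-Net generator trained against a PatchGAN discriminator) would then be trained to map the first image of each pair to the second.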
Bibliographic Details
Main Authors: Shaoting Zeng, Xinran Xu, Xinyu Chi, Yuqing Liu, Huiting Yu, Feng Zou
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Deaf speech enhancement and translation; Pix2PixGANs; Mel spectrograms; generative adversarial networks; AI-assisted design for deaf welfare
Online Access: https://ieeexplore.ieee.org/document/11002503/
_version_ 1849744813247692800
author Shaoting Zeng
Xinran Xu
Xinyu Chi
Yuqing Liu
Huiting Yu
Feng Zou
author_facet Shaoting Zeng
Xinran Xu
Xinyu Chi
Yuqing Liu
Huiting Yu
Feng Zou
author_sort Shaoting Zeng
collection DOAJ
description This study proposes an innovative speech translation method based on Pix2PixGAN, which maps the Mel spectrograms of speech produced by deaf individuals to those of normal-hearing individuals and generates semantically coherent speech output. The objective is to translate speech produced by the deaf into intelligible speech as it would be spoken by hearing individuals, thereby enhancing understandability and supporting assisted communication. A paired Mel spectrogram dataset was constructed using speech from both deaf and normal-hearing individuals. Deaf speech data were manually extracted from video segments, while the corresponding normal-hearing speech was synthesized using a text-to-speech (TTS) system. Mel spectrograms were then extracted as training data. The model is built upon the Pix2PixGAN framework, with deaf speech spectrograms as input and target hearing spectrograms as output. Model performance was evaluated using SSIM, PSNR, and MSE metrics. The results demonstrate excellent fidelity in structure and clarity in signal restoration, particularly in the low-frequency regions associated with semantic content. Unlike traditional deaf speech translation methods, this study innovatively combines Pix2PixGAN with Mel spectrogram representations, reframing the speech translation task as an image-to-image translation problem. By matching and concatenating speech segments from a reference database, the system generates natural and intelligible speech output. User survey results showed high ratings in both semantic consistency and naturalness of the generated speech. This method offers a viable technical pathway for facilitating communication between deaf and hearing individuals and provides a valuable reference for personalized speech enhancement in complex auditory environments.
format Article
id doaj-art-648f03341d644ac8bb553da075dfa847
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-648f03341d644ac8bb553da075dfa847 | 2025-08-20T03:08:17Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2025-01-01 | vol. 13, pp. 85139-85155 | DOI 10.1109/ACCESS.2025.3569321 | IEEE document 11002503 | Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs | Shaoting Zeng (https://orcid.org/0000-0003-4720-7351), Xinran Xu, Xinyu Chi, Yuqing Liu, Huiting Yu, Feng Zou (all: College of Art and Design, Beijing University of Technology, Beijing, China) | abstract as given in the description field above | https://ieeexplore.ieee.org/document/11002503/ | Keywords: Deaf speech enhancement and translation; Pix2PixGANs; Mel spectrograms; generative adversarial networks; AI-assisted design for deaf welfare
spellingShingle Shaoting Zeng
Xinran Xu
Xinyu Chi
Yuqing Liu
Huiting Yu
Feng Zou
Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
IEEE Access
Deaf speech enhancement and translation
Pix2PixGANs
Mel spectrograms
generative adversarial networks
AI-assisted design for deaf welfare
title Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
title_full Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
title_fullStr Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
title_full_unstemmed Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
title_short Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
title_sort research on speech enhancement translation and mel spectrogram mapping method for the deaf based on pix2pixgans
topic Deaf speech enhancement and translation
Pix2PixGANs
Mel spectrograms
generative adversarial networks
AI-assisted design for deaf welfare
url https://ieeexplore.ieee.org/document/11002503/
work_keys_str_mv AT shaotingzeng researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
AT xinranxu researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
AT xinyuchi researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
AT yuqingliu researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
AT huitingyu researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
AT fengzou researchonspeechenhancementtranslationandmelspectrogrammappingmethodforthedeafbasedonpix2pixgans
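For reference, the description field above names SSIM, PSNR, and MSE as the metrics used to evaluate the generated spectrograms. The minimal sketch below shows how such a comparison could be computed with scikit-image; the random arrays stand in for a generated Mel spectrogram and its hearing-speech target and are synthetic placeholders, not data from the study.

```python
# Minimal sketch (not the authors' code): score a generated Mel spectrogram
# against its target with the SSIM / PSNR / MSE metrics named in the abstract.
# The random arrays below are placeholders for real (80, T) dB spectrograms.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(generated, target):
    """Return SSIM, PSNR and MSE for two equally shaped spectrogram 'images'."""
    data_range = float(target.max() - target.min())
    return {
        "SSIM": structural_similarity(target, generated, data_range=data_range),
        "PSNR": peak_signal_noise_ratio(target, generated, data_range=data_range),
        "MSE": float(np.mean((target - generated) ** 2)),
    }

rng = np.random.default_rng(0)
target = rng.uniform(-80.0, 0.0, size=(80, 256))              # stand-in target (dB)
generated = target + rng.normal(0.0, 2.0, size=target.shape)  # stand-in model output
print(evaluate(generated, target))
```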