Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs
This study proposes an innovative speech translation method based on Pix2PixGAN, which maps the Mel spectrograms of speech produced by deaf individuals to those of normal-hearing individuals and generates semantically coherent speech output. The objective is to translate speech produced by the deaf into intelligible speech as it would be spoken by hearing individuals, thereby enhancing understandability and supporting assisted communication. A paired Mel spectrogram dataset was constructed using speech from both deaf and normal-hearing individuals. Deaf speech data were manually extracted from video segments, while the corresponding normal-hearing speech was synthesized using a text-to-speech (TTS) system. Mel spectrograms were then extracted as training data. The model is built upon the Pix2PixGAN framework, with deaf speech spectrograms as input and target hearing spectrograms as output. Model performance was evaluated using SSIM, PSNR, and MSE metrics. The results demonstrate excellent fidelity in structure and clarity in signal restoration, particularly in the low-frequency regions associated with semantic content. Unlike traditional deaf speech translation methods, this study innovatively combines Pix2PixGAN with Mel spectrogram representations, reframing the speech translation task as an image-to-image translation problem. By matching and concatenating speech segments from a reference database, the system generates natural and intelligible speech output. User survey results showed high ratings in both semantic consistency and naturalness of the generated speech. This method offers a viable technical pathway for facilitating communication between deaf and hearing individuals and provides a valuable reference for personalized speech enhancement in complex auditory environments.
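The abstract reports evaluation with SSIM, PSNR, and MSE over generated Mel spectrograms. As a rough illustration of what those metrics measure, here is a minimal NumPy sketch; the function names are ours, and the `ssim_global` variant is a simplified single-window SSIM (published results typically use the standard local-window implementation, e.g. from scikit-image), so this is only an approximation of the paper's evaluation setup.

```python
import numpy as np

def mse(x, y):
    # Mean squared error between two spectrograms (lower is better).
    return float(np.mean((x - y) ** 2))

def psnr(x, y, data_range=1.0):
    # Peak signal-to-noise ratio in dB (higher is better).
    m = mse(x, y)
    if m == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / m))

def ssim_global(x, y, data_range=1.0):
    # Simplified single-window SSIM; standard SSIM averages this
    # statistic over local sliding windows instead.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

# Toy example: two spectrograms normalized to [0, 1],
# e.g. 80 Mel bins x 128 time frames (sizes are illustrative).
a = np.random.default_rng(0).random((80, 128))
assert mse(a, a) == 0.0           # identical inputs: zero error
assert ssim_global(a, a) == 1.0   # identical inputs: perfect similarity
```

In this framing the generated spectrogram would be compared against the target hearing-speech spectrogram; identical inputs give MSE 0, SSIM 1, and infinite PSNR, which bounds the scale of the reported scores.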
| Main Authors: | Shaoting Zeng, Xinran Xu, Xinyu Chi, Yuqing Liu, Huiting Yu, Feng Zou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Deaf speech enhancement and translation; Pix2PixGANs; Mel spectrograms; generative adversarial networks; AI-assisted design for deaf welfare |
| Online Access: | https://ieeexplore.ieee.org/document/11002503/ |
| _version_ | 1849744813247692800 |
|---|---|
| author | Shaoting Zeng; Xinran Xu; Xinyu Chi; Yuqing Liu; Huiting Yu; Feng Zou |
| affiliation | College of Art and Design, Beijing University of Technology, Beijing, China (all authors) |
| author_sort | Shaoting Zeng |
| orcid | Shaoting Zeng: https://orcid.org/0000-0003-4720-7351 |
| collection | DOAJ |
| id | doaj-art-648f03341d644ac8bb553da075dfa847 |
| format | Article |
| language | English |
| series | IEEE Access |
| publisher | IEEE |
| publishDate | 2025-01-01 |
| issn | 2169-3536 |
| volume / pages | 13 / 85139–85155 |
| doi | 10.1109/ACCESS.2025.3569321 |
| document | 11002503 |
| description | This study proposes an innovative speech translation method based on Pix2PixGAN, which maps the Mel spectrograms of speech produced by deaf individuals to those of normal-hearing individuals and generates semantically coherent speech output. The objective is to translate speech produced by the deaf into intelligible speech as it would be spoken by hearing individuals, thereby enhancing understandability and supporting assisted communication. A paired Mel spectrogram dataset was constructed using speech from both deaf and normal-hearing individuals. Deaf speech data were manually extracted from video segments, while the corresponding normal-hearing speech was synthesized using a text-to-speech (TTS) system. Mel spectrograms were then extracted as training data. The model is built upon the Pix2PixGAN framework, with deaf speech spectrograms as input and target hearing spectrograms as output. Model performance was evaluated using SSIM, PSNR, and MSE metrics. The results demonstrate excellent fidelity in structure and clarity in signal restoration, particularly in the low-frequency regions associated with semantic content. Unlike traditional deaf speech translation methods, this study innovatively combines Pix2PixGAN with Mel spectrogram representations, reframing the speech translation task as an image-to-image translation problem. By matching and concatenating speech segments from a reference database, the system generates natural and intelligible speech output. User survey results showed high ratings in both semantic consistency and naturalness of the generated speech. This method offers a viable technical pathway for facilitating communication between deaf and hearing individuals and provides a valuable reference for personalized speech enhancement in complex auditory environments. |
| topic | Deaf speech enhancement and translation; Pix2PixGANs; Mel spectrograms; generative adversarial networks; AI-assisted design for deaf welfare |
| title | Research on Speech Enhancement Translation and Mel-Spectrogram Mapping Method for the Deaf Based on Pix2PixGANs |
| url | https://ieeexplore.ieee.org/document/11002503/ |