Transformer-based language-independent gender recognition in noisy audio environments
Abstract This study proposes a language-independent method for identifying the gender of a speaker from an audio clip in a noisy environment. Two different feature-extraction processes are applied to the audio clips: one based on Mel-spectrograms and the other on the emissions of the Wav2Vec2 acoustic model, and the advantages and disadvantages of each are examined. A series of experiments is presented across five languages (English, Arabic, Spanish, French, and Russian), each containing male and female audio clips. Each language is analyzed individually and its characteristics are compared against a combined five-language model. The goal of this study is to distinguish the gender of the speaker from an audio clip, regardless of language or complex background noise such as that of nightclubs or stadiums. Additionally, this research addresses the critical issue of gender bias in voice recognition systems. It highlights the challenges posed by the over-representation of male voices in training datasets and the subsequent impact on the accuracy and fairness of gender classification, particularly for female voices. The approach in this paper maintains an equal number of audio clips for male and female voices to ensure balance and mitigate this bias. The experimental results indicate that the traditional spectrogram method achieved better results than the Wav2Vec transformer method: for Russian, the spectrogram method reached 99% accuracy, while the Wav2Vec transformer method achieved only 89%. Tests in noisy and silent environments show that a model trained on both conditions exhibited better accuracy. The results also indicate that a model trained on data from a wide variety of languages yielded higher accuracy. These findings offer important insights for developing more reliable, accurate, and equitable acoustic gender detection systems.
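For readers who want a concrete picture of the two front ends compared in the abstract, the sketch below shows, in Python, how a log-Mel spectrogram and a clip-level Wav2Vec2 embedding might be extracted before feeding either one to a binary gender classifier. The checkpoint name (facebook/wav2vec2-base), the 16 kHz mono resampling, and the mean-pooling of hidden states are illustrative assumptions and not details taken from the article; the downstream classifier itself is omitted.

```python
# Minimal sketch of the two feature front ends compared in the article:
# (1) a log-Mel spectrogram and (2) Wav2Vec2 embeddings ("emissions").
# The checkpoint, 16 kHz mono resampling, and mean-pooling are assumptions,
# not details reported in the paper.
import librosa
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

def mel_spectrogram_features(path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    """Load a clip and return a log-Mel spectrogram of shape (n_mels, frames)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# Hypothetical checkpoint choice; the article only names "Wav2Vec2".
_CKPT = "facebook/wav2vec2-base"
_extractor = Wav2Vec2FeatureExtractor.from_pretrained(_CKPT)
_model = Wav2Vec2Model.from_pretrained(_CKPT)
_model.eval()

def wav2vec2_embedding(path: str, sr: int = 16000) -> np.ndarray:
    """Return a clip-level vector by mean-pooling Wav2Vec2 hidden states over time."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    inputs = _extractor(y, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = _model(**inputs).last_hidden_state  # shape: (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()
```

Either representation, the log-Mel image or the pooled embedding, could then be passed to a male/female classifier trained on the balanced, multi-language clips described in the abstract.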
| Main Authors: | Or Haim Anidjar, Roi Yozevitch |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Automatic speech recognition; Wav2Vec 2.0; Language independent gender recognition |
| Online Access: | https://doi.org/10.1038/s41598-025-99011-x |
| _version_ | 1850139190655713280 |
|---|---|
| author | Or Haim Anidjar (Faculty of Computer Science, College of Management); Roi Yozevitch (Department of Computer and Software Engineering, Ariel University) |
| collection | DOAJ |
| format | Article |
| id | doaj-art-150be3e0632547ed8f6c8a6ccbbf3b7f |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| title | Transformer-based language-independent gender recognition in noisy audio environments |
| topic | Automatic speech recognition; Wav2Vec 2.0; Language independent gender recognition |
| url | https://doi.org/10.1038/s41598-025-99011-x |