Transformer-based language-independent gender recognition in noisy audio environments

Abstract This study proposes a language-independent method for identifying a speaker's gender from an audio clip in a noisy environment. Two different processes are applied to the audio clips, one based on Mel-spectrograms and the other on Wav2Vec2 acoustic model emissions, examining the...

Full description

Bibliographic Details
Main Authors: Or Haim Anidjar, Roi Yozevitch
Format: Article
Language: English
Published: Nature Portfolio 2025-04-01
Series: Scientific Reports
Subjects: Automatic speech recognition; Wav2Vec 2.0; Language independent gender recognition
Online Access: https://doi.org/10.1038/s41598-025-99011-x
author Or Haim Anidjar
Roi Yozevitch
collection DOAJ
description Abstract This study proposes a language-independent method for identifying a speaker's gender from an audio clip in a noisy environment. Two different processes are applied to the audio clips, one based on Mel-spectrograms and the other on Wav2Vec2 acoustic model emissions, examining the advantages and disadvantages of each method. A series of experiments is presented across five languages (English, Arabic, Spanish, French, and Russian), using male and female audio clips. An analysis of these languages is carried out, comparing their individual characteristics against a five-language model. The goal of this study is to determine the speaker's gender from an audio clip, regardless of language or of complex background noise such as nightclubs or stadiums. Additionally, this research addresses the critical issue of gender bias in voice recognition systems. It highlights the challenges posed by the over-representation of male voices in training datasets and the resulting impact on the accuracy and fairness of gender classification, particularly for female voices. The approach in this paper maintains an equal number of audio clips for male and female voices to ensure balance and mitigate this bias. The experimental results indicate that the traditional spectrogram method performed better than the Wav2Vec transformer method: for Russian, the spectrogram method achieved an accuracy of 99%, while the Wav2Vec transformer method achieved only 89%. Tests in noisy and quiet environments show that a model trained on both conditions achieved better accuracy, and a model trained on data from a wide variety of languages yielded higher accuracy. These findings offer important insights for developing more reliable, accurate, and equitable acoustic gender detection systems.
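As a rough illustration of the two feature-extraction paths the abstract describes, the sketch below (not taken from the paper) computes a Mel-spectrogram and Wav2Vec2 emissions for a single clip. It assumes Python with torchaudio and Hugging Face transformers installed; the file name "clip.wav", the "facebook/wav2vec2-base" checkpoint, and the spectrogram parameters are placeholder assumptions, not the authors' configuration.

```python
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load an audio clip and standardize it to 16 kHz mono.
waveform, sr = torchaudio.load("clip.wav")                       # (channels, samples)
waveform = torchaudio.functional.resample(waveform, sr, 16_000)
waveform = waveform.mean(dim=0, keepdim=True)                    # mix down to mono

# Path 1: Mel-spectrogram features (the "traditional spectrogram method").
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=16_000, n_fft=400, hop_length=160, n_mels=80
)
mel = mel_transform(waveform)                                     # (1, 80, frames)

# Path 2: Wav2Vec2 acoustic-model emissions (hidden states of a pretrained model).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
w2v2 = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
inputs = extractor(waveform.squeeze(0).numpy(), sampling_rate=16_000,
                   return_tensors="pt")
with torch.no_grad():
    emissions = w2v2(**inputs).last_hidden_state                  # (1, frames, 768)

# Either representation can then be pooled over time and fed to a binary
# (male/female) classifier trained on gender-balanced, multilingual clips.
```

Either feature tensor would feed a downstream classifier; per the abstract, the spectrogram path gave the higher accuracy in the reported experiments.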
format Article
id doaj-art-150be3e0632547ed8f6c8a6ccbbf3b7f
institution OA Journals
issn 2045-2322
language English
publishDate 2025-04-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-150be3e0632547ed8f6c8a6ccbbf3b7f 2025-08-20T02:30:23Z eng Nature Portfolio Scientific Reports 2045-2322 2025-04-01 https://doi.org/10.1038/s41598-025-99011-x
Or Haim Anidjar (Faculty of Computer Science, College of Management)
Roi Yozevitch (Department of Computer and Software Engineering, Ariel University)
title Transformer-based language-independent gender recognition in noisy audio environments
topic Automatic speech recognition
Wav2Vec 2.0
Language independent gender recognition
url https://doi.org/10.1038/s41598-025-99011-x