Transformer-based language-independent gender recognition in noisy audio environments
Abstract This study proposes a language-independent method for identifying the gender of a speaker from an audio clip in a noisy environment. Two different feature-extraction processes are applied to the audio clips: one based on Mel-spectrograms and the other on the emissions of the Wav2Vec2 acoustic model, and the advantages and disadvantages of each are examined. A series of experiments is presented across five languages (English, Arabic, Spanish, French, and Russian), each containing male and female audio clips. Each language is analyzed individually and its characteristics are compared against a combined five-language model. The goal of this study is to distinguish the gender of the speaker from an audio clip, regardless of language or complex background noise such as that of nightclubs or stadiums. Additionally, this research addresses the critical issue of gender bias in voice recognition systems. It highlights the challenges posed by the over-representation of male voices in training datasets and the subsequent impact on the accuracy and fairness of gender classification, particularly for female voices. The approach in this paper maintains an equal number of audio clips for male and female voices to ensure balance and mitigate this bias. The experimental results indicate that the traditional spectrogram method achieved better results than the Wav2Vec transformer method: for Russian, the spectrogram method reached 99% accuracy, while the Wav2Vec transformer method achieved only 89%. Tests in noisy and silent environments show that a model trained on both conditions exhibited better accuracy. The results also indicate that a model trained on data from a wide variety of languages yielded higher accuracy. These findings offer important insights for developing more reliable, accurate, and equitable acoustic gender detection systems.
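For readers who want a concrete picture of the two front ends compared in the abstract, the sketch below shows, in Python, how a log-Mel spectrogram and a clip-level Wav2Vec2 embedding might be extracted before feeding either one to a binary gender classifier. The checkpoint name (facebook/wav2vec2-base), the 16 kHz mono resampling, and the mean-pooling of hidden states are illustrative assumptions and not details taken from the article; the downstream classifier itself is omitted.

```python
# Minimal sketch of the two feature front ends compared in the article:
# (1) a log-Mel spectrogram and (2) Wav2Vec2 embeddings ("emissions").
# The checkpoint, 16 kHz mono resampling, and mean-pooling are assumptions,
# not details reported in the paper.
import librosa
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

def mel_spectrogram_features(path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    """Load a clip and return a log-Mel spectrogram of shape (n_mels, frames)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

# Hypothetical checkpoint choice; the article only names "Wav2Vec2".
_CKPT = "facebook/wav2vec2-base"
_extractor = Wav2Vec2FeatureExtractor.from_pretrained(_CKPT)
_model = Wav2Vec2Model.from_pretrained(_CKPT)
_model.eval()

def wav2vec2_embedding(path: str, sr: int = 16000) -> np.ndarray:
    """Return a clip-level vector by mean-pooling Wav2Vec2 hidden states over time."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    inputs = _extractor(y, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        hidden = _model(**inputs).last_hidden_state  # shape: (1, frames, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()
```

Either representation, the log-Mel image or the pooled embedding, could then be passed to a male/female classifier trained on the balanced, multi-language clips described in the abstract.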
| Main Authors: | Or Haim Anidjar, Roi Yozevitch |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | Automatic speech recognition; Wav2Vec 2.0; Language independent gender recognition |
| Online Access: | https://doi.org/10.1038/s41598-025-99011-x |
| _version_ | 1850139190655713280 |
|---|---|
| author | Or Haim Anidjar (Faculty of Computer Science, College of Management); Roi Yozevitch (Department of Computer and Software Engineering, Ariel University) |
| collection | DOAJ |
| format | Article |
| id | doaj-art-150be3e0632547ed8f6c8a6ccbbf3b7f |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| title | Transformer-based language-independent gender recognition in noisy audio environments |
| topic | Automatic speech recognition; Wav2Vec 2.0; Language independent gender recognition |
| url | https://doi.org/10.1038/s41598-025-99011-x |