Spectrogram Features-Based Automatic Speaker Identification For Smart Services
Automatic speaker identification (ASI) is an exciting area of research with numerous applications such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolution...
| Main Authors: | Rashid Jahangir, Mohammed Alreshoodi, Fawaz Khaled Alarfaj |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Taylor & Francis Group, 2025-12-01 |
| Series: | Applied Artificial Intelligence |
| Online Access: | https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476 |
| _version_ | 1850140106909810688 |
|---|---|
| author | Rashid Jahangir, Mohammed Alreshoodi, Fawaz Khaled Alarfaj |
| author_facet | Rashid Jahangir, Mohammed Alreshoodi, Fawaz Khaled Alarfaj |
| author_sort | Rashid Jahangir |
| collection | DOAJ |
| description | Automatic speaker identification (ASI) is an exciting area of research with numerous applications such as surveillance, voice authentication, identity verification, and electronic voice eavesdropping. This study investigates ASI based on features derived from spectrogram images through a convolutional neural network (CNN) with rectangular-shaped kernels. Traditionally, CNNs employ square-shaped kernels and max-pooling operations at different layers, a design optimized to handle 2D data. Spectrograms, however, encode information slightly differently: frequency is displayed along the y-axis, time along the x-axis, and amplitude is denoted by the intensity of the spectrogram image at a given point. The main contributions of this study are: 1. To analyze audio signals effectively using spectrograms, this study proposes extracting spectrogram features with rectangular kernels of different sizes and shapes, deriving distinctive features that improve the recognition accuracy of the speaker identification system. 2. The extracted spectrogram-based features and models are evaluated on the ELSDSR, TSP, and LibriSpeech datasets, achieving weighted accuracies of 96.0%, 99.2%, and 97.6%, respectively. 3. The proposed rectangular-shaped CNN approach effectively derives suitable features from spectrogram images and outperforms several baseline techniques when assessed on the ELSDSR, TSP, and LibriSpeech datasets. |
| format | Article |
| id | doaj-art-2d2b9a0530174bf5ad4a37bf56a30c0e |
| institution | OA Journals |
| issn | 0883-9514 1087-6545 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Taylor & Francis Group |
| record_format | Article |
| series | Applied Artificial Intelligence |
| spelling | doaj-art-2d2b9a0530174bf5ad4a37bf56a30c0e; 2025-08-20T02:29:58Z; eng; Taylor & Francis Group; Applied Artificial Intelligence; 0883-9514; 1087-6545; 2025-12-01; 39; 1; 10.1080/08839514.2025.2459476; Spectrogram Features-Based Automatic Speaker Identification For Smart Services; Rashid Jahangir (Department of Computer Science, COMSATS University Islamabad, Vehari, Pakistan); Mohammed Alreshoodi (Unit of Scientific Research, Applied College, Qassim University, Qassim, Saudi Arabia); Fawaz Khaled Alarfaj (Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa, Saudi Arabia); https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476 |
| spellingShingle | Rashid Jahangir Mohammed Alreshoodi Fawaz Khaled Alarfaj Spectrogram Features-Based Automatic Speaker Identification For Smart Services Applied Artificial Intelligence |
| title | Spectrogram Features-Based Automatic Speaker Identification For Smart Services |
| title_full | Spectrogram Features-Based Automatic Speaker Identification For Smart Services |
| title_fullStr | Spectrogram Features-Based Automatic Speaker Identification For Smart Services |
| title_full_unstemmed | Spectrogram Features-Based Automatic Speaker Identification For Smart Services |
| title_short | Spectrogram Features-Based Automatic Speaker Identification For Smart Services |
| title_sort | spectrogram features based automatic speaker identification for smart services |
| url | https://www.tandfonline.com/doi/10.1080/08839514.2025.2459476 |
| work_keys_str_mv | AT rashidjahangir spectrogramfeaturesbasedautomaticspeakeridentificationforsmartservices AT mohammedalreshoodi spectrogramfeaturesbasedautomaticspeakeridentificationforsmartservices AT fawazkhaledalarfaj spectrogramfeaturesbasedautomaticspeakeridentificationforsmartservices |
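
The description in this record contrasts square CNN kernels with rectangular ones applied to spectrograms, where the y-axis is frequency and the x-axis is time. The following is a minimal sketch of that idea only, not the authors' model: the record does not state kernel sizes, so the 9x1 and 1x9 averaging kernels, the frame length, the hop size, and the helper names `spectrogram` and `correlate2d_valid` are all illustrative assumptions. It uses NumPy alone and a synthetic 1 kHz tone in place of speech.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def correlate2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, the core operation of a convolutional layer."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

# Synthetic input (assumption): one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)                     # shape: (frequency bins, time frames)
peak_bin = int(np.argmax(spec.mean(axis=1)))  # strongest frequency bin

# Hypothetical rectangular kernels (sizes chosen for illustration only).
tall_kernel = np.ones((9, 1)) / 9.0  # spans 9 frequency bins, 1 time frame
wide_kernel = np.ones((1, 9)) / 9.0  # spans 1 frequency bin, 9 time frames

tall_out = correlate2d_valid(spec, tall_kernel)
wide_out = correlate2d_valid(spec, wide_kernel)

print("spectrogram:", spec.shape, "peak bin:", peak_bin)
print("tall kernel output:", tall_out.shape)
print("wide kernel output:", wide_out.shape)
```

A tall kernel pools information across neighbouring frequencies at one instant, while a wide kernel pools across time within one frequency band. Because the two spectrogram axes carry different physical quantities, the two shapes respond to different structure, which is the motivation the description gives for moving away from square kernels.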