Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API
Speech recognition has attracted sustained attention from researchers for almost two decades. Its main focus areas are isolated words, connected words, and continuous speech. Researchers have adopted many techniques to solve speech recognition challenges under the umbrella of Ar...
Main Author: | Ribwar Bakhtyar Ibrahim |
---|---|
Format: | Article |
Language: | English |
Published: | Sulaimani Polytechnic University, 2021-07-01 |
Series: | Kurdistan Journal of Applied Research |
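Subjects: | Speech Recognition, Microsoft Speech API, Subtitles, Speech to Text, Artificial Intelligence, Voice Interactive Speech to Text (VIST) |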
Online Access: | https://kjar.spu.edu.iq/index.php/kjar/article/view/607 |
_version_ | 1823861360418619392 |
---|---|
author | Ribwar Bakhtyar Ibrahim |
collection | DOAJ |
description | Speech recognition has attracted sustained attention from researchers for almost two decades. Its main focus areas are isolated words, connected words, and continuous speech. Researchers have adopted many techniques to address speech recognition challenges, drawing on Artificial Intelligence (AI), pattern recognition, and acoustic-phonetic approaches. These challenges include variation in the pronunciation of words, individual accents, unwanted ambient noise, speech context, and the quality of input devices. Many Application Programming Interfaces (APIs), such as Microsoft Speech API and Google Speech API, have been developed to improve the accuracy of speech-to-text conversion. In this paper, the performance of Microsoft Speech API is analyzed against other speech APIs reported in the literature, using a specially prepared dataset without background noise. A Voice Interactive Speech to Text (VIST) audio player was developed for this analysis. The VIST audio player generates subtitles at runtime for the audio files it plays, performing speech-to-text conversion in real time. Microsoft Speech API was incorporated into the application so that its performance could be validated and measured. In the experiments, Microsoft Speech API was more accurate than the other APIs on the dataset prepared for the VIST audio player. Its accuracy, measured by precision and recall, is 96%, which is higher than the results previously reported in the literature.
|
format | Article |
id | doaj-art-ae6c0a1c88ac40d09a6bc4e3ea6c57aa |
institution | Kabale University |
issn | 2411-7684 2411-7706 |
language | English |
publishDate | 2021-07-01 |
publisher | Sulaimani Polytechnic University |
record_format | Article |
series | Kurdistan Journal of Applied Research |
spelling | doaj-art-ae6c0a1c88ac40d09a6bc4e3ea6c57aa; 2025-02-09T20:59:52Z; eng; Sulaimani Polytechnic University; Kurdistan Journal of Applied Research; 2411-7684; 2411-7706; 2021-07-01; 6110.24017/science.2021.1.3607; Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API; Ribwar Bakhtyar Ibrahim (Database Technology Department, College of Informatics, Sulaimani Polytechnic University, Sulaimani, Iraq); https://kjar.spu.edu.iq/index.php/kjar/article/view/607; Speech Recognition, Microsoft Speech API, Subtitles, Speech to Text, Speech-to-Text Recognition, Artificial Intelligence, Voice Interactive Speech to Text (VIST) |
title | Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API |
topic | Speech Recognition, Microsoft Speech API, Subtitles, Speech to Text, Speech-to-Text Recognition, Artificial Intelligence, Voice Interactive Speech to Text (VIST) |
url | https://kjar.spu.edu.iq/index.php/kjar/article/view/607 |
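
The description above reports accuracy in terms of precision and recall. This record does not say how the paper computed them, so the following is only a minimal sketch under one common assumption: both measures are taken at the word level, by comparing a recognized transcript against a reference transcript as bags of words. The function names `tokenize` and `precision_recall` and the sample sentences are hypothetical and do not come from the paper.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [w for w in words if w]


def precision_recall(reference: str, hypothesis: str) -> tuple[float, float]:
    """Word-level precision and recall (assumed definition, not the paper's).

    precision = matched words / words in the recognized transcript
    recall    = matched words / words in the reference transcript
    """
    ref_counts = Counter(tokenize(reference))
    hyp_counts = Counter(tokenize(hypothesis))
    # Multiset intersection: a word counts as matched at most as many
    # times as it occurs in both transcripts.
    matched = sum((ref_counts & hyp_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    precision = matched / hyp_total if hyp_total else 0.0
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical transcripts: one word ("fox") is misrecognized as "box".
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown box jumps over the lazy dog"
    p, r = precision_recall(reference, hypothesis)
    print(f"precision={p:.3f} recall={r:.3f}")
```

A bag-of-words comparison ignores word order, so it is more lenient than an alignment-based measure such as word error rate; it is used here only because it maps directly onto the precision and recall terms named in the description.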