Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API

Speech recognition has gained much attention from researchers for almost the last two decades. Isolated words, connected words, and continuous speech are the main areas of focus in speech recognition. Researchers have adopted many techniques to solve speech recognition challenges under the umbrella of Ar...

Bibliographic Details
Main Author: Ribwar Bakhtyar Ibrahim
Format: Article
Language: English
Published: Sulaimani Polytechnic University 2021-07-01
Series: Kurdistan Journal of Applied Research
Subjects:
Online Access: https://kjar.spu.edu.iq/index.php/kjar/article/view/607
author Ribwar Bakhtyar Ibrahim
collection DOAJ
description Speech recognition has gained much attention from researchers for almost the last two decades. Isolated words, connected words, and continuous speech are the main areas of focus in speech recognition. Researchers have adopted many techniques to solve speech recognition challenges under the umbrella of Artificial Intelligence (AI), Pattern Recognition, and Acoustic-Phonetic approaches. These challenges include variation in the pronunciation of words, individual accents, unwanted ambient noise, speech context, and the quality of input devices. Many Application Programming Interfaces (APIs), such as Microsoft Speech API and Google Speech API, have been developed to address the accuracy of speech-to-text conversion. In this paper, the performance of Microsoft Speech API is analyzed against other speech APIs mentioned in the literature, using a specially prepared dataset without background noise. A Voice Interactive Speech to Text (VIST) audio player was developed for the analysis of Microsoft Speech API. The VIST audio player creates runtime subtitles for the audio files played on it; the player is responsible for speech-to-text conversion in real time. Microsoft Speech API was incorporated into the application to validate the API and make its performance measurable. The experiments showed Microsoft Speech API to be more accurate than the other APIs in the context of the dataset prepared for the VIST audio player. The accuracy rate according to precision-recall is 96% for Microsoft Speech API, which is better than the previous results reported in the literature.
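The description reports accuracy "according to precision-recall" but this record does not say how those figures were computed. As a rough illustration only, the sketch below scores a recognized transcript against a reference transcript at the word level. The function name `word_precision_recall`, the bag-of-words matching, and the sample sentences are all assumptions made for this example; they are not taken from the paper and do not call Microsoft Speech API.

```python
from collections import Counter
from typing import Tuple


def word_precision_recall(reference: str, hypothesis: str) -> Tuple[float, float]:
    """Word-level precision and recall of a recognized transcript.

    Hypothetical scoring scheme (an assumption, not the paper's method):
    words are lowercased, split on whitespace, and matched as multisets,
    so word order is ignored.
    """
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())

    # Multiset intersection: each word counts as matched at most as many
    # times as it appears in both transcripts.
    matched = sum((ref_counts & hyp_counts).values())

    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())

    # Precision: share of recognized words that are correct.
    precision = matched / hyp_total if hyp_total else 0.0
    # Recall: share of reference words that were recognized.
    recall = matched / ref_total if ref_total else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up sentences for illustration; not data from the paper.
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over the lazy dog"
    p, r = word_precision_recall(reference, hypothesis)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Because matching ignores word order, this scheme is more lenient than an alignment-based measure such as word error rate; an evaluation that cares about subtitle timing or ordering would need an alignment step instead.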
format Article
id doaj-art-ae6c0a1c88ac40d09a6bc4e3ea6c57aa
institution Kabale University
issn 2411-7684
2411-7706
language English
publishDate 2021-07-01
publisher Sulaimani Polytechnic University
record_format Article
series Kurdistan Journal of Applied Research
spelling doaj-art-ae6c0a1c88ac40d09a6bc4e3ea6c57aa 2025-02-09T20:59:52Z eng Sulaimani Polytechnic University Kurdistan Journal of Applied Research 2411-7684 2411-7706 2021-07-01 6110.24017/science.2021.1.3607 Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API Ribwar Bakhtyar Ibrahim 0 Database Technology Department, College of Informatics, Sulaimani Polytechnic University, Sulaimani, Iraq Speech recognition has gained much attention from researchers for almost the last two decades. Isolated words, connected words, and continuous speech are the main areas of focus in speech recognition. Researchers have adopted many techniques to solve speech recognition challenges under the umbrella of Artificial Intelligence (AI), Pattern Recognition, and Acoustic-Phonetic approaches. These challenges include variation in the pronunciation of words, individual accents, unwanted ambient noise, speech context, and the quality of input devices. Many Application Programming Interfaces (APIs), such as Microsoft Speech API and Google Speech API, have been developed to address the accuracy of speech-to-text conversion. In this paper, the performance of Microsoft Speech API is analyzed against other speech APIs mentioned in the literature, using a specially prepared dataset without background noise. A Voice Interactive Speech to Text (VIST) audio player was developed for the analysis of Microsoft Speech API. The VIST audio player creates runtime subtitles for the audio files played on it; the player is responsible for speech-to-text conversion in real time. Microsoft Speech API was incorporated into the application to validate the API and make its performance measurable. The experiments showed Microsoft Speech API to be more accurate than the other APIs in the context of the dataset prepared for the VIST audio player. The accuracy rate according to precision-recall is 96% for Microsoft Speech API, which is better than the previous results reported in the literature.
https://kjar.spu.edu.iq/index.php/kjar/article/view/607 Speech Recognition, Microsoft Speech API, Subtitles, Speech to Text, Speech-to-Text Recognition, Artificial Intelligence, Voice Interactive Speech to Text (VIST)
title Performance Analysis: AI-based VIST Audio Player by Microsoft Speech API
topic Speech Recognition, Microsoft Speech API, Subtitles, Speech to Text, Speech-to-Text Recognition, Artificial Intelligence, Voice Interactive Speech to Text (VIST)
url https://kjar.spu.edu.iq/index.php/kjar/article/view/607