Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape

Abstract India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) syst...

Bibliographic Details
Main Authors: Palash Jain, Anirban Bhowmick
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects: End-to-end ASR; Wav2Vec2.0; Whisper; XLSR-53; W2V2-BERT
Online Access: https://doi.org/10.1186/s13636-025-00395-5
author Palash Jain
Anirban Bhowmick
collection DOAJ
description Abstract India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) systems for three Indo-Aryan languages—Marathi, Odia, and Gujarati—and three Dravidian languages—Tamil, Telugu, and Malayalam. Using four transformer-based pre-trained models—Wav2Vec2.0-base, XLSR-53, W2V2-BERT, and Whisper small—the analysis explores their adaptability to these languages’ linguistic features, with word error rate (WER) and character error rate (CER) serving as evaluation metrics. Results indicate that W2V2-BERT and XLSR-53 outperform other models, achieving lower WER and CER, especially for Indo-Aryan languages. However, higher error rates for Dravidian languages highlight challenges such as complex phonology and agglutinative morphology. This work provides a comparative insight into the strengths and limitations of pre-trained ASR models across India’s diverse linguistic landscape and underscores the need for language-specific adaptations to improve ASR accuracy for underrepresented languages.
format Article
id doaj-art-afb54e0b19ca459e979fa5f88a6ad71f
institution DOAJ
issn 1687-4722
language English
publishDate 2025-02-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj-art-afb54e0b19ca459e979fa5f88a6ad71f
2025-08-20T02:59:28Z
eng
SpringerOpen
EURASIP Journal on Audio, Speech, and Music Processing
1687-4722
2025-02-01
20251116
10.1186/s13636-025-00395-5
Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
Palash Jain, School of Electrical and Electronics Engineering, VIT Bhopal University
Anirban Bhowmick, School of Electrical and Electronics Engineering, VIT Bhopal University
https://doi.org/10.1186/s13636-025-00395-5
End-to-end ASR
Wav2Vec2.0
Whisper
XLSR-53
W2V2-BERT
title Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
topic End-to-end ASR
Wav2Vec2.0
Whisper
XLSR-53
W2V2-BERT
url https://doi.org/10.1186/s13636-025-00395-5