Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
Abstract India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) syst...
| Main Authors: | Palash Jain, Anirban Bhowmick |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-02-01 |
| Series: | EURASIP Journal on Audio, Speech, and Music Processing |
| Subjects: | End-to-end ASR, Wav2Vec2.0, Whisper, XLSR-53, W2V2-BERT |
| Online Access: | https://doi.org/10.1186/s13636-025-00395-5 |
| _version_ | 1850029731017129984 |
|---|---|
| author | Palash Jain, Anirban Bhowmick |
| collection | DOAJ |
| description | India’s linguistic diversity encompasses multiple language families, including the Indo-Aryan and Dravidian, which represent distinct phonological and morphological characteristics. This study aims to evaluate and compare the performance of end-to-end automatic speech recognition (ASR) systems for three Indo-Aryan languages (Marathi, Odia, and Gujarati) and three Dravidian languages (Tamil, Telugu, and Malayalam). Using four transformer-based pre-trained models (Wav2Vec2.0-base, XLSR-53, W2V2-BERT, and Whisper small), the analysis explores their adaptability to these languages’ linguistic features, with word error rate (WER) and character error rate (CER) serving as evaluation metrics. Results indicate that W2V2-BERT and XLSR-53 outperform other models, achieving lower WER and CER, especially for Indo-Aryan languages. However, higher error rates for Dravidian languages highlight challenges such as complex phonology and agglutinative morphology. This work provides a comparative insight into the strengths and limitations of pre-trained ASR models across India’s diverse linguistic landscape and underscores the need for language-specific adaptations to improve ASR accuracy for underrepresented languages. |
| format | Article |
| id | doaj-art-afb54e0b19ca459e979fa5f88a6ad71f |
| institution | DOAJ |
| issn | 1687-4722 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | SpringerOpen |
| record_format | Article |
| series | EURASIP Journal on Audio, Speech, and Music Processing |
| spelling | doaj-art-afb54e0b19ca459e979fa5f88a6ad71f; 2025-08-20T02:59:28Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2025-02-01; 20251116; 10.1186/s13636-025-00395-5; Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape; Palash Jain (School of Electrical and Electronics Engineering, VIT Bhopal University); Anirban Bhowmick (School of Electrical and Electronics Engineering, VIT Bhopal University); https://doi.org/10.1186/s13636-025-00395-5; End-to-end ASR; Wav2Vec2.0; Whisper; XLSR-53; W2V2-BERT |
| title | Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape |
| topic | End-to-end ASR, Wav2Vec2.0, Whisper, XLSR-53, W2V2-BERT |
| url | https://doi.org/10.1186/s13636-025-00395-5 |
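
The description above names word error rate (WER) and character error rate (CER) as the evaluation metrics. As a minimal sketch of how these are conventionally defined (edit distance between reference and hypothesis, divided by the reference length), the following may help; it is not the authors' evaluation code. The function names are illustrative, and the choices to split words on whitespace and to count characters as Unicode code points (including spaces) are assumptions. Real evaluations usually apply text normalization first, and Indic scripts may call for grapheme-level rather than code-point-level counting.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `ref` into `hyp`.
    """
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # Assumption: characters are Unicode code points, spaces included.
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(wer("the cat sat", "the cat sit"))
    print(cer("the cat sat", "the cat sit"))
```

Because both metrics divide by the reference length, either can exceed 1.0 when the hypothesis contains many insertions. For agglutinative languages, where a single long word can carry several morphemes, one wrong suffix counts as a full word error in WER but only a few character errors in CER, which is one reason the two metrics are commonly reported together.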