Comparative performance analysis of end-to-end ASR models on Indo-Aryan and Dravidian languages within India’s linguistic landscape
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2025-02-01 |
| Series: | EURASIP Journal on Audio, Speech, and Music Processing |
| Subjects: | |
| Online Access: | https://doi.org/10.1186/s13636-025-00395-5 |
| Summary: | India’s linguistic diversity spans multiple language families, including Indo-Aryan and Dravidian, which have distinct phonological and morphological characteristics. This study evaluates and compares the performance of end-to-end automatic speech recognition (ASR) systems on three Indo-Aryan languages (Marathi, Odia, and Gujarati) and three Dravidian languages (Tamil, Telugu, and Malayalam). Using four transformer-based pre-trained models (Wav2Vec2.0-base, XLSR-53, W2V2-BERT, and Whisper small), the analysis examines how well each model adapts to these languages’ linguistic features, with word error rate (WER) and character error rate (CER) as the evaluation metrics. Results indicate that W2V2-BERT and XLSR-53 outperform the other models, achieving lower WER and CER, especially for the Indo-Aryan languages. However, higher error rates for the Dravidian languages highlight challenges such as complex phonology and agglutinative morphology. This work offers comparative insight into the strengths and limitations of pre-trained ASR models across India’s diverse linguistic landscape and underscores the need for language-specific adaptation to improve ASR accuracy for underrepresented languages. |
| ISSN: | 1687-4722 |
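
The summary names word error rate (WER) and character error rate (CER) as the evaluation metrics. Both are conventionally defined as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment between reference and hypothesis, and N is the number of reference words (WER) or characters (CER). The sketch below computes both with a plain Levenshtein dynamic program. It is only an illustration: the record does not say which toolkit or text normalization the authors used, so whitespace tokenization for words and Unicode code points for characters are assumptions here. Under the code-point assumption, an Indic vowel sign (matra) or virama counts as its own character.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # Before the loop, prev[j] = cost of turning an empty ref into hyp[:j] (j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        # Turning ref[:i] into an empty hypothesis costs i deletions.
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute r -> h (free when equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Assumption: words are whitespace-separated tokens with no further normalization.
    """
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points, spaces included.

    Assumption: no grapheme-cluster grouping, so a matra or virama is one character.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # Three character edits over 6 reference characters.
    print(cer("kitten", "sitting"))
```

Because insertions are counted, WER and CER can exceed 1.0 when a hypothesis is much longer than its reference. For agglutinative languages such as the Dravidian ones in this study, a single wrong suffix turns a whole long word into a word error while costing only a few character edits, which is one reason reporting CER alongside WER is informative.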