Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems
A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability...
Main Authors: | Edmund V. Ndimbo, Qin Luo, Gimo C. Fernando, Xu Yang, Bang Wang |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
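Subjects: | retrieval-augmented generation; Swahili NLP; language model optimization; low-resource languages; model performance |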
Online Access: | https://www.mdpi.com/2076-3417/15/2/524 |
_version_ | 1832589242847461376 |
---|---|
author | Edmund V. Ndimbo; Qin Luo; Gimo C. Fernando; Xu Yang; Bang Wang |
author_facet | Edmund V. Ndimbo; Qin Luo; Gimo C. Fernando; Xu Yang; Bang Wang |
author_sort | Edmund V. Ndimbo |
collection | DOAJ |
description | A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system leverages fine-tuning, where models are trained on available Swahili data, combined with external knowledge retrieval to enhance response accuracy and fluency. Four models—mT5, GPT-2, mBART, and GPT-Neo—were evaluated using metrics such as BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 with Retrieval-Augmented Generation demonstrated the best performance, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer response times, its ability to significantly improve response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications. |
format | Article |
id | doaj-art-24c59f6d120849d7b602d050c93ce774 |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-24c59f6d120849d7b602d050c93ce774; 2025-01-24T13:19:42Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; 15; 2; 524; 10.3390/app15020524; Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems; Edmund V. Ndimbo [0], Qin Luo [1], Gimo C. Fernando [2], Xu Yang [3], Bang Wang [4]; [0] School of Information, Electronic and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; [1] School of Information, Electronic and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; [2] School of Information, Electronic and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; [3] Hubei Key Laboratory of Intelligent Yangtze and Hydroelectric Science, China Yangtze Power Co., Ltd., Yichang 443000, China; [4] School of Information, Electronic and Communications, Huazhong University of Science and Technology, Wuhan 430074, China; A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system leverages fine-tuning, where models are trained on available Swahili data, combined with external knowledge retrieval to enhance response accuracy and fluency. Four models—mT5, GPT-2, mBART, and GPT-Neo—were evaluated using metrics such as BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 with Retrieval-Augmented Generation demonstrated the best performance, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer response times, its ability to significantly improve response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications.; https://www.mdpi.com/2076-3417/15/2/524; retrieval-augmented generation; Swahili NLP; language model optimization; low-resource languages; model performance |
title | Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems |
topic | retrieval-augmented generation; Swahili NLP; language model optimization; low-resource languages; model performance |
url | https://www.mdpi.com/2076-3417/15/2/524 |