Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems

Bibliographic Details
Main Authors: Edmund V. Ndimbo, Qin Luo, Gimo C. Fernando, Xu Yang, Bang Wang
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/2/524

Record Details
Affiliations:
Edmund V. Ndimbo, Qin Luo, Gimo C. Fernando, Bang Wang: School of Information, Electronic and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
Xu Yang: Hubei Key Laboratory of Intelligent Yangtze and Hydroelectric Science, China Yangtze Power Co., Ltd., Yichang 443000, China
Volume/Issue: Vol. 15, No. 2, Article 524
ISSN: 2076-3417
DOI: 10.3390/app15020524
Keywords: retrieval-augmented generation; Swahili NLP; language model optimization; low-resource languages; model performance
Collection: DOAJ
Institution: Kabale University
Record ID: doaj-art-24c59f6d120849d7b602d050c93ce774

Description
A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system leverages fine-tuning, where models are trained on available Swahili data, combined with external knowledge retrieval to enhance response accuracy and fluency. Four models (mT5, GPT-2, mBART, and GPT-Neo) were evaluated using metrics such as BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 with Retrieval-Augmented Generation demonstrated the best performance, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer response times, its ability to significantly improve response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications.
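
The description says the system combines a fine-tuned generator with external knowledge retrieval. The record does not say which retriever or input format the authors used, so the following is only a minimal sketch of the retrieval half of such a pipeline, under stated assumptions: a small in-memory Swahili knowledge base, TF-IDF cosine similarity as the ranking function, and a "swali:/muktadha:" (question/context) input template. The names TfidfRetriever, build_augmented_input, the template, and the example passages are all hypothetical illustrations, not the paper's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; a crude stand-in for a proper Swahili tokenizer.
    return re.findall(r"\w+", text.lower())


class TfidfRetriever:
    """Hypothetical retriever: ranks passages in a small in-memory knowledge base by TF-IDF cosine similarity."""

    def __init__(self, passages: list[str]) -> None:
        self.passages = passages
        self.doc_counts = [Counter(tokenize(p)) for p in passages]
        n = len(passages)
        df: Counter = Counter()
        for counts in self.doc_counts:
            df.update(counts.keys())  # document frequency: each term counted once per passage
        # Smoothed inverse document frequency: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        # Precompute passage vectors and their Euclidean norms once.
        self.doc_vecs = [self._weigh(c) for c in self.doc_counts]
        self.doc_norms = [self._norm(v) for v in self.doc_vecs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Terms unseen in the knowledge base get weight 0 and cannot affect the score.
        return {t: c * self.idf.get(t, 0.0) for t, c in counts.items()}

    @staticmethod
    def _norm(vec: dict[str, float]) -> float:
        # Fall back to 1.0 so an all-zero vector does not cause division by zero.
        return math.sqrt(sum(w * w for w in vec.values())) or 1.0

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q_vec = self._weigh(Counter(tokenize(query)))
        q_norm = self._norm(q_vec)
        scores = []
        for i, (d_vec, d_norm) in enumerate(zip(self.doc_vecs, self.doc_norms)):
            dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
            scores.append((dot / (q_norm * d_norm), i))
        # Highest cosine first; ties keep knowledge-base order.
        scores.sort(key=lambda s: (-s[0], s[1]))
        return [self.passages[i] for _, i in scores[:k]]


def build_augmented_input(question: str, contexts: list[str]) -> str:
    # Hypothetical input template: the question followed by each retrieved passage.
    joined = " ".join(f"muktadha: {c}" for c in contexts)
    return f"swali: {question} {joined}"


if __name__ == "__main__":
    knowledge_base = [
        "Dodoma ni mji mkuu wa Tanzania.",
        "Mlima Kilimanjaro ni mlima mrefu zaidi barani Afrika.",
        "Ziwa Victoria ni ziwa kubwa zaidi barani Afrika.",
    ]
    retriever = TfidfRetriever(knowledge_base)
    question = "Mji mkuu wa Tanzania ni upi?"
    contexts = retriever.retrieve(question, k=1)
    # The augmented string would then go to a fine-tuned generator such as mT5; that step is omitted here.
    print(build_augmented_input(question, contexts))
```

Run as a script, the example question ("What is the capital of Tanzania?") retrieves the Dodoma passage and prints it appended to the question. TF-IDF is used here only because it needs no external dependencies; a production system might instead use a dense multilingual retriever, and the generation step would call whichever fine-tuned model (mT5, GPT-2, mBART, or GPT-Neo) is being evaluated.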