Leveraging Retrieval-Augmented Generation for Swahili Language Conversation Systems

Bibliographic Details
Main Authors: Edmund V. Ndimbo, Qin Luo, Gimo C. Fernando, Xu Yang, Bang Wang
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/2/524
Description
Summary: A conversational system is an artificial intelligence application designed to interact with users in natural language, providing accurate and contextually relevant responses. Building such systems for low-resource languages like Swahili presents significant challenges due to the limited availability of large-scale training datasets. This paper proposes a Retrieval-Augmented Generation-based system to address these challenges and improve the quality of Swahili conversational AI. The system combines fine-tuning, in which models are trained on available Swahili data, with external knowledge retrieval to enhance response accuracy and fluency. Four models (mT5, GPT-2, mBART, and GPT-Neo) were evaluated using BLEU, METEOR, Query Performance, and inference time. Results show that Retrieval-Augmented Generation consistently outperforms fine-tuning alone, particularly in generating detailed and contextually appropriate responses. Among the tested models, mT5 with Retrieval-Augmented Generation performed best, achieving a BLEU score of 56.88%, a METEOR score of 72.72%, and a Query Performance score of 84.34%, while maintaining relevance and fluency. Although Retrieval-Augmented Generation introduces slightly longer response times, its significant improvement in response quality makes it an effective approach for Swahili conversational systems. This study highlights the potential of Retrieval-Augmented Generation to advance conversational AI for Swahili and other low-resource languages, with future work focusing on optimizing efficiency and exploring multilingual applications.
ISSN: 2076-3417
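
Illustrative note: the summary describes pairing a fine-tuned generator with external knowledge retrieval. This record does not say which retriever or prompt format the authors used, so the following is only a minimal sketch of the general retrieve-then-generate idea. The bag-of-words cosine retriever, the sample passages, and the names `retrieve` and `build_prompt` are all assumptions made for illustration. The generator itself (for example mT5) is not called here.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; a real system would index a much larger corpus.
PASSAGES = [
    "Dodoma ni mji mkuu wa Tanzania.",
    "Mlima Kilimanjaro ni mlima mrefu zaidi barani Afrika.",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query (assumed retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Prepend retrieved context to the question before generation."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    # The resulting prompt would be passed to a fine-tuned generator.
    print(build_prompt("Mji mkuu wa Tanzania ni upi?", PASSAGES))
```

In a full system the returned prompt would be fed to the fine-tuned model, which is where the extra retrieval step accounts for the slightly longer response times mentioned in the summary.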