The Development of Small-Scale Language Models for Low-Resource Languages, with a Focus on Kazakh and Direct Preference Optimization

Low-resource languages remain underserved by contemporary large language models (LLMs) because they lack sizable corpora, bespoke preprocessing tools, and the computing budgets assumed by mainstream alignment pipelines. Focusing on Kazakh, we present a 1.94B-parameter LLaMA-based model that demonstr...
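
The record's title names Direct Preference Optimization (DPO) as the alignment method. For orientation, the standard DPO objective (Rafailov et al., 2023) is sketched below in LaTeX; the paper's exact variant, preference data, and hyperparameters are not recoverable from this record, so this is the general formulation only, using the conventional symbols from the DPO paper.

\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]

Here \pi_\theta is the policy being fine-tuned, \pi_{\mathrm{ref}} a frozen reference model, (x, y_w, y_l) a prompt paired with preferred and dispreferred responses from dataset \mathcal{D}, \sigma the logistic sigmoid, and \beta a temperature controlling deviation from the reference. Because DPO folds the reward model into the policy through a closed-form reparameterization, alignment needs only preference pairs and the frozen reference model rather than a full RLHF loop, which suits the constrained compute budgets the abstract emphasizes.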

Bibliographic Details
Main Authors: Nurgali Kadyrbek, Zhanseit Tuimebayev, Madina Mansurova, Vítor Viegas
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Big Data and Cognitive Computing
Online Access: https://www.mdpi.com/2504-2289/9/5/137