Detection of offensive content in the Kazakh language using machine learning and deep learning approaches

This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural lang...

Full description

Saved in:
Bibliographic Details
Main Authors: Milana Bolatbek, Moldir Sagynay, Shynar Mussiraliyeva, Zhastay Yeltay
Format: Article
Language:English
Published: PeerJ Inc. 2025-08-01
Series:PeerJ Computer Science
Subjects:
Online Access:https://peerj.com/articles/cs-3027.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. This article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh’s linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces, but also provides a framework for applying similar techniques to other lesser-resourced languages.
ISSN:2376-5992