Detection of offensive content in the Kazakh language using machine learning and deep learning approaches

Bibliographic Details
Main Authors: Milana Bolatbek, Moldir Sagynay, Shynar Mussiraliyeva, Zhastay Yeltay
Format: Article
Language: English
Published: PeerJ Inc. 2025-08-01
Series: PeerJ Computer Science
Online Access: https://peerj.com/articles/cs-3027.pdf
Collection: DOAJ
Description: This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. This article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh’s linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces, but also provides a framework for applying similar techniques to other lesser-resourced languages.
ISSN: 2376-5992
Volume: 11
Article number: e3027
DOI: 10.7717/peerj-cs.3027
Subjects: Abusive language; Hate-speech detection; NLP; Social media; Neural networks; Online social networks