Detection of offensive content in the Kazakh language using machine learning and deep learning approaches
This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural lang...
| Main Authors: | Milana Bolatbek, Moldir Sagynay, Shynar Mussiraliyeva, Zhastay Yeltay |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | PeerJ Inc., 2025-08-01 |
| Series: | PeerJ Computer Science |
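| Subjects: | Abusive language; Hate-speech detection; NLP; Social media; Neural networks; Online social networks |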
| Online Access: | https://peerj.com/articles/cs-3027.pdf |
| _version_ | 1849765887943376896 |
|---|---|
| author | Milana Bolatbek; Moldir Sagynay; Shynar Mussiraliyeva; Zhastay Yeltay |
| author_facet | Milana Bolatbek; Moldir Sagynay; Shynar Mussiraliyeva; Zhastay Yeltay |
| author_sort | Milana Bolatbek |
| collection | DOAJ |
| description | This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. This article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh’s linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces, but also provides a framework for applying similar techniques to other lesser-resourced languages. |
| format | Article |
| id | doaj-art-760ccd072c334ff4b4c83e00754d127e |
| institution | DOAJ |
| issn | 2376-5992 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | PeerJ Inc. |
| record_format | Article |
| series | PeerJ Computer Science |
| spelling | doaj-art-760ccd072c334ff4b4c83e00754d127e; 2025-08-20T03:04:44Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2025-08-01; 11; e3027; 10.7717/peerj-cs.3027; Detection of offensive content in the Kazakh language using machine learning and deep learning approaches; Milana Bolatbek; Moldir Sagynay; Shynar Mussiraliyeva; Zhastay Yeltay; This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. This article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh’s linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces, but also provides a framework for applying similar techniques to other lesser-resourced languages.; https://peerj.com/articles/cs-3027.pdf; Abusive language; Hate-speech detection; NLP; Social media; Neural networks; Online social networks |
| spellingShingle | Milana Bolatbek; Moldir Sagynay; Shynar Mussiraliyeva; Zhastay Yeltay; Detection of offensive content in the Kazakh language using machine learning and deep learning approaches; PeerJ Computer Science; Abusive language; Hate-speech detection; NLP; Social media; Neural networks; Online social networks |
| title | Detection of offensive content in the Kazakh language using machine learning and deep learning approaches |
| title_full | Detection of offensive content in the Kazakh language using machine learning and deep learning approaches |
| title_fullStr | Detection of offensive content in the Kazakh language using machine learning and deep learning approaches |
| title_full_unstemmed | Detection of offensive content in the Kazakh language using machine learning and deep learning approaches |
| title_short | Detection of offensive content in the Kazakh language using machine learning and deep learning approaches |
| title_sort | detection of offensive content in the kazakh language using machine learning and deep learning approaches |
| topic | Abusive language; Hate-speech detection; NLP; Social media; Neural networks; Online social networks |
| url | https://peerj.com/articles/cs-3027.pdf |
| work_keys_str_mv | AT milanabolatbek detectionofoffensivecontentinthekazakhlanguageusingmachinelearninganddeeplearningapproaches AT moldirsagynay detectionofoffensivecontentinthekazakhlanguageusingmachinelearninganddeeplearningapproaches AT shynarmussiraliyeva detectionofoffensivecontentinthekazakhlanguageusingmachinelearninganddeeplearningapproaches AT zhastayyeltay detectionofoffensivecontentinthekazakhlanguageusingmachinelearninganddeeplearningapproaches |