Bilingual hate speech detection on social media: Amharic and Afaan Oromo

Abstract Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started using social media as a communication platform. Social media has grown to be one of the most significant components, with severa...

Bibliographic Details
Main Authors: Teshome Mulugeta Ababu, Michael Melese Woldeyohannis, Emuye Bawoke Getaneh
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: Journal of Big Data
Subjects: Amharic, Afaan Oromo, Hate Speech, Bilingual, Under-resourced, Ethiopian Languages
Online Access: https://doi.org/10.1186/s40537-024-01044-y
_version_ 1823861972295221248
author Teshome Mulugeta Ababu
Michael Melese Woldeyohannis
Emuye Bawoke Getaneh
author_sort Teshome Mulugeta Ababu
collection DOAJ
description Abstract Owing to significant increases in internet penetration and the development of smartphone technology over the past couple of decades, many people have started using social media as a communication platform. Social media has grown into one of the most significant communication channels, with several benefits. However, the technology also poses a number of threats and challenges, such as hate speech, disinformation, and fake news. Social media platforms are often accused of not doing enough to thwart hate speech on their platforms, and hate speech detection is one way to address this. People in bilingual and multinational societies commonly code-mix in both spoken and written communication. Among these, Amharic and Afaan Oromo speakers frequently mix the two languages when conversing and posting on social media. The majority of previous studies concentrated on identifying hate speech either in technologically favoured languages or in a single Ethiopian language; however, the prevalence of bilingual communication on social media hampers efforts to curb the propagation of hate speech. In this work, bilingual hate speech detection for Amharic and Afaan Oromo was conducted using four deep learning classifiers (CNN, BiLSTM, CNN-BiLSTM, and BiGRU) and three feature extraction techniques (Keras word embedding, word2vec, and FastText). According to the experiments, BiLSTM with FastText feature extraction outperformed the other algorithms, achieving 78.05% accuracy for bilingual Amharic and Afaan Oromo hate speech detection. FastText feature extraction overcomes the out-of-vocabulary (OOV) problem. Furthermore, we are working towards including other linguistic features of the languages to detect hate speech and towards making the resource available to facilitate further research on bilingual hate speech detection for other under-resourced Ethiopian languages.
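The abstract's claim that FastText sidesteps the OOV problem rests on representing a word by its character n-grams, so that an unseen word still shares pieces with words seen in training. The following is only a minimal sketch of that idea, not the authors' pipeline: the table size, vector dimension, CRC32 hash, random vectors, and the example words are all assumptions made for illustration, and the real library additionally learns a whole-word vector and uses its own hash function.

```python
import random
import zlib

NUM_BUCKETS = 1000  # assumed toy size; real models use far more buckets
DIM = 8             # assumed toy embedding dimension

# Randomly initialised n-gram table standing in for trained vectors.
_rng = random.Random(0)
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(NUM_BUCKETS)]


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in < and > boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


def bucket(ngram):
    """Map an n-gram to a table row with a deterministic stand-in hash (CRC32)."""
    return zlib.crc32(ngram.encode("utf-8")) % NUM_BUCKETS


def word_vector(word):
    """Average the vectors of the word's n-grams; works for unseen words too."""
    ids = [bucket(g) for g in char_ngrams(word)]
    if not ids:  # e.g. the empty string has no n-grams
        return [0.0] * DIM
    return [sum(TABLE[i][d] for i in ids) / len(ids) for d in range(DIM)]


def shared_ngrams(known, unseen):
    """N-grams an unseen word has in common with a known word."""
    return set(char_ngrams(known)) & set(char_ngrams(unseen))


if __name__ == "__main__":
    # "speeches" is treated as out-of-vocabulary, "speech" as in-vocabulary.
    print(len(shared_ngrams("speech", "speeches")), "n-grams shared")
    print(word_vector("speeches"))
```

Because the vector is assembled from n-gram rows rather than looked up by whole word, a misspelled or inflected token still receives a representation, which is the property the abstract credits for FastText's advantage on code-mixed text.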
format Article
id doaj-art-7837643af0dd4313961b5c45ce4d7551
institution Kabale University
issn 2196-1115
language English
publishDate 2025-02-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj-art-7837643af0dd4313961b5c45ce4d7551
2025-02-09T12:41:20Z
eng
SpringerOpen
Journal of Big Data
2196-1115
2025-02-01
121123
10.1186/s40537-024-01044-y
Bilingual hate speech detection on social media: Amharic and Afaan Oromo
Teshome Mulugeta Ababu 0
Michael Melese Woldeyohannis 1
Emuye Bawoke Getaneh 2
Department of Computer Science, College of Natural Science, Salale University
School of Information Science, Addis Ababa University
Department of Information Technology, Dire Dawa University Institute of Technology
https://doi.org/10.1186/s40537-024-01044-y
Amharic
Afaan Oromo
Hate Speech
Bilingual
Under-resourced
Ethiopian Languages
title Bilingual hate speech detection on social media: Amharic and Afaan Oromo
topic Amharic
Afaan Oromo
Hate Speech
Bilingual
Under-resourced
Ethiopian Languages
url https://doi.org/10.1186/s40537-024-01044-y