Enhancing Hate Speech Detection: Leveraging Emoji Preprocessing with BI-LSTM Model
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Informatics Department, Faculty of Computer Science Bina Darma University, 2025-06-01 |
| Series: | Journal of Information Systems and Informatics |
| Subjects: | |
| Online Access: | https://journal-isi.org/index.php/isi/article/view/1147 |
| Summary: | Microblogging platforms like Twitter enable users to rapidly share opinions, information, and viewpoints. However, the vast volume of daily user-generated content makes it challenging to keep the platform safe and inclusive. One key concern is the prevalence of hate speech, which must be addressed to foster a respectful and open environment. This study explores the effectiveness of the Emoji Description Method (EMJ DESC), which enhances tweet classification by converting emojis into descriptive text or sentences. These descriptions are then encoded into numerical vector matrices that capture the meaning and emotional tone of each emoji. Integrated into a basic text classification model, these vectors help improve detection performance. The research examines how different emoji preprocessing strategies affect the performance of a BI-LSTM model for hate speech classification. Results show that removing emojis significantly reduces accuracy, to 68%, and weakens the model’s ability to distinguish between hate and non-hate speech because valuable semantic context is lost. In contrast, retaining emoji semantics through either textual descriptions or embeddings raises classification accuracy to 93% and 94%, respectively. The highest performance is achieved with emoji embedding, highlighting its ability to capture subtle non-verbal cues that are critical for accurate hate speech detection. Overall, the findings emphasize the importance of emoji-aware preprocessing techniques for effective social media content classification. |
| ISSN: | 2656-5935, 2656-4882 |
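The record contains no code, so the following is only a minimal Python sketch, under stated assumptions, of the three emoji preprocessing strategies the summary compares: removing emojis, replacing them with textual descriptions, and keeping each emoji as its own token so that an embedding layer can learn a vector for it. The lexicon `EMOJI_DESCRIPTIONS`, the `EMJ_<codepoint>` token scheme, the approximate emoji code-point ranges, and all function names are hypothetical illustrations, not the authors' EMJ DESC implementation. A final helper maps tokens to padded integer ids, the usual input format for an embedding layer feeding a Bi-LSTM classifier.

```python
import re
from typing import Dict, List

# Hypothetical emoji lexicon for illustration only; the record does not
# specify which description source EMJ DESC uses.
EMOJI_DESCRIPTIONS: Dict[str, str] = {
    "\U0001F602": "face with tears of joy",
    "\U0001F620": "angry face",
    "\U0001F44D": "thumbs up",
}

# Approximate emoji ranges (pictographs, supplemental symbols, misc symbols,
# dingbats). Variation selectors and ZWJ sequences are NOT handled here.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def _squash(text: str) -> str:
    """Collapse runs of whitespace left behind by substitutions."""
    return " ".join(text.split())


def remove_emojis(text: str) -> str:
    """Strategy 1: drop emojis entirely (their semantic cue is lost)."""
    return _squash(EMOJI_PATTERN.sub(" ", text))


def describe_emojis(text: str) -> str:
    """Strategy 2: replace each emoji with a textual description.
    Emojis missing from the hypothetical lexicon become the word 'emoji'."""
    return _squash(EMOJI_PATTERN.sub(
        lambda m: f" {EMOJI_DESCRIPTIONS.get(m.group(0), 'emoji')} ", text))


def tokenize_emojis(text: str) -> str:
    """Strategy 3: keep each emoji as a distinct token (e.g. EMJ_1F602)
    so a downstream embedding layer can assign it its own vector."""
    return _squash(EMOJI_PATTERN.sub(
        lambda m: f" EMJ_{ord(m.group(0)):X} ", text))


def encode(texts: List[str], max_len: int) -> List[List[int]]:
    """Map lowercased whitespace tokens to integer ids (0 is reserved for
    padding) and pad or truncate every sequence to max_len."""
    vocab: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for text in texts:
        ids = [vocab.setdefault(tok, len(vocab) + 1)
               for tok in text.lower().split()]
        ids = ids[:max_len] + [0] * (max_len - len(ids))
        encoded.append(ids)
    return encoded


if __name__ == "__main__":
    tweet = "great game \U0001F44D\U0001F602"
    print(remove_emojis(tweet))    # great game
    print(describe_emojis(tweet))  # great game thumbs up face with tears of joy
    print(tokenize_emojis(tweet))  # great game EMJ_1F44D EMJ_1F602
```

The contrast between the first two outputs mirrors the summary's argument: removal discards the emoji signal entirely, while description or tokenization preserves it in a form the classifier can consume. A real pipeline would use a complete emoji lexicon or pretrained emoji embeddings rather than this three-entry table.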