Hate Speech Detection and Online Public Opinion Regulation Using Support Vector Machine Algorithm: Application and Impact on Social Media

Detecting hate speech in social media is challenging due to its rarity, high-dimensional complexity, and implicit expression via sarcasm or spelling variations, rendering linear models ineffective. In this study, the SVM (Support Vector Machine) algorithm is used to map text features from low-dimens...

Full description

Saved in:
Bibliographic Details
Main Authors: Siyuan Li, Zhi Li
Format: Article
Language:English
Published: MDPI AG 2025-04-01
Series:Information
Subjects:
Online Access:https://www.mdpi.com/2078-2489/16/5/344
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Detecting hate speech in social media is challenging due to its rarity, high-dimensional complexity, and implicit expression via sarcasm or spelling variations, rendering linear models ineffective. In this study, the SVM (Support Vector Machine) algorithm is used to map text features from low-dimensional to high-dimensional space using kernel function techniques to meet complex nonlinear classification challenges. By maximizing the category interval to locate the optimal hyperplane and combining nuclear techniques to implicitly adjust the data distribution, the classification accuracy of hate speech detection is significantly improved. Data collection leverages social media APIs (Application Programming Interface) and customized crawlers with OAuth2.0 authentication and keyword filtering, ensuring relevance. Regular expressions validate data integrity, followed by preprocessing steps such as denoising, stop-word removal, and spelling correction. Word embeddings are generated using Word2Vec’s Skip-gram model, combined with TF-IDF (Term Frequency–Inverse Document Frequency) weighting to capture contextual semantics. A multi-level feature extraction framework integrates sentiment analysis via lexicon-based methods and BERT for advanced sentiment recognition. Experimental evaluations on two datasets demonstrate the SVM model’s effectiveness, achieving accuracies of 90.42% and 92.84%, recall rates of 88.06% and 90.79%, and average inference times of 3.71 ms and 2.96 ms. These results highlight the model’s ability to detect implicit hate speech accurately and efficiently, supporting real-time monitoring. This research contributes to creating a safer online environment by advancing hate speech detection methodologies.
ISSN:2078-2489