Unveiling the spectrum of Arabic offensive language: Taxonomy and insights.
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0319900 |
| Summary: | This paper presents a novel taxonomy for classifying offensive language in Arabic. It fills a notable gap in the existing literature, which has concentrated primarily on Indo-European languages. Our taxonomy divides offensive language into seven distinct levels: six explicit levels and one implicit level. We drew inspiration from the simplified offensive language (SOL) taxonomy outlined in prior work and adapted it to accommodate the intricacies and linguistic richness of Arabic. We analyzed existing Arabic offensive-language datasets and examined the range of annotations they employ. This exploration gave us insight into the diversity of offensive language instances and the methodologies used to annotate them, and it informed the development of our streamlined taxonomy for categorizing such expressions. An initial examination of the datasets reveals notable trends and distributions, emphasizing the intricate and distinct nature of offensive expressions in Arabic. We also analyzed the performance of pre-trained and fine-tuned Arabic transformer models for offensive language detection on these datasets. Our results underscore the importance of acknowledging linguistic and cultural diversity in the study and mitigation of online abusive language. We posit that our refined taxonomy and accompanying dataset will be pivotal in advancing research across Semitic languages, including sociocultural studies, natural language processing, and linguistic analyses. |
| ISSN: | 1932-6203 |