AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Data in Brief |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925001921 |
| Summary: | Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship with a rich presence in African studies and is taught in many universities worldwide. Despite its popularity, it lacks data resources for sentiment analysis, named entity recognition, part-of-speech (POS) tagging, and, in particular, linguistic corpora. The paper introduces AsanteTwiSenti, a comprehensive sentiment corpus for the Ghanaian Asante Twi language, and describes the methods used and the challenges encountered in constructing it. The AsanteTwiSenti corpus contains 10,095 tweets extracted from 30,507 tweets scraped via the Twitter API. Based on standard guidelines and data preprocessing, 8,438 tweets are labeled as Positive, Negative, Neutral, Ghanaian-Pidgin, Multilingual, and Monolingual. The AsanteTwiSenti corpus seeks to bridge the low-resource gap of the Twi language, inspire the development of local Ghanaian language resources, and support academic research on Asante Twi for Natural Language Processing (NLP), language preservation, and education. |
| ISSN: | 2352-3409 |