AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context

Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship that is very rich in African studies and is taught in many universities across the globe. Despite its popularity, it lacks data resources in Sentiment Analysis, Named Entity Recognition, Part...

Bibliographic Details
Main Authors: Mavis Sarah Gyimah, James Benjamin Hayfron-Acquah, Rose-Mary Mensah Gyening, Michael Asante, Umar Farouk Ibn Abdulrahman, Evans Kotei
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Data in Brief
Subjects: Asante; Twi; Low-resource languages; Sentiment analysis; Multilingual; Ghanaian Pidgin
Online Access: http://www.sciencedirect.com/science/article/pii/S2352340925001921
author Mavis Sarah Gyimah
James Benjamin Hayfron-Acquah
Rose-Mary Mensah Gyening
Michael Asante
Umar Farouk Ibn Abdulrahman
Evans Kotei
collection DOAJ
description Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship with a rich presence in African studies and is taught in many universities across the globe. Despite its popularity, it lacks data resources for Sentiment Analysis, Named Entity Recognition, and Part of Speech (POS) tagging, and in particular lacks linguistic corpora. The paper introduces AsanteTwiSenti, a comprehensive sentiment corpus for the Ghanaian Asante Twi language, together with the methods used and the challenges encountered in constructing the corpus. The AsanteTwiSenti corpus contains 10,095 tweets extracted from 30,507 tweets scraped from the Twitter API. Based on standard guidelines and data preprocessing, 8,438 tweets are labeled as Positive, Negative, Neutral, Ghanaian-Pidgin, Multilingual, and Monolingual. The AsanteTwiSenti corpus seeks to bridge the low-resource gap of the Twi language, inspire the development of local Ghanaian language resources, and support academic research on Asante Twi for Natural Language Processing (NLP), language preservation, and education.
format Article
id doaj-art-c25f0e70d08d4ee189ad37d5e0a3bb76
institution DOAJ
issn 2352-3409
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Data in Brief
spelling doaj-art-c25f0e70d08d4ee189ad37d5e0a3bb76; 2025-08-20T03:10:24Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-06-01; 60; 111460; 10.1016/j.dib.2025.111460
Mavis Sarah Gyimah (corresponding author); James Benjamin Hayfron-Acquah; Rose-Mary Mensah Gyening; Michael Asante; Umar Farouk Ibn Abdulrahman; Evans Kotei
Computer Science Department, Kumasi Technical University (KSTU), P.O. Box 854, Kumasi, Ghana (affiliation listed for all six authors)
title AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context
topic Asante
Twi
Low-resource languages
Sentiment analysis
Multilingual
Ghanaian Pidgin
url http://www.sciencedirect.com/science/article/pii/S2352340925001921