AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context
Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship that is very rich in African studies and is taught in many universities across the globe. Despite its popularity, it lacks data resources in Sentiment Analysis, Named Entity Recognition, Part...
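| Main Authors: | Mavis Sarah Gyimah, James Benjamin Hayfron-Acquah, Rose-Mary Mensah Gyening, Michael Asante, Umar Farouk Ibn Abdulrahman, Evans Kotei |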
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-06-01 |
| Series: | Data in Brief |
| Subjects: | Asante Twi; Low-resource languages; Sentiment analysis; Multilingual; Ghanaian Pidgin |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2352340925001921 |
| _version_ | 1849725715510984704 |
|---|---|
| author | Mavis Sarah Gyimah; James Benjamin Hayfron-Acquah; Rose-Mary Mensah Gyening; Michael Asante; Umar Farouk Ibn Abdulrahman; Evans Kotei |
| author_sort | Mavis Sarah Gyimah |
| collection | DOAJ |
| description | Ghanaian Asante Twi is the most widely spoken indigenous language in Ghana. It is a language of scholarship that is very rich in African studies and is taught in many universities across the globe. Despite its popularity, it lacks data resources for Sentiment Analysis, Named Entity Recognition, and Part of Speech (POS) tagging, and in particular it lacks linguistic corpora. The paper introduces AsanteTwiSenti, a comprehensive sentiment corpus for the Ghanaian Asante Twi language, together with the methods used and the challenges encountered in constructing the corpus. The AsanteTwiSenti corpus contains 10,095 tweets extracted from 30,507 tweets scraped from the Twitter API. Based on standard guidelines and data preprocessing, 8,438 tweets are labeled as Positive, Negative, Neutral, Ghanaian-Pidgin, Multilingual, and Monolingual. The AsanteTwiSenti corpus seeks to bridge the low-resource gap of the Twi language, inspire the development of local Ghanaian language resources, and advance academic research on Asante Twi for Natural Language Processing (NLP), language preservation, and education. |
| format | Article |
| id | doaj-art-c25f0e70d08d4ee189ad37d5e0a3bb76 |
| institution | DOAJ |
| issn | 2352-3409 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Data in Brief |
| spelling | doaj-art-c25f0e70d08d4ee189ad37d5e0a3bb76; 2025-08-20T03:10:24Z; eng; Elsevier; Data in Brief; 2352-3409; 2025-06-01; 60; 111460; 10.1016/j.dib.2025.111460; AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context; Mavis Sarah Gyimah (corresponding author), James Benjamin Hayfron-Acquah, Rose-Mary Mensah Gyening, Michael Asante, Umar Farouk Ibn Abdulrahman, Evans Kotei; all authors: Computer Science Department, Kumasi Technical University (KSTU), P.O. Box 854, Kumasi, Ghana; http://www.sciencedirect.com/science/article/pii/S2352340925001921; Asante Twi; Low-resource languages; Sentiment analysis; Multilingual; Ghanaian Pidgin |
| title | AsanteTwiSenti: A Sentiment dataset of Ghanaian Asante Twi Tweets in a multilingual context |
| topic | Asante Twi; Low-resource languages; Sentiment analysis; Multilingual; Ghanaian Pidgin |
| url | http://www.sciencedirect.com/science/article/pii/S2352340925001921 |