Annotated Lexicon for Sentiment Analysis in the Bosnian Language

The paper presents the first sentiment-annotated lexicon of the Bosnian language. The annotation process and methodology are presented along with a usability study, which concentrates on language coverage. The composition of the starting base was done by translating the Slovenian annotated lexicon a...


Bibliographic Details
Main Authors:	Sead Jahić, Jernej Vičič
Format:	Article
Language:	English
Published:	University of Ljubljana Press (Založba Univerze v Ljubljani) 2023-12-01
Series:	Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Subjects:	Bosnian lexicon corpus sentiment analysis AnAwords stopwords log-likelihood
Online Access:	https://journals.uni-lj.si/slovenscina2/article/view/11717

collection	DOAJ
description	The paper presents the first sentiment-annotated lexicon of the Bosnian language. It describes the annotation process and methodology, along with a usability study that concentrates on language coverage. The starting base was built by translating the Slovenian annotated lexicon and then manually checking the translations and annotations. Language coverage was measured on two reference corpora. Bosnian is still considered a low-resource language. A reference corpus composed of automatically crawled web pages is available for Bosnian, but the authors had difficulty finding any corpus with a clear time frame for its texts. They therefore constructed a corpus of contemporary texts by collecting news articles from several Bosnian web portals. Two language coverage methods were used in the experiment. The first used a frequency list of all words extracted from the two reference Bosnian corpora; the second did not use frequencies as the main factor in counting. With the first method, the computed coverage was 19.24% for the first corpus and 28.05% for the second. With the second method, coverage was 2.34% for the first corpus and 6.98% for the second. The authors report that this coverage is comparable to the state of the art in the field, and that the usability of the lexicon had already been demonstrated in a Twitter-based comparison.
id	doaj-art-6d1a237bf5a04de6815c5699fb9050d9
institution	Kabale University
issn	2335-2736
volume	11
issue	2
doi	10.4312/slo2.0.2023.2.59-83
affiliation	Sead Jahić: University of Primorska, Faculty of Mathematics, Natural Science and Information Technologies, Koper, Slovenia
affiliation	Jernej Vičič: University of Primorska, Faculty of Mathematics, Natural Science and Information Technologies, Koper; Research Centre of the Slovenian Academy of Sciences and Arts, Fran Ramovš Institute of the Slovenian Language, Ljubljana, Slovenia
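
The description in this record mentions two language coverage methods: one based on a word frequency list and one that does not treat frequencies as the main factor. The record does not give the exact formulas, so the following is only a minimal sketch of one plausible reading: frequency-weighted (token) coverage versus distinct-word (type) coverage. The function names, the toy word list, and the toy lexicon are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def frequency_weighted_coverage(freq, lexicon):
    """Percentage of corpus tokens whose word form appears in the lexicon.

    Assumed reading of the first method: each word counts as often as it occurs.
    """
    total = sum(freq.values())
    if total == 0:
        return 0.0
    covered = sum(count for word, count in freq.items() if word in lexicon)
    return 100.0 * covered / total


def type_coverage(freq, lexicon):
    """Percentage of distinct corpus words that appear in the lexicon.

    Assumed reading of the second method: each distinct word counts once.
    """
    if not freq:
        return 0.0
    covered = sum(1 for word in freq if word in lexicon)
    return 100.0 * covered / len(freq)


# Hypothetical toy corpus and lexicon, for illustration only.
tokens = ["dobar", "dan", "je", "dobar", "film", "loš", "je"]
lexicon = {"dobar", "loš", "lijep"}

freq = Counter(tokens)  # frequency list: word -> number of occurrences
token_cov = frequency_weighted_coverage(freq, lexicon)
type_cov = type_coverage(freq, lexicon)

print(f"frequency-weighted coverage: {token_cov:.2f}%")
print(f"type coverage: {type_cov:.2f}%")
```

The two measures can diverge widely on real corpora, because a long tail of rare words adds many distinct types but few tokens. Whether the authors matched word forms or lemmas is not stated in this record, so the exact-match lookup here is also an assumption.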


Similar Items