Slovene and Croatian word embeddings in terms of gender occupational analogies

In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The ar...

Bibliographic Details
Main Authors: Matej Ulčar, Anka Supej, Marko Robnik-Šikonja, Senja Pollak
Format: Article
Language: English
Published: University of Ljubljana Press (Založba Univerze v Ljubljani) 2021-07-01
Series: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave
Subjects: word embeddings; gender bias; word analogy task; occupations; natural language processing
Online Access: https://journals.uni-lj.si/slovenscina2/article/view/9883
collection DOAJ
description In recent years, the use of deep neural networks and dense vector embeddings for text representation has led to excellent results in the field of computational understanding of natural language. It has also been shown that word embeddings often capture gender, racial and other types of bias. The article focuses on evaluating Slovene and Croatian word embeddings in terms of gender bias using word analogy calculations. We compiled a list of masculine and feminine nouns for occupations in Slovene and evaluated the gender bias of fastText, word2vec and ELMo embeddings with different configurations and different approaches to analogy calculations. The lowest occupational gender bias was observed with the fastText embeddings. Similarly, we compared different fastText embeddings on Croatian occupational analogies.
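As a rough illustration of what a word analogy calculation involves, the sketch below uses one common formulation, the vector offset method: for "a is to b as c is to ?", it looks for the word whose vector is closest (by cosine similarity) to b - a + c. This record does not say which analogy approaches the article compares, so the method, the function names `cosine` and `analogy`, and the toy vectors are all assumptions made for illustration, not the authors' implementation or real embedding output.

```python
import math

# Toy 3-dimensional vectors invented for this example (NOT real fastText,
# word2vec or ELMo output). Axis 0 loosely stands for grammatical gender,
# axes 1 and 2 for two occupations.
TOY_VECTORS = {
    "moški": [1.0, 0.0, 0.0],        # man
    "ženska": [-1.0, 0.0, 0.0],      # woman
    "učitelj": [1.0, 1.0, 0.0],      # teacher (masculine)
    "učiteljica": [-1.0, 1.0, 0.0],  # teacher (feminine)
    "zdravnik": [1.0, 0.0, 1.0],     # doctor (masculine)
    "zdravnica": [-1.0, 0.0, 1.0],   # doctor (feminine)
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Answer 'a is to b as c is to ?' with the vector offset b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded, as is usual in analogy tasks.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("moški", "učitelj", "ženska"))  # učiteljica with these toy vectors
```

With real embeddings, an occupational gender bias shows up when the nearest word to the offset vector is not the expected feminine (or masculine) form of the occupation, but a different, stereotypically associated word.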
id doaj-art-9d4e22fbe85e43159fb32dc66266b2c4
institution Kabale University
issn 2335-2736
doi 10.4312/slo2.0.2021.1.26-59
volume 9
issue 1
affiliation Matej Ulčar: University of Ljubljana, Faculty of Computer and Information Science, Slovenia
Anka Supej: Jožef Stefan Institute, Ljubljana, Slovenia
Marko Robnik-Šikonja: University of Ljubljana, Faculty of Computer and Information Science, Slovenia
Senja Pollak: Jožef Stefan Institute, Ljubljana, Slovenia
topic word embeddings
gender bias
word analogy task
occupations
natural language processing