Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World

In this paper we examined a Word Embedding model in Portuguese to identify gender biases from several analytical perspectives, employing the SC-WEAT and RIPA metrics, which are widely used for English. Our inquiry focused on three primary dimensions: (1) the frequency...

Bibliographic Details
Main Authors:	Fernanda Tiemi de Souza Taso, Valéria Quadros dos Reis, Fábio Viduani Martinez
Format:	Article
Language:	English
Published:	Brazilian Computer Society 2025-07-01
Series:	Journal on Interactive Systems
Subjects:	Natural Language Processing; Computational Linguistics; Algorithmic Sexism; Ethics in AI; Non-English NLP
Online Access:	https://journals-sol.sbc.org.br/index.php/jis/article/view/5958

_version_	1849710077677666304
author	Fernanda Tiemi de Souza Taso, Valéria Quadros dos Reis, Fábio Viduani Martinez
author_sort	Fernanda Tiemi de Souza Taso
collection	DOAJ
description	In this paper we examined a Word Embedding model in Portuguese to identify gender biases from several analytical perspectives, employing the SC-WEAT and RIPA metrics, which are widely used for English. Our inquiry focused on three primary dimensions: (1) the frequency-based association of words with feminine and masculine terms; (2) the identification of disparities between grammatical classes with respect to the gender sets; and (3) the categorisation and grouping of feminine and masculine words, including their distinctive attributes. For the frequency groups, we found a negative association of words with feminine terms in most subsets, indicating that the model's vocabulary leans strongly towards masculine references. Notably, among the 100 most frequent words, 89 exhibited a stronger association with masculine terms. For grammatical classes, adjectives were predominantly associated with feminine references, which suggests that supplementary description is expected when referring to women. Participle verbs were also conspicuously more often associated with feminine terms than with masculine ones, a phenomenon that requires further expert attention to be properly explained. The categorisation process confirmed the existence of gender bias: words in the domains of sport, finance, and science were associated with masculine terms, while words related to feelings, home furniture, and entertainment were associated with feminine terms. These findings contribute to the discussion of gender analysis in non-English models, such as Portuguese models, and encourage the Brazilian community to actively investigate biases in NLP models.
format	Article
id	doaj-art-89f0abdc97e4437591d85c10279c7d33
institution	DOAJ
issn	2763-7719
language	English
publishDate	2025-07-01
publisher	Brazilian Computer Society
record_format	Article
series	Journal on Interactive Systems
spelling	doaj-art-89f0abdc97e4437591d85c10279c7d33; 2025-08-20T03:15:03Z; eng; Brazilian Computer Society; Journal on Interactive Systems; 2763-7719; 2025-07-01; 16; 1; 10.5753/jis.2025.5958; Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World; Fernanda Tiemi de Souza Taso (Federal University of Mato Grosso do Sul); Valéria Quadros dos Reis (Federal University of Mato Grosso do Sul, Leuphana University Lüneburg); Fábio Viduani Martinez (Federal University of Mato Grosso do Sul); https://journals-sol.sbc.org.br/index.php/jis/article/view/5958; Natural Language Processing; Computational Linguistics; Algorithmic Sexism; Ethics in AI; Non-English NLP
title	Analyzing Discourses in Portuguese Word Embeddings: A Case of Gender Bias Outside the English-Speaking World
title_sort	analyzing discourses in portuguese word embeddings a case of gender bias outside the english speaking world
topic	Natural Language Processing; Computational Linguistics; Algorithmic Sexism; Ethics in AI; Non-English NLP
url	https://journals-sol.sbc.org.br/index.php/jis/article/view/5958
