Whose voice matters? Word embeddings reveal identity bias in news quotes

Abstract This paper investigates identity bias (gender and race) in the South African news selection and representation of COVID-19 vaccination quotes. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa’s apartheid history; yet, studies tha...

Bibliographic Details
Main Authors: Nnaemeka Ohamadike, Kevin Durrheim, Mpho Primus
Format: Article
Language: English
Published: SpringerOpen 2025-04-01
Series: EPJ Data Science
Subjects: Word embedding; Race bias; Gender bias; COVID-19 vaccination; News media; South Africa
Online Access: https://doi.org/10.1140/epjds/s13688-025-00541-1
author Nnaemeka Ohamadike
Kevin Durrheim
Mpho Primus
collection DOAJ
description Abstract This paper investigates identity bias (gender and race) in the selection and representation of COVID-19 vaccination quotes in South African news. Social bias studies have qualitatively examined race and gender bias in South African news, given South Africa’s apartheid history; yet studies that examine and quantify these biases at the speaker level using news quotes from a representative South African news corpus remain limited. To address this gap, we examined race and gender bias in the news selection and framing of quotes. We used word embeddings trained on 22,627 vaccination quotes from 76 South African news sources between 2020 and 2023. These embeddings, which process text at large scale, are unbiased by design but can learn and uncover biases hidden in language. Our findings reveal gender and race bias in the news selection and framing of quotes: journalists privilege White voices as more authoritative and connected to global and technical vaccination discourse but confine Black voices to primarily localised contexts. They also quote male speakers more frequently than female speakers. In an era where human biases are becoming increasingly implicit, we argue that embeddings offer a robust tool to unearth, monitor, and evaluate these biases at the micro or speaker level in the news.
format Article
id doaj-art-a7cc46117aef42ba92b320a351f6e2fd
institution OA Journals
issn 2193-1127
language English
publishDate 2025-04-01
publisher SpringerOpen
record_format Article
series EPJ Data Science
spelling doaj-art-a7cc46117aef42ba92b320a351f6e2fd
2025-08-20T02:17:57Z
eng
SpringerOpen
EPJ Data Science
2193-1127
2025-04-01
141118
10.1140/epjds/s13688-025-00541-1
Whose voice matters? Word embeddings reveal identity bias in news quotes
Nnaemeka Ohamadike 0
Kevin Durrheim 1
Mpho Primus 2
Centre for Applied Data Science, University of Johannesburg
Department of Psychology, University of Johannesburg
Institute for Artificial Intelligence Systems, University of Johannesburg
https://doi.org/10.1140/epjds/s13688-025-00541-1
title Whose voice matters? Word embeddings reveal identity bias in news quotes
topic Word embedding
Race bias
Gender bias
COVID-19 vaccination
News media
South Africa
url https://doi.org/10.1140/epjds/s13688-025-00541-1