Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence

Bibliographic Details
Main Authors: Kevin Allan, Jacobo Azcona, Somayajulu Sripada, Georgios Leontidis, Clare A. M. Sutherland, Louise H. Phillips, Douglas Martin
Format: Article
Language: English
Published: The Royal Society 2025-04-01
Series: Royal Society Open Science
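Subjects: human–AI interaction; stereotypes; large language models; bias in AI; bias amplification; generative AI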
Online Access: https://royalsocietypublishing.org/doi/10.1098/rsos.241472
author Kevin Allan
Jacobo Azcona
Somayajulu Sripada
Georgios Leontidis
Clare A. M. Sutherland
Louise H. Phillips
Douglas Martin
collection DOAJ
description Stereotypical biases are readily acquired and expressed by generative artificial intelligence (AI), causing growing societal concern about these systems amplifying existing human bias. This concern rests on reasonable psychological assumptions, but stereotypical bias amplification during human–AI interaction relative to pre-existing baseline levels has not been demonstrated. Here, we use previous psychological work on gendered character traits to capture and control gender stereotypes expressed in character descriptions generated by OpenAI’s GPT-3.5. In four experiments (N = 782) with a first impressions task, we find that unexplained (‘black-box’) character recommendations using stereotypical traits already convey a potent persuasive influence significantly amplifying baseline stereotyping within first impressions. Recommendations that are counter-stereotypical eliminate and effectively reverse human baseline bias, but these stereotype-challenging influences propagate less well than reinforcing influences from stereotypical recommendations. Critically, the bias amplification and reversal phenomena occur when GPT-3.5 elaborates on the core stereotypical content, although GPT-3.5’s explanations propagate counter-stereotypical influence more effectively and persuasively than black-box recommendations. Our findings strongly imply that without robust safeguards, generative AI will amplify existing bias. But with safeguards, existing bias can be eliminated and even reversed. Our novel approach safely allows such effects to be studied in various contexts where gender and other bias-inducing social stereotypes operate.
format Article
id doaj-art-6f1e68b8189744dea325ee5252bef116
institution OA Journals
issn 2054-5703
language English
publishDate 2025-04-01
publisher The Royal Society
record_format Article
series Royal Society Open Science
spelling doaj-art-6f1e68b8189744dea325ee5252bef116
2025-08-20T02:08:30Z
eng
The Royal Society
Royal Society Open Science
2054-5703
2025-04-01
12 (volume)
4 (issue)
10.1098/rsos.241472
Kevin Allan, University of Aberdeen, Aberdeen, UK
Jacobo Azcona, University of Aberdeen, Aberdeen, UK
Somayajulu Sripada, University of Aberdeen, Aberdeen, UK
Georgios Leontidis, University of Aberdeen, Aberdeen, UK
Clare A. M. Sutherland, University of Aberdeen, Aberdeen, UK
Louise H. Phillips, University of Aberdeen, Aberdeen, UK
Douglas Martin, University of Aberdeen, Aberdeen, UK
title Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence
topic human–AI interaction
stereotypes
large language models
bias in AI
bias amplification
generative AI
url https://royalsocietypublishing.org/doi/10.1098/rsos.241472