Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence


Bibliographic Details
Main Authors: Kevin Allan, Jacobo Azcona, Somayajulu Sripada, Georgios Leontidis, Clare A. M. Sutherland, Louise H. Phillips, Douglas Martin
Format: Article
Language:English
Published: The Royal Society 2025-04-01
Series:Royal Society Open Science
Online Access:https://royalsocietypublishing.org/doi/10.1098/rsos.241472
Description
Summary:Stereotypical biases are readily acquired and expressed by generative artificial intelligence (AI), causing growing societal concern about these systems amplifying existing human bias. This concern rests on reasonable psychological assumptions, but stereotypical bias amplification during human–AI interaction relative to pre-existing baseline levels has not been demonstrated. Here, we use previous psychological work on gendered character traits to capture and control gender stereotypes expressed in character descriptions generated by OpenAI’s GPT-3.5. In four experiments (N = 782) with a first impressions task, we find that unexplained (‘black-box’) character recommendations using stereotypical traits already convey a potent persuasive influence, significantly amplifying baseline stereotyping within first impressions. Recommendations that are counter-stereotypical eliminate and effectively reverse human baseline bias, but these stereotype-challenging influences propagate less well than reinforcing influences from stereotypical recommendations. Critically, the bias amplification and reversal phenomena occur when GPT-3.5 elaborates on the core stereotypical content, although GPT-3.5’s explanations propagate counter-stereotypical influence more effectively and persuasively than black-box recommendations. Our findings strongly imply that without robust safeguards, generative AI will amplify existing bias. But with safeguards, existing bias can be eliminated and even reversed. Our novel approach safely allows such effects to be studied in various contexts where gender and other bias-inducing social stereotypes operate.
ISSN:2054-5703