Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence
Stereotypical biases are readily acquired and expressed by generative artificial intelligence (AI), causing growing societal concern about these systems amplifying existing human bias. This concern rests on reasonable psychological assumptions, but stereotypical bias amplification during human–AI in...
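| Main Authors: | Kevin Allan, Jacobo Azcona, Somayajulu Sripada, Georgios Leontidis, Clare A. M. Sutherland, Louise H. Phillips, Douglas Martin |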
|---|---|
| Format: | Article |
| Language: | English |
| Published: | The Royal Society, 2025-04-01 |
| Series: | Royal Society Open Science |
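| Subjects: | human–AI interaction; stereotypes; large language models; bias in AI; bias amplification; generative AI |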
| Online Access: | https://royalsocietypublishing.org/doi/10.1098/rsos.241472 |
| _version_ | 1850215813670240256 |
|---|---|
| author | Kevin Allan; Jacobo Azcona; Somayajulu Sripada; Georgios Leontidis; Clare A. M. Sutherland; Louise H. Phillips; Douglas Martin |
| author_sort | Kevin Allan |
| collection | DOAJ |
| description | Stereotypical biases are readily acquired and expressed by generative artificial intelligence (AI), causing growing societal concern about these systems amplifying existing human bias. This concern rests on reasonable psychological assumptions, but stereotypical bias amplification during human–AI interaction relative to pre-existing baseline levels has not been demonstrated. Here, we use previous psychological work on gendered character traits to capture and control gender stereotypes expressed in character descriptions generated by Open AI’s GPT3.5. In four experiments (N = 782) with a first impressions task, we find that unexplained (‘black-box’) character recommendations using stereotypical traits already convey a potent persuasive influence significantly amplifying baseline stereotyping within first impressions. Recommendations that are counter-stereotypical eliminate and effectively reverse human baseline bias, but these stereotype-challenging influences propagate less well than reinforcing influences from stereotypical recommendations. Critically, the bias amplification and reversal phenomena occur when GPT3.5 elaborates on the core stereotypical content, although GPT3.5’s explanations propagate counter-stereotypical influence more effectively and persuasively than black-box recommendations. Our findings strongly imply that without robust safeguards, generative AI will amplify existing bias. But with safeguards, existing bias can be eliminated and even reversed. Our novel approach safely allows such effects to be studied in various contexts where gender and other bias-inducing social stereotypes operate. |
| format | Article |
| id | doaj-art-6f1e68b8189744dea325ee5252bef116 |
| institution | OA Journals |
| issn | 2054-5703 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | The Royal Society |
| record_format | Article |
| series | Royal Society Open Science |
| spelling | doaj-art-6f1e68b8189744dea325ee5252bef116; 2025-08-20T02:08:30Z; eng; The Royal Society; Royal Society Open Science; 2054-5703; 2025-04-01; vol. 12, no. 4; 10.1098/rsos.241472; Kevin Allan, Jacobo Azcona, Somayajulu Sripada, Georgios Leontidis, Clare A. M. Sutherland, Louise H. Phillips, Douglas Martin (all University of Aberdeen, Aberdeen, UK) |
| title | Stereotypical bias amplification and reversal in an experimental model of human interaction with generative artificial intelligence |
| topic | human–AI interaction; stereotypes; large language models; bias in AI; bias amplification; generative AI |
| url | https://royalsocietypublishing.org/doi/10.1098/rsos.241472 |