Gender Disparities in Artificial Intelligence–Generated Images of Hospital Leadership in the United States

Bibliographic Details
Main Authors: Mia Gisselbaek, MD, Joana Berger-Estilita, MD, PhD, Laurens Minsart, MD, Ekin Köselerli, MD, Arnout Devos, PhD, Francisco Maio Matos, PhD, Odmara L. Barreto Chang, MD, PhD, Peter Dieckmann, PhD, Melanie Suppan, MD, Sarah Saxena, MD, PhD
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Mayo Clinic Proceedings: Digital Health
Online Access: http://www.sciencedirect.com/science/article/pii/S2949761225000252
Description
Summary:
Objective: To evaluate demographic representation in artificial intelligence (AI)–generated images of hospital leadership roles and compare them with real-world data from US hospitals.
Patients and Methods: This cross-sectional study, conducted from October 1, 2024 to October 31, 2024, analyzed images generated by 3 AI text-to-image models: Midjourney 6.0, OpenAI ChatGPT DALL-E 3, and Google Gemini Imagen 3. Standardized prompts were used to create 1200 images representing 4 key leadership roles: chief executive officers, chief medical officers, chief nursing officers, and chief financial officers. Real-world demographic data from 4397 US hospitals showed that chief executive officers were 73.2% men; chief financial officers, 65.2% men; chief medical officers, 85.7% men; and chief nursing officers, 9.4% men (overall: 60.1% men). The primary outcome was gender representation, with secondary outcomes including race/ethnicity and age. Two independent reviewers assessed images, with interrater reliability evaluated using Cohen κ.
Results: Interrater agreement was high for gender (κ=0.998) and moderate for race/ethnicity (κ=0.670) and age (κ=0.605). DALL-E overrepresented men (86.5%) and White individuals (94.5%). Midjourney showed improved gender balance (69.5% men) but overrepresented White individuals (75.0%). Imagen achieved near gender parity (50.3% men) but remained predominantly White (51.5%). Statistically significant differences were observed across models and between models and real-world demographics.
Conclusion: Artificial intelligence text-to-image models reflect and amplify systemic biases, overrepresenting men and White leaders, while underrepresenting diversity. Ethical AI practices, including diverse training data sets and fairness-aware algorithms, are essential to ensure equitable representation in health care leadership.
ISSN: 2949-7612
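
The summary reports interrater reliability as Cohen κ for two independent reviewers. For readers unfamiliar with the statistic, the following is a minimal sketch of how κ can be computed from two reviewers' labels. The function name `cohen_kappa` and the toy labels are hypothetical and purely illustrative; they are not the study's data or the authors' code.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")

    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = count_a.keys() | count_b.keys()
    p_e = sum(count_a[c] * count_b[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (NOT from the study): 8 images, 2 reviewers.
reviewer_1 = ["man", "man", "woman", "man", "woman", "woman", "man", "man"]
reviewer_2 = ["man", "man", "woman", "woman", "woman", "woman", "man", "man"]

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the reviewers agree on 7 of 8 images (observed agreement 0.875) while chance agreement is 0.5, so κ is 0.75. Because κ discounts chance agreement, it is lower than the raw agreement rate.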