Quantifying Gender Bias in Large Language Models Using Information-Theoretic and Statistical Analysis
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Information |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2078-2489/16/5/358 |
| Summary: | Large language models (LLMs) have revolutionized natural language processing across diverse domains, yet they also raise critical fairness and ethical concerns, particularly regarding gender bias. In this study, we conduct a systematic, mathematically grounded investigation of gender bias in four leading LLMs—GPT-4o, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b—by evaluating the gender distributions produced when generating “perfect personas” for a wide range of occupational roles spanning healthcare, engineering, and professional services. Leveraging standardized prompts, controlled experimental settings, and repeated trials, our methodology quantifies bias against an ideal uniform distribution using rigorous statistical measures and information-theoretic metrics. Our results reveal marked discrepancies: GPT-4o exhibits pronounced occupational gender segregation, disproportionately linking healthcare roles to female identities while assigning male labels to engineering and physically demanding positions. In contrast, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b predominantly favor female assignments, albeit with less job-specific precision. These findings demonstrate how architectural decisions, training data composition, and token embedding strategies critically influence gender representation. The study underscores the urgent need for inclusive datasets, advanced bias-mitigation techniques, and continuous model audits to develop AI systems that are not only free from stereotype perpetuation but actively promote equitable and representative information processing. |
| ISSN: | 2078-2489 |
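
The summary above describes quantifying each model's gender assignments against an ideal uniform distribution using statistical measures and information-theoretic metrics. As a rough illustration of that kind of measurement, and not the paper's actual pipeline, the sketch below computes the KL divergence from a uniform baseline and a chi-square goodness-of-fit test for a single occupation; the gender categories and counts are placeholder values, not data from the study.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical gender-assignment counts for one occupational role,
# aggregated over repeated trials (placeholder values, not from the paper).
labels = ["female", "male", "non-binary"]
counts = np.array([78, 20, 2])

p = counts / counts.sum()          # observed gender distribution
u = np.full(len(p), 1.0 / len(p))  # ideal uniform distribution

# KL divergence D(p || u) in bits: 0 when assignments are perfectly balanced.
mask = p > 0
kl_bits = np.sum(p[mask] * np.log2(p[mask] / u[mask]))

# Chi-square goodness-of-fit test against uniform expected counts.
chi2, p_value = chisquare(counts)

print(f"Observed distribution: {dict(zip(labels, p.round(3)))}")
print(f"KL divergence from uniform: {kl_bits:.3f} bits")
print(f"Chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

A larger divergence indicates a more skewed assignment pattern; comparing such a statistic across occupations and models is one plausible way to surface the occupational segregation effects the summary reports.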