Quantifying Gender Bias in Large Language Models Using Information-Theoretic and Statistical Analysis

Bibliographic Details
Main Authors: Imran Mirza, Akbar Anbar Jafari, Cagri Ozcinar, Gholamreza Anbarjafari
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Information
Subjects:
Online Access: https://www.mdpi.com/2078-2489/16/5/358
Description
Summary: Large language models (LLMs) have revolutionized natural language processing across diverse domains, yet they also raise critical fairness and ethical concerns, particularly regarding gender bias. In this study, we conduct a systematic, mathematically grounded investigation of gender bias in four leading LLMs—GPT-4o, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b—by evaluating the gender distributions produced when generating “perfect personas” for a wide range of occupational roles spanning healthcare, engineering, and professional services. Leveraging standardized prompts, controlled experimental settings, and repeated trials, our methodology quantifies bias against an ideal uniform distribution using rigorous statistical measures and information-theoretic metrics. Our results reveal marked discrepancies: GPT-4o exhibits pronounced occupational gender segregation, disproportionately linking healthcare roles to female identities while assigning male labels to engineering and physically demanding positions. In contrast, Gemini 1.5 Pro, Sonnet 3.5, and LLaMA 3.1:8b predominantly favor female assignments, albeit with less job-specific precision. These findings demonstrate how architectural decisions, training data composition, and token embedding strategies critically influence gender representation. The study underscores the urgent need for inclusive datasets, advanced bias-mitigation techniques, and continuous model audits to develop AI systems that not only avoid perpetuating stereotypes but actively promote equitable and representative information processing.
ISSN:2078-2489
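
Note: This record does not include the authors' code or name their exact metrics. The Python sketch below only illustrates the kind of analysis the summary describes, scoring one occupation's repeated "perfect persona" generations against an ideal uniform gender split using a chi-square goodness-of-fit test (statistical measure) and a KL divergence from uniform (information-theoretic metric). The function name, category labels, and example counts are illustrative assumptions, not the paper's implementation.

    # Illustrative sketch: deviation of observed gender assignments from a
    # uniform ideal, for one occupational role over repeated trials.
    from collections import Counter
    from math import log
    from scipy.stats import chisquare

    def gender_bias_scores(assignments, categories=("female", "male")):
        """assignments: list of gender labels produced over repeated trials."""
        n = len(assignments)
        counts = Counter(assignments)
        observed = [counts.get(c, 0) for c in categories]
        # Ideal uniform expectation: equal counts across categories.
        expected = [n / len(categories)] * len(categories)

        # Chi-square goodness of fit against the uniform expectation.
        chi2, p_value = chisquare(f_obs=observed, f_exp=expected)

        # KL divergence D(P || U) of the observed distribution P from the
        # uniform distribution U, in bits; 0 means perfectly balanced output.
        kl = sum((o / n) * log((o / n) / (1 / len(categories)), 2)
                 for o in observed if o > 0)
        return {"chi2": chi2, "p_value": p_value, "kl_bits": kl}

    # Example: 30 repeated "perfect persona" generations for a single role.
    print(gender_bias_scores(["female"] * 24 + ["male"] * 6))

Per-role scores like these could then be aggregated across occupations and models to compare overall deviation from the uniform ideal, which is the comparison the summary reports.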