Multifaceted Assessment of Responsible Use and Bias in Language Models for Education

Bibliographic Details
Main Authors: Ishrat Ahmed, Wenxing Liu, Rod D. Roscoe, Elizabeth Reilley, Danielle S. McNamara
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Computers
Subjects: biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed
Online Access: https://www.mdpi.com/2073-431X/14/3/100
author Ishrat Ahmed
Wenxing Liu
Rod D. Roscoe
Elizabeth Reilley
Danielle S. McNamara
collection DOAJ
description Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, their handling of critical topics, such as privacy and sensitive questions, is essential for responsible deployment. This study proposes a framework for the automatic detection of biases and violations of responsible use using a synthetic question-based dataset mimicking student–chatbot interactions. We employ the LLM-as-a-judge method to evaluate multiple LLMs for biased responses. Our findings show that some models exhibit more bias than others, highlighting the need for careful consideration when selecting models for deployment in educational and other high-stakes applications. These results emphasize the importance of addressing bias in LLMs and implementing robust mechanisms to uphold responsible AI use in real-world services.
format Article
id doaj-art-032f608287824b9fae5c8b030f43885c
institution Kabale University
issn 2073-431X
language English
publishDate 2025-03-01
publisher MDPI AG
record_format Article
series Computers
spelling id: doaj-art-032f608287824b9fae5c8b030f43885c
timestamp: 2025-08-20T03:43:15Z
language: eng
publisher: MDPI AG
series: Computers
issn: 2073-431X
date: 2025-03-01
volume: 14
issue: 3
article number: 100
doi: 10.3390/computers14030100
title: Multifaceted Assessment of Responsible Use and Bias in Language Models for Education
affiliations:
Ishrat Ahmed: Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA
Wenxing Liu: Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA
Rod D. Roscoe: Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA
Elizabeth Reilley: Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA
Danielle S. McNamara: Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA
url: https://www.mdpi.com/2073-431X/14/3/100
title Multifaceted Assessment of Responsible Use and Bias in Language Models for Education
topic biases
large language models
LLM-as-a-judge
evaluation
educational chatbot
higher-Ed
url https://www.mdpi.com/2073-431X/14/3/100