Multifaceted Assessment of Responsible Use and Bias in Language Models for Education

Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, th...

Bibliographic Details
Main Authors:	Ishrat Ahmed, Wenxing Liu, Rod D. Roscoe, Elizabeth Reilley, Danielle S. McNamara
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Computers
Subjects:	biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed
Online Access:	https://www.mdpi.com/2073-431X/14/3/100

_version_	1849342810061275136
author	Ishrat Ahmed; Wenxing Liu; Rod D. Roscoe; Elizabeth Reilley; Danielle S. McNamara
author_facet	Ishrat Ahmed; Wenxing Liu; Rod D. Roscoe; Elizabeth Reilley; Danielle S. McNamara
author_sort	Ishrat Ahmed
collection	DOAJ
description	Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, their handling of critical topics, such as privacy and sensitive questions, is essential for responsible deployment. This study proposes a framework for the automatic detection of biases and violations of responsible use using a synthetic question-based dataset mimicking student–chatbot interactions. We employ the LLM-as-a-judge method to evaluate multiple LLMs for biased responses. Our findings show that some models exhibit more bias than others, highlighting the need for careful consideration when selecting models for deployment in educational and other high-stakes applications. These results emphasize the importance of addressing bias in LLMs and implementing robust mechanisms to uphold responsible AI use in real-world services.
format	Article
id	doaj-art-032f608287824b9fae5c8b030f43885c
institution	Kabale University
issn	2073-431X
language	English
publishDate	2025-03-01
publisher	MDPI AG
record_format	Article
series	Computers
spelling	doaj-art-032f608287824b9fae5c8b030f43885c | 2025-08-20T03:43:15Z | eng | MDPI AG | Computers | 2073-431X | 2025-03-01 | 14 | 3 | 100 | 10.3390/computers14030100 | Multifaceted Assessment of Responsible Use and Bias in Language Models for Education | Ishrat Ahmed (0); Wenxing Liu (1); Rod D. Roscoe (2); Elizabeth Reilley (3); Danielle S. McNamara (4) | Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA; Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA; Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA; Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA; Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA | Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, their handling of critical topics, such as privacy and sensitive questions, is essential for responsible deployment. This study proposes a framework for the automatic detection of biases and violations of responsible use using a synthetic question-based dataset mimicking student–chatbot interactions. We employ the LLM-as-a-judge method to evaluate multiple LLMs for biased responses. Our findings show that some models exhibit more bias than others, highlighting the need for careful consideration when selecting models for deployment in educational and other high-stakes applications. These results emphasize the importance of addressing bias in LLMs and implementing robust mechanisms to uphold responsible AI use in real-world services. | https://www.mdpi.com/2073-431X/14/3/100 | biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed
title	Multifaceted Assessment of Responsible Use and Bias in Language Models for Education
topic	biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed
url	https://www.mdpi.com/2073-431X/14/3/100

Similar Items