Multifaceted Assessment of Responsible Use and Bias in Language Models for Education
Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, th...
| Main Authors: | Ishrat Ahmed, Wenxing Liu, Rod D. Roscoe, Elizabeth Reilley, Danielle S. McNamara |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Computers |
| Subjects: | biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed |
| Online Access: | https://www.mdpi.com/2073-431X/14/3/100 |
| _version_ | 1849342810061275136 |
|---|---|
| author | Ishrat Ahmed, Wenxing Liu, Rod D. Roscoe, Elizabeth Reilley, Danielle S. McNamara |
| author_facet | Ishrat Ahmed, Wenxing Liu, Rod D. Roscoe, Elizabeth Reilley, Danielle S. McNamara |
| author_sort | Ishrat Ahmed |
| collection | DOAJ |
| description | Large language models (LLMs) are increasingly being utilized to develop tools and services in various domains, including education. However, due to the nature of the training data, these models are susceptible to inherent social or cognitive biases, which can influence their outputs. Furthermore, their handling of critical topics, such as privacy and sensitive questions, is essential for responsible deployment. This study proposes a framework for the automatic detection of biases and violations of responsible use using a synthetic question-based dataset mimicking student–chatbot interactions. We employ the LLM-as-a-judge method to evaluate multiple LLMs for biased responses. Our findings show that some models exhibit more bias than others, highlighting the need for careful consideration when selecting models for deployment in educational and other high-stakes applications. These results emphasize the importance of addressing bias in LLMs and implementing robust mechanisms to uphold responsible AI use in real-world services. |
| format | Article |
| id | doaj-art-032f608287824b9fae5c8b030f43885c |
| institution | Kabale University |
| issn | 2073-431X |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Computers |
| spelling | doaj-art-032f608287824b9fae5c8b030f43885c; 2025-08-20T03:43:15Z; eng; MDPI AG; Computers; 2073-431X; 2025-03-01; 14; 3; 100; 10.3390/computers14030100; Multifaceted Assessment of Responsible Use and Bias in Language Models for Education; Ishrat Ahmed (Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA); Wenxing Liu (Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA); Rod D. Roscoe (Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA); Elizabeth Reilley (Enterprise Technology-AI Acceleration, Arizona State University, Tempe, AZ 85281, USA); Danielle S. McNamara (Learning Engineering Institute, Arizona State University, Tempe, AZ 85281, USA); https://www.mdpi.com/2073-431X/14/3/100; biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed |
| title | Multifaceted Assessment of Responsible Use and Bias in Language Models for Education |
| topic | biases; large language models; LLM-as-a-judge; evaluation; educational chatbot; higher-Ed |
| url | https://www.mdpi.com/2073-431X/14/3/100 |
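
The description above mentions an LLM-as-a-judge evaluation over synthetic student questions. The record gives no implementation details, so the following is only a minimal sketch of that general pattern, not the authors' framework. Every name here (`Verdict`, `judge_all`, `bias_rate`, `model_a`, `model_b`, `stub_judge`) is hypothetical, the "models" are canned-response stubs, and the judge is a simple keyword check standing in for a real judging model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    """One judged response (hypothetical structure, not from the article)."""
    model: str
    question: str
    biased: bool


def judge_all(
    questions: List[str],
    models: Dict[str, Callable[[str], str]],
    judge: Callable[[str, str], bool],
) -> List[Verdict]:
    """Ask every model every question and let the judge label each answer."""
    verdicts = []
    for name, respond in models.items():
        for question in questions:
            answer = respond(question)
            verdicts.append(Verdict(name, question, judge(question, answer)))
    return verdicts


def bias_rate(verdicts: List[Verdict]) -> Dict[str, float]:
    """Fraction of responses flagged as biased, per model."""
    totals: Dict[str, int] = {}
    flagged: Dict[str, int] = {}
    for v in verdicts:
        totals[v.model] = totals.get(v.model, 0) + 1
        flagged[v.model] = flagged.get(v.model, 0) + int(v.biased)
    return {m: flagged[m] / totals[m] for m in totals}


# Stub "models": assumptions for illustration, they always return a fixed reply.
def model_a(question: str) -> str:
    return "Anyone can succeed in this course with steady practice."


def model_b(question: str) -> str:
    return "Students like you usually struggle with math."


# Stub judge: a keyword check standing in for a call to a judging LLM.
def stub_judge(question: str, answer: str) -> bool:
    return "students like you" in answer.lower()


questions = ["Can I pass calculus?", "Should I major in engineering?"]
rates = bias_rate(
    judge_all(questions, {"model_a": model_a, "model_b": model_b}, stub_judge)
)
print(rates)
```

In a real setting the stubs would be replaced by calls to the models under evaluation and to a separate judging model; the per-model rate is one simple way to compare models, chosen here only for illustration.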