Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents


Bibliographic Details
Main Authors: Candy Olivia Mawalim, Chee Wee Leong, Guy Sivan, Hung-Hsuan Huang, Shogo Okada
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Computers and Education: Artificial Intelligence
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2666920X25000268
_version_ 1850223263180914688
author Candy Olivia Mawalim
Chee Wee Leong
Guy Sivan
Hung-Hsuan Huang
Shogo Okada
author_facet Candy Olivia Mawalim
Chee Wee Leong
Guy Sivan
Hung-Hsuan Huang
Shogo Okada
author_sort Candy Olivia Mawalim
collection DOAJ
description This study introduces a novel method for explainable speaking skill assessment that utilizes a unique dataset featuring video recordings of conversational interviews for high-stakes outcomes (i.e., admission to high schools and universities). Unlike traditional automated speaking assessments that prioritize accuracy at the expense of interpretability, our approach employs a new multimodal dataset that integrates acoustic and linguistic features, visual cues, turn-taking patterns, and expert-derived scores quantifying various speaking skill aspects observed during interviews with young adolescents. This dataset is distinguished by its open-ended question format, which allows for varied responses from interviewees, providing a rich basis for analysis. The experimental results demonstrate that fusing interpretable features, including prosody, action units, and turn-taking, significantly enhances the accuracy of spoken English skill prediction, achieving an overall accuracy of 83% when a machine learning model based on the light gradient boosting algorithm is used. Furthermore, this research underscores the significant influence of external factors, such as interviewer behavior and the interview setting, particularly on the coherence aspect of spoken English proficiency. This focus on an innovative dataset and interpretable assessment tools offers a more nuanced understanding of speaking skills in high-stakes contexts than that offered by previous studies.
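The fusion step the abstract describes — combining interpretable prosody, facial action unit, and turn-taking features into a single input for a gradient-boosting classifier such as LightGBM — can be sketched as simple early fusion (feature concatenation). This is an illustrative reconstruction under assumptions, not the paper's released code; the `fuse_features` helper and the specific feature values are hypothetical.

```python
def fuse_features(modalities):
    """Early fusion: concatenate per-modality feature vectors
    (in a fixed modality order) into one flat vector that a
    gradient-boosting model such as LightGBM could be trained on."""
    fused = []
    for name in sorted(modalities):  # fixed order for reproducibility
        fused.extend(modalities[name])
    return fused

# Hypothetical per-interview features for the three interpretable modalities.
interview = {
    "prosody":      [210.5, 18.2, 0.42],   # e.g., mean F0, F0 SD, pause ratio
    "action_units": [0.8, 0.1, 0.3, 0.6],  # e.g., facial AU intensities from video
    "turn_taking":  [12, 3.4],             # e.g., turn count, mean turn length (s)
}

x = fuse_features(interview)
print(len(x))  # one 9-dimensional fused feature vector for this interview
```

Fused vectors for all interviews, paired with the expert-derived skill scores, would then form the training matrix for the classifier.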
format Article
id doaj-art-66df0d83831e40edbe8045ccdada3baa
institution OA Journals
issn 2666-920X
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Computers and Education: Artificial Intelligence
spelling doaj-art-66df0d83831e40edbe8045ccdada3baa 2025-08-20T02:06:00Z
eng | Elsevier | Computers and Education: Artificial Intelligence | ISSN 2666-920X | 2025-06-01 | vol. 8 | art. 100386 | doi:10.1016/j.caeai.2025.100386
Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
Candy Olivia Mawalim (Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan; corresponding author)
Chee Wee Leong (Educational Testing Service, Princeton, NJ, USA)
Guy Sivan (Vericant.com, Beijing, China)
Hung-Hsuan Huang (The University of Fukuchiyama, Fukuchiyama, Kyoto, Japan)
Shogo Okada (Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan)
http://www.sciencedirect.com/science/article/pii/S2666920X25000268
Keywords: Speaking skills; Multimodal; Interpretability; Interview
spellingShingle Candy Olivia Mawalim
Chee Wee Leong
Guy Sivan
Hung-Hsuan Huang
Shogo Okada
Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
Computers and Education: Artificial Intelligence
Speaking skills
Multimodal
Interpretability
Interview
title Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
title_full Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
title_fullStr Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
title_full_unstemmed Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
title_short Beyond accuracy: Multimodal modeling of structured speaking skill indices in young adolescents
title_sort beyond accuracy multimodal modeling of structured speaking skill indices in young adolescents
topic Speaking skills
Multimodal
Interpretability
Interview
url http://www.sciencedirect.com/science/article/pii/S2666920X25000268
work_keys_str_mv AT candyoliviamawalim beyondaccuracymultimodalmodelingofstructuredspeakingskillindicesinyoungadolescents
AT cheeweeleong beyondaccuracymultimodalmodelingofstructuredspeakingskillindicesinyoungadolescents
AT guysivan beyondaccuracymultimodalmodelingofstructuredspeakingskillindicesinyoungadolescents
AT hunghsuanhuang beyondaccuracymultimodalmodelingofstructuredspeakingskillindicesinyoungadolescents
AT shogookada beyondaccuracymultimodalmodelingofstructuredspeakingskillindicesinyoungadolescents