Listen or Read? The Impact of Proficiency and Visual Complexity on Learners’ Reliance on Captions
| Format: | Article |
|---|---|
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Behavioral Sciences |
| Online Access: | https://www.mdpi.com/2076-328X/15/4/542 |
| Summary: | This study investigates how Chinese EFL (English as a foreign language) learners of low- and high-proficiency levels allocate attention between captions and audio while watching videos, and how visual complexity (single- vs. multi-speaker content) influences caption reliance. The study employed a novel paused transcription method to assess real-time processing. A total of 64 participants (31 low-proficiency [A1–A2] and 33 high-proficiency [C1–C2] learners) viewed single- and multi-speaker videos with English captions. Misleading captions were inserted to objectively measure reliance on captions versus audio. Results revealed significant proficiency effects: low-proficiency learners prioritized captions (reading scores > listening, *Z* = −4.55, *p* < 0.001, *r* = 0.82), while high-proficiency learners focused on audio (listening > reading, *Z* = −5.12, *p* < 0.001, *r* = 0.89). Multi-speaker videos amplified caption reliance for low-proficiency learners (*r* = 0.75) and moderately increased reliance for high-proficiency learners (*r* = 0.52). These findings demonstrate that low-proficiency learners rely overwhelmingly on captions during video viewing, while high-proficiency learners integrate multimodal inputs. Notably, increased visual complexity amplifies caption reliance across proficiency levels. Implications are twofold: pedagogically, educators could design tiered caption-removal protocols as skills improve while incorporating adjustable caption-opacity tools; technologically, future research could focus on developing dynamic captioning systems that leverage eye-tracking and AI to adapt to real-time proficiency, optimizing learning experiences. Additionally, video complexity should be calibrated to learners’ proficiency levels. |
| ISSN: | 2076-328X |