Whisper Automatic Speech Recognition and GPT Large Language Models as Best Practice for Assessing Communication Progress in Autism Spectrum Disorder
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Universitas Negeri Jakarta, 2025-04-01 |
| Series: | Jurnal Teknologi Pendidikan |
| Subjects: | |
| Online Access: | https://journal.unj.ac.id/unj/index.php/jtp/article/view/54243 |
| Summary: | Autism Spectrum Disorder (ASD) is a developmental disorder that affects communication, social interaction, and behavior. Communication assessments for children with ASD are often conducted manually, which is time-consuming; this can delay the development of educational programs, and subjective evaluations lead to a lack of standardization. This study aims to develop an automated framework based on Whisper and GPT-4o to improve the efficiency and accuracy of evaluating communication abilities and language patterns in children with ASD. The research employs a Research and Development (R&D) approach involving children with ASD (mild and moderate verbal categories) and teachers from four autism schools in Daerah Istimewa Yogyakarta, Indonesia. Data were collected through interviews, classroom observations, audio recordings, and a matrix-based evaluation. Whisper was used for automated transcription and integrated with GPT-4o for speaker diarization and communication analysis. Combining these tools reduced analysis time by 89.1% compared to manual methods. Whisper achieved a low Word Error Rate (WER) for mild autism (average 5%) and a higher rate for moderate autism (average 23%). GPT-4o achieved high speaker diarization accuracy (93.9% for mild autism and 89.2% for moderate autism). Through the matrix-based evaluation, the framework identified detailed communication improvements across verbal, pragmatic, semantic, sentence-structure, and echolalia aspects. It also provided insights previously undetected by teachers, such as specific developmental patterns in each aspect. |
| ISSN: | 1411-2744 2620-3081 |