Whisper Automatic Speech Recognition and GPT Large Language Models as Best Practice for Assessing Communication Progress in Autism Spectrum Disorder

Bibliographic Details
Main Authors: Naela Fauzul Muna, Mukhammad Andri Setiawan
Format: Article
Language: English
Published: Universitas Negeri Jakarta 2025-04-01
Series: Jurnal Teknologi Pendidikan
Online Access: https://journal.unj.ac.id/unj/index.php/jtp/article/view/54243
Description
Summary: Autism Spectrum Disorder (ASD) is a developmental disorder that affects communication, social interaction, and behavior. Communication assessments for children with ASD are often conducted manually, which is time-consuming; this can delay the development of educational programs, and subjective evaluations lead to a lack of standardization. This study aims to develop an automated framework based on Whisper and GPT-4o to improve the efficiency and accuracy of evaluating communication abilities and language patterns in children with ASD. The research uses a Research and Development (R&D) approach involving children with ASD (mild and moderate verbal categories) and teachers from four autism schools in Daerah Istimewa Yogyakarta, Indonesia. Data were collected through interviews, classroom observations, audio recordings, and a matrix-based evaluation. Whisper was used for automated transcription and was integrated with GPT-4o for speaker diarization and communication analysis. Combining these tools reduced analysis time by 89.1% compared with manual methods. Whisper achieved a low Word Error Rate (WER) for mild autism (average 5%) and a higher rate for moderate autism (average 23%). GPT-4o achieved high speaker diarization accuracy (93.9% for mild autism and 89.2% for moderate autism). Through the matrix-based evaluation, the framework identified detailed communication improvements across verbal, pragmatic, semantic, sentence-structure, and echolalia aspects, and it provided insights that teachers had not previously detected, such as specific developmental patterns in each aspect.
ISSN: 1411-2744, 2620-3081
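
Note on the Word Error Rate figures in the summary: WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the automatic transcript, divided by the number of words in the reference. The record does not say how the authors computed it, so the following is only a minimal sketch of the standard definition. The function name `word_error_rate`, the lowercase-and-split-on-whitespace normalization, and the example sentences are illustrative assumptions, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: words are compared after lowercasing and whitespace
    splitting; the article's actual text normalization is not stated.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one word dropped out of a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {example_wer:.1%}")
```

Under this definition, an average WER of 5% corresponds to roughly one word error per twenty reference words, and 23% to roughly one error per four to five words.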