Facial Movements Extracted from Video for the Kinematic Classification of Speech
Speech Sound Disorders (SSDs) are prevalent communication problems in children that pose significant barriers to academic success and social participation. Accurate diagnosis is key to mitigating life-long impacts. We are developing a novel software solution, the Speech Movement and Acoustic Analysis Tracking (SMAAT) system, to facilitate rapid and objective assessment of the motor speech control issues underlying SSDs. This study evaluates the feasibility of using automatically extracted three-dimensional (3D) facial measurements from a single two-dimensional (2D) front-facing video camera to classify speech movements. Videos were recorded of 51 adults and 77 children aged 3 to 4 years (all typically developing for age) saying 20 words from the mandibular and labial-facial levels of the Motor-Speech Hierarchy Probe Wordlist (MSH-PW). Measurements around the jaw and lips were automatically extracted from the 2D video frames using a state-of-the-art facial mesh detection and tracking algorithm, and each individual measurement was tested for its word-classification performance in a Leave-One-Out Cross-Validation (LOOCV) framework. Statistics were evaluated at the α = 0.05 significance level, and several measurements exhibited significant classification performance in both the adult and child cohorts. Importantly, measurements of depth inferred indirectly from the 2D video frames were among those found to be significant. The significant measurements matched expectations of facial movements across the 20 words, demonstrating their potential applicability in supporting clinical evaluations of speech production.
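The study's evaluation protocol — scoring each single facial measurement for word classification under Leave-One-Out Cross-Validation — can be illustrated with a minimal sketch. This is not the authors' code: the nearest-class-mean classifier, the function name, and the example values are illustrative assumptions; the paper only specifies that each measurement was tested individually in a LOOCV framework.

```python
# Minimal sketch (not the SMAAT implementation): LOOCV word-classification
# accuracy for a single scalar measurement (e.g., a hypothetical peak jaw
# opening per utterance), using an assumed nearest-class-mean rule.
from statistics import mean

def loocv_accuracy(values, labels):
    """LOOCV accuracy of a one-feature nearest-class-mean classifier."""
    correct = 0
    for i in range(len(values)):
        # Hold out sample i; fit class means on the remaining samples.
        train = [(v, l) for j, (v, l) in enumerate(zip(values, labels)) if j != i]
        class_means = {w: mean(v for v, l in train if l == w)
                       for w in set(l for _, l in train)}
        # Classify the held-out sample by the closest class mean.
        predicted = min(class_means, key=lambda w: abs(values[i] - class_means[w]))
        correct += predicted == labels[i]
    return correct / len(values)

# Hypothetical measurements for two words with distinct jaw kinematics.
values = [10.1, 9.8, 10.4, 4.9, 5.2, 5.0]
labels = ["mama", "mama", "mama", "pipi", "pipi", "pipi"]
print(loocv_accuracy(values, labels))  # → 1.0 (features are well separated)
```

In the paper's setting, an accuracy like this would then be tested against chance at α = 0.05 to decide whether the measurement is a significant classifier.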
| Main Authors: | Richard Palmer, Roslyn Ward, Petra Helmholz, Geoffrey R. Strauss, Paul Davey, Neville Hennessey, Linda Orton, Aravind Namasivayam |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-11-01 |
| Series: | Sensors |
| Subjects: | digital biomarkers; kinematics; spatiotemporal profiling; Speech Sound Disorders |
| Online Access: | https://www.mdpi.com/1424-8220/24/22/7235 |
| description | Speech Sound Disorders (SSDs) are prevalent communication problems in children that pose significant barriers to academic success and social participation. Accurate diagnosis is key to mitigating life-long impacts. We are developing a novel software solution—the Speech Movement and Acoustic Analysis Tracking (SMAAT) system to facilitate rapid and objective assessment of motor speech control issues underlying SSD. This study evaluates the feasibility of using automatically extracted three-dimensional (3D) facial measurements from single two-dimensional (2D) front-facing video cameras for classifying speech movements. Videos were recorded of 51 adults and 77 children between 3 and 4 years of age (all typically developed for age) saying 20 words from the mandibular and labial-facial levels of the Motor-Speech Hierarchy Probe Wordlist (MSH-PW). Measurements around the jaw and lips were automatically extracted from the 2D video frames using a state-of-the-art facial mesh detection and tracking algorithm, and each individual measurement was tested in a Leave-One-Out Cross-Validation (LOOCV) framework for its word classification performance. Statistics were evaluated at the α = 0.05 significance level and several measurements were found to exhibit significant classification performance in both the adult and child cohorts. Importantly, measurements of depth indirectly inferred from the 2D video frames were among those found to be significant. The significant measurements were shown to match expectations of facial movements across the 20 words, demonstrating their potential applicability in supporting clinical evaluations of speech production. |
| doi | 10.3390/s24227235 |
| issn | 1424-8220 |
| citation | Sensors, vol. 24, no. 22, art. 7235 (2024-11-01) |
| affiliations | Richard Palmer: School of Earth and Planetary Sciences, Curtin University, Perth, WA 6102, Australia; Roslyn Ward: School of Allied Health, Curtin University, Perth, WA 6102, Australia; Petra Helmholz: School of Earth and Planetary Sciences, Curtin University, Perth, WA 6102, Australia; Geoffrey R. Strauss: School of Allied Health, Curtin University, Perth, WA 6102, Australia; Paul Davey: School of Allied Health, Curtin University, Perth, WA 6102, Australia; Neville Hennessey: School of Allied Health, Curtin University, Perth, WA 6102, Australia; Linda Orton: School of Allied Health, Curtin University, Perth, WA 6102, Australia; Aravind Namasivayam: Department of Speech-Language Pathology, University of Toronto, Toronto, ON M5G 1V7, Canada |
| keywords | digital biomarkers; kinematics; spatiotemporal profiling; Speech Sound Disorders |