Leveraging computational linguistics and machine learning for detection of ultra-high risk of mental health disorders in youths
| Main Authors: | Jordon Junyang Kho, Shangzheng Song, Samuel Ming Xuan Tan, Nur Hikmah Fitriyah, Matheus Calvin Lokadjaja, Jie Yin Yee, Zixu Yang, Eric Yu Hai Chen, Jimmy Lee, Wilson Wen Bin Goh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Schizophrenia |
| Online Access: | https://doi.org/10.1038/s41537-025-00649-3 |
| Field | Value |
|---|---|
| author | Jordon Junyang Kho, Shangzheng Song, Samuel Ming Xuan Tan, Nur Hikmah Fitriyah, Matheus Calvin Lokadjaja, Jie Yin Yee, Zixu Yang, Eric Yu Hai Chen, Jimmy Lee, Wilson Wen Bin Goh |
| collection | DOAJ |
| description | Abstract Mental illnesses often manifest through behavioral changes, with speech serving as a key medium for expressing thoughts and emotions. The use of computational linguistics on speech data in mental illnesses is a promising approach to uncover objective biomarkers for the early detection of mental illnesses. This study analyzed speech transcripts from 80 youths at ultra-high risk of psychosis (UHR) and 329 healthy controls, examining text features such as sentiment variability, cohesion, lexical sophistication, morphology, syntactic sophistication, and lexical diversity. Factor analysis revealed five key linguistic themes: Sentiment Intensity and Variability, Linguistic Register Alignment, Phonographic Uniqueness and Recognizability, Morphological Complexity and Imageability, and Lexical Richness and Typicalness. Regression analysis indicated UHR speech is characterized by diminished sentiment variability (β = –0.07), deviation from linguistic registers (β = –0.16), fewer phonographic neighbors (β = –0.11), lower morphological complexity (β = –0.36), and more predictable lexical structures (β = 0.05). Optimized machine learning (ML) models trained on Boruta-selected features achieved a mean AUC of 0.70. Our findings highlight the potential of sentiment and linguistic analyses in speech for training ML models to aid in early detection and monitoring of mental health conditions. |
| format | Article |
| id | doaj-art-3fc8a8861ee84107a2d570c66f34d3ce |
| institution | Kabale University |
| issn | 2754-6993 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| series | Schizophrenia |
| affiliations | Lee Kong Chian School of Medicine, Nanyang Technological University; School of Biological Sciences, Nanyang Technological University; Institute of Mental Health |
| title | Leveraging computational linguistics and machine learning for detection of ultra-high risk of mental health disorders in youths |