Predicting the Risk of Lumbar Prolapsed Disc: A Gene Signature-Based Machine Learning Analysis

Abstract Introduction Lumbar prolapsed disc (LPD) is a leading cause of low back pain, contributing significantly to global disability and healthcare burden. This study aimed to develop machine learning models to predict the risk of LPD by analysing gene expression profiles for early detection. Meth...

Bibliographic Details
Main Authors: Fengfeng Wang, Fei Meng, Stanley Sau Ching Wong
Format: Article
Language: English
Published: Adis, Springer Healthcare 2025-05-01
Series: Pain and Therapy
Subjects:
Online Access: https://doi.org/10.1007/s40122-025-00744-4
author Fengfeng Wang
Fei Meng
Stanley Sau Ching Wong
collection DOAJ
description Abstract
Introduction: Lumbar prolapsed disc (LPD) is a leading cause of low back pain, contributing significantly to global disability and healthcare burden. This study aimed to develop machine learning models to predict the risk of LPD by analysing gene expression profiles for early detection.
Methods: Transcriptomic data from peripheral blood samples were obtained from the Gene Expression Omnibus (GEO) database, with dataset GSE150408 used for training and GSE124272 for testing. The training dataset included 17 patients with sciatica resulting from LPD, all of whom had magnetic resonance imaging confirmation of single-level LPD at either the L4/5 or L5/S1 levels. Data from 17 healthy volunteers were used as controls. Recursive feature elimination (RFE) was employed to identify the most relevant gene signatures among 23 pain-related genes. Machine learning models, including support vector machine (SVM), random forest, k-nearest neighbours (KNN), logistic regression, and Extreme Gradient Boosting (XGBoost), were trained and evaluated. Model performance was assessed using accuracy, area under the curve (AUC), F1 score, and Matthews correlation coefficient (MCC).
Results: Eight key gene signatures were identified as significant predictors of LPD, with MMP9 exhibiting the highest importance score. Most of these genes were differentially expressed between patients with LPD and healthy controls (p < 0.05). Among the models, random forest demonstrated the highest accuracy (0.80, 95% CI 0.73–0.85) and MCC (0.64, 95% CI 0.53–0.76), followed by KNN, XGBoost, and SVM. Overall, the random forest model exhibited the most robust performance in predicting the risk of LPD.
Conclusion: The results of our study suggest that machine learning models based on pain-related gene signatures may identify patients at high risk of developing LPD with reasonably high accuracy. These prediction models could perhaps be integrated into clinical diagnostic tools to enhance early diagnosis and prevention.
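As a reading aid for the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, F1 score, and MCC can be computed from a binary confusion matrix. It is not the authors' code: the function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration, and AUC is left out because it requires ranked prediction scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, F1 and MCC from binary confusion-matrix counts.

    tp/fp/tn/fn are true/false positives and negatives. Any ratio whose
    denominator is zero is reported as 0.0 (a common convention).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # Matthews correlation coefficient: ranges from -1 to 1,
    # with 0 corresponding to chance-level prediction.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts for illustration only; NOT taken from the study.
metrics = binary_metrics(tp=8, fp=2, tn=8, fn=2)
print(metrics)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy for small or imbalanced cohorts.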
format Article
id doaj-art-cc31ae8848764c609ea4564a2f749296
institution OA Journals
issn 2193-8237
2193-651X
language English
publishDate 2025-05-01
publisher Adis, Springer Healthcare
record_format Article
series Pain and Therapy
spelling doaj-art-cc31ae8848764c609ea4564a2f749296; 2025-08-20T01:51:58Z; eng; Adis, Springer Healthcare; Pain and Therapy; 2193-8237; 2193-651X; 2025-05-01; vol. 14, no. 3, pp. 1117-1129; 10.1007/s40122-025-00744-4
Predicting the Risk of Lumbar Prolapsed Disc: A Gene Signature-Based Machine Learning Analysis
Fengfeng Wang (0), Fei Meng (1), Stanley Sau Ching Wong (2)
Affiliation (0, 1, 2): Department of Anaesthesiology, School of Clinical Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Queen Mary Hospital
https://doi.org/10.1007/s40122-025-00744-4
Low back pain; Lumbar prolapsed disc; Gene signature; Transcriptomics; Machine learning; Risk prediction
title Predicting the Risk of Lumbar Prolapsed Disc: A Gene Signature-Based Machine Learning Analysis
topic Low back pain
Lumbar prolapsed disc
Gene signature
Transcriptomics
Machine learning
Risk prediction
url https://doi.org/10.1007/s40122-025-00744-4