Machine learning prediction of coal workers’ pneumoconiosis classification based on few-shot clinical data

Bibliographic Details
Main Authors: Jiaqi Jia, Jingying Huang, Yuming Cui, Dekun Zhang, Haiquan Li, Songquan Wang, Wenlu Hang
Format: Article
Language: English
Published: SAGE Publishing 2025-07-01
Series: Digital Health
Online Access: https://doi.org/10.1177/20552076251359498
Description
Summary: Objective: Coal workers’ pneumoconiosis (CWP) has a long incubation period, is often not diagnosed early, and lacks treatment methods. This study aims to predict CWP staging accurately using machine learning (ML) methods and small-sample clinical data. Methods: The study compared clinical data from 202 healthy individuals and 81 CWP patients at the General Hospital of Xuzhou Mining Group. First, several oversampling techniques were applied to address data imbalance. Next, multiple ML methods were used for supervised learning and prediction of CWP staging. Finally, a new feature selection method was proposed that integrates the importance and independence of clinical features to achieve high-precision prediction of CWP from a limited number of indicators. Results: The Random Forest importance assessment identified ALB, PLT, and WBC as significant predictive factors for CWP. In the integrated feature selection, all ML models performed best when the weight ratio of feature importance to independence was 7:3 or 6:4; the Random Forest (RF)-Adaboost model gave the best predictive accuracy for CWP, reaching an F1 score of 0.8757. Conclusions: Combining clinical biochemical examination data with ML models, especially the RF-Adaboost and support vector machine-particle swarm optimization models, effectively predicted the staging of CWP. The proposed integrated feature selection method, which considers both the importance and the independence of features, significantly enhanced model performance and provides a valuable tool for early screening and diagnosis of CWP.
ISSN: 2055-2076
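
The summary describes a feature selection step that weighs feature importance against feature independence (for example at 7:3). This record does not give the formula, so the following is only a minimal sketch under stated assumptions, not the authors' method: importance is taken as a precomputed non-negative score per feature (such as a Random Forest importance), independence is assumed to be one minus the mean absolute Pearson correlation with the other features, and the two are combined as a weighted sum. The function names and the toy feature names `a`, `b`, `c` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def combined_scores(columns, importance, w_importance=0.7):
    """Score each feature as w * importance + (1 - w) * independence.

    columns:    dict mapping feature name -> list of values
    importance: dict mapping feature name -> non-negative importance score
    Assumption: independence = 1 - mean |Pearson r| with all other features.
    """
    names = list(columns)
    total = sum(importance[n] for n in names)
    scores = {}
    for name in names:
        others = [o for o in names if o != name]
        if others:
            mean_abs_corr = sum(
                abs(pearson(columns[name], columns[o])) for o in others
            ) / len(others)
        else:
            mean_abs_corr = 0.0
        independence = 1.0 - mean_abs_corr
        # Normalize importances so they sum to 1 and share a scale with independence.
        norm_importance = importance[name] / total if total else 0.0
        scores[name] = (
            w_importance * norm_importance + (1.0 - w_importance) * independence
        )
    return scores


def select_features(columns, importance, k, w_importance=0.7):
    """Return the k feature names with the highest combined score."""
    scores = combined_scores(columns, importance, w_importance)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (arbitrary numbers, not clinical values): "b" duplicates the
# information in "a", while "c" is uncorrelated with both.
toy_columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, -1, -1, 1]}
toy_importance = {"a": 0.5, "b": 0.3, "c": 0.2}
print(select_features(toy_columns, toy_importance, k=2, w_importance=0.7))
```

In this toy case the 7:3 weighting prefers the less important but independent feature `c` over the redundant feature `b`, whereas ranking on importance alone (`w_importance=1.0`) would keep `a` and `b`.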