Research on medical small sample data classification based on SMOTE and gcForest

Bibliographic Details
Main Authors: Wenchang LIU, Yun WEI, Haoxuan YUAN, Yue GAO
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group, 2023-06-01
Series: 物联网学报 (Chinese Journal on Internet of Things)
Subjects: medical data; small sample; SMOTE; gcForest
Online Access: http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00337/
Collection: DOAJ
Description: Traditional machine learning models classify small-sample medical data poorly because of their shallow model structure and the complex characteristics of such data. To address this problem, a combined multi-grained improved cascade forest (cgicForest) model was proposed. It strengthens the model's representation learning by adding random sampling to multi-grained scanning and by optimizing the transformed features, and it strengthens the model's classification ability by updating the hierarchical structure of the cascade forest. To address class imbalance in the datasets, the safe-borderline-SMOTE (SBS) algorithm was proposed; it dynamically interpolates around minority-class samples that lie on the safe boundary, which improves the quality of the training data. Applying cgicForest for training and learning on the resulting data yields the SBS-cgicForest classification model, which can handle imbalanced small-sample medical data. Classification experiments were carried out on three medical datasets. Compared with the multi-grained cascade forest (gcForest) model, cgicForest improved the performance indexes by 4.1 to 5.4 percentage points on small-sample medical data with complex characteristics. After combination with the SBS algorithm, the performance indexes increased by 6.6 to 11.2 percentage points, and the F1 score was 2 to 2.5 percentage points higher than that obtained with traditional sampling methods. The work provides a reference for classifying small-sample medical data and supports Internet of Things applications in smart medical scenarios.
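The abstract describes SBS only at a high level. As a hedged illustration of the general idea, the Python sketch below implements a Borderline-SMOTE-style oversampler in NumPy: each minority sample is labeled safe, danger (borderline) or noise from the number of majority-class samples among its k nearest neighbours, noise samples are never used as seeds, and synthetic samples are interpolated between a seed and one of its minority-class neighbours. The function names, the default k=5, and the narrower interpolation gap for borderline seeds are assumptions made for illustration; this is not the authors' SBS algorithm.

```python
import numpy as np


def categorize_minority(X, y, minority_label, k=5):
    """Label each minority sample 'safe', 'danger' (borderline) or 'noise'
    from the number of majority-class samples among its k nearest neighbours,
    using the Borderline-SMOTE thresholds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    # Euclidean distance from every minority sample to every sample.
    dists = np.linalg.norm(X[minority_idx, None, :] - X[None, :, :], axis=2)
    categories = []
    for row, i in enumerate(minority_idx):
        dists[row, i] = np.inf  # a sample is not its own neighbour
        neighbours = np.argsort(dists[row])[:k]
        n_majority = int(np.sum(y[neighbours] != minority_label))
        if n_majority == k:
            categories.append("noise")   # surrounded by the majority class
        elif n_majority >= k / 2:
            categories.append("danger")  # on the class boundary
        else:
            categories.append("safe")    # inside the minority region
    return minority_idx, np.array(categories)


def safe_borderline_oversample(X, y, minority_label, n_new, k=5, seed=0):
    """Append n_new synthetic minority samples interpolated from non-noise
    seeds toward their minority-class neighbours (hypothetical SBS-like rule)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    minority_idx, cats = categorize_minority(X, y, minority_label, k)
    if len(minority_idx) < 2:
        raise ValueError("need at least two minority samples")
    keep = cats != "noise"
    seeds, seed_cats = minority_idx[keep], cats[keep]
    if len(seeds) == 0:
        raise ValueError("all minority samples were classified as noise")
    X_min = X[minority_idx]
    synthetic = np.empty((n_new, X.shape[1]))
    for s in range(n_new):
        j = rng.integers(len(seeds))
        x = X[seeds[j]]
        # k nearest minority neighbours of the seed; the first entry
        # (distance 0) is the seed itself and is dropped.
        order = np.argsort(np.linalg.norm(X_min - x, axis=1))[1:k + 1]
        neighbour = X_min[rng.choice(order)]
        # Assumption: borderline ('danger') seeds use a smaller gap so new
        # points stay close to the seed; safe seeds may use the full segment.
        upper = 0.5 if seed_cats[j] == "danger" else 1.0
        synthetic[s] = x + rng.uniform(0.0, upper) * (neighbour - x)
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_out, y_out
```

For a feature matrix X of shape (n, d) and labels y, a call such as safe_borderline_oversample(X, y, minority_label=1, n_new=20) returns the original rows with 20 synthetic minority rows appended; the rebalanced data would then be passed to the classifier (cgicForest in the paper). Every synthetic point lies on a segment between two minority samples, so oversampling never places new points outside the minority samples' convex hull.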
Record ID: doaj-art-2122443b1b9747649b2977ced6ad0728
Institution: Kabale University
ISSN: 2096-3750