Research on medical small sample data classification based on SMOTE and gcForest
To address the poor classification performance of traditional machine learning models on small medical datasets, which stems from shallow model structure and complex data characteristics, a combined multi-grained improved cascade forest (cgicForest) model is proposed. It enhances the represent...
Main Authors: | Wenchang LIU, Yun WEI, Haoxuan YUAN, Yue GAO |
---|---|
Format: | Article |
Language: | zho |
Published: | China InfoCom Media Group, 2023-06-01 |
Series: | 物联网学报 |
Subjects: | medical data; small sample; SMOTE; gcForest |
Online Access: | http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00337/ |
_version_ | 1841533788946956288 |
---|---|
author | Wenchang LIU; Yun WEI; Haoxuan YUAN; Yue GAO |
collection | DOAJ |
description | To address the poor classification performance of traditional machine learning models on small medical datasets, which stems from shallow model structure and complex data characteristics, a combined multi-grained improved cascade forest (cgicForest) model is proposed. The model strengthens representation learning by adding random sampling to multi-grained scanning and optimizing the transformed features, and it strengthens classification ability by updating the hierarchical structure of the cascade forest. To handle class imbalance in the datasets, a safe-borderline-SMOTE (SBS) algorithm is proposed that dynamically interpolates around minority-class samples lying on the safe boundary, which improves the quality of the training data. cgicForest is then trained on these data, yielding the SBS-cgicForest classification model, which supports imbalanced small-sample medical data. Classification experiments were run on three medical datasets. The results show that, on small-sample medical data with complex characteristics, the performance metrics of cgicForest are 4.1 to 5.4 percentage points higher than those of the multi-grained cascade forest (gcForest) model. After combination with the SBS algorithm, the performance metrics rise by 6.6 to 11.2 percentage points, and the F1 score is 2 to 2.5 percentage points higher than that obtained with traditional sampling methods. The work provides a reference for classifying small-sample medical data and supports Internet of Things applications in smart healthcare scenarios. |
format | Article |
id | doaj-art-2122443b1b9747649b2977ced6ad0728 |
institution | Kabale University |
issn | 2096-3750 |
language | zho |
publishDate | 2023-06-01 |
publisher | China InfoCom Media Group |
record_format | Article |
series | 物联网学报 |
title | Research on medical small sample data classification based on SMOTE and gcForest |
topic | medical data; small sample; SMOTE; gcForest |
url | http://www.wlwxb.com.cn/zh/article/doi/10.11959/j.issn.2096-3750.2023.00337/ |
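
The description mentions interpolating new samples around minority-class samples near the class boundary. The record does not give the details of the safe-borderline-SMOTE (SBS) algorithm, so the sketch below is not the authors' method. It illustrates the classic Borderline-SMOTE idea instead: select minority samples whose neighbourhood is mostly, but not entirely, majority class, then interpolate between each of them and a nearby minority sample. The function name `borderline_smote`, its parameters, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def borderline_smote(X, y, minority_label, k=5, n_new=10, seed=0):
    """Illustrative Borderline-SMOTE-style oversampling (not the paper's SBS).

    Returns an array of synthetic minority samples with shape (n_new, n_features),
    or an empty array if no borderline minority sample is found.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.flatnonzero(y == minority_label)
    if len(min_idx) < 2:
        return np.empty((0, X.shape[1]))

    # Distances from every minority sample to every sample in the dataset.
    d_all = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    d_all[np.arange(len(min_idx)), min_idx] = np.inf  # ignore self-distance
    nn_all = np.argsort(d_all, axis=1)[:, :k]

    # A minority sample is "borderline" when at least half, but not all,
    # of its k nearest neighbours belong to another class.
    n_major = (y[nn_all] != minority_label).sum(axis=1)
    borderline = min_idx[(n_major >= k / 2) & (n_major < k)]
    if len(borderline) == 0:
        return np.empty((0, X.shape[1]))

    X_min = X[min_idx]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(borderline)
        # Nearest minority neighbours of sample i (position 0 is i itself).
        order = np.argsort(np.linalg.norm(X_min - X[i], axis=1))
        j = rng.choice(order[1:k + 1])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X_min[j] - X[i]))
    return np.array(synthetic)


# Toy data: four minority points on a line, majority points to their right.
X_demo = np.array([[0.0, 0], [1.0, 0], [2.0, 0], [3.0, 0],
                   [3.4, 0], [3.5, 0], [5.0, 0], [6.0, 0], [7.0, 0], [8.0, 0]])
y_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
synth = borderline_smote(X_demo, y_demo, minority_label=1, k=3, n_new=5)
```

In this toy set only the minority point at x = 3.0 is borderline (two of its three nearest neighbours are majority), so every synthetic point lies on the segment between it and another minority point. For real use, a maintained implementation such as the one in the imbalanced-learn library would be the safer choice.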