Parallel deep forest algorithm based on Spark and three-way interactive information

Bibliographic Details
Main Authors: Yimin MAO, Zhan ZHOU, Zhigang CHEN
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2023-08-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2023143/
Description
Summary: Parallel deep forests suffer from many redundant and irrelevant features, overly long class vectors, slow model convergence, and inefficient parallel training. To address these issues, a parallel deep forest algorithm based on Spark and three-way interactive information is proposed. First, a feature selection based on feature interaction (FSFI) strategy filters the original features to eliminate irrelevant and redundant ones. Second, a multi-granularity vector elimination (MGVE) strategy fuses similar class vectors, shortening the class vector length. Third, a cascade forest feature enhancement (CFFE) strategy improves information utilization and accelerates model convergence. Finally, a multi-level load balancing (MLB) strategy, combined with the Spark framework, improves parallelization efficiency through adaptive sub-forest division and heterogeneous skewed-data partitioning. Experimental results demonstrate that the proposed algorithm significantly improves classification performance and reduces parallel training time.
ISSN: 1000-436X
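
The summary above names a feature-selection step built on feature interaction but does not give its formula. As a rough illustration only, the sketch below computes the standard three-way interaction information for discrete variables, under the assumed sign convention I(X;Y;Z) = I(X;Y|Z) - I(X;Y), so that positive values indicate synergy with the label and negative values indicate redundancy. The function names are hypothetical, and this is not claimed to be the FSFI procedure of the article.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy in bits of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def interaction_information(x, y, z):
    """Assumed convention: I(X;Y;Z) = I(X;Y|Z) - I(X;Y)."""
    return conditional_mutual_information(x, y, z) - mutual_information(x, y)


# Two features that only predict the label jointly (XOR): synergy.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label_xor = [a ^ b for a, b in zip(f1, f2)]
synergy = interaction_information(f1, f2, label_xor)

# A feature duplicated exactly, with the label equal to it: redundancy.
redundancy = interaction_information(f1, f1, f1)
```

Under this convention the XOR pair scores positive and the duplicated feature scores negative, which is the kind of signal a filter could use to keep interacting features and drop redundant ones.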