Big data processing using hybrid Gaussian mixture model with salp swarm algorithm

Bibliographic Details
Main Authors: R. Saravanakumar, T. TamilSelvi, Digvijay Pandey, Binay Kumar Pandey, Darshan A. Mahajan, Mesfin Esayas Lelisho
Format: Article
Language: English
Published: SpringerOpen 2024-11-01
Series:Journal of Big Data
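Subjects: Hadoop distributed file system (HDFS); Map-reduce; Gaussian mixture model (GMM); Salp swarm algorithm (SSA); Secure hash algorithms (SHA)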
Online Access:https://doi.org/10.1186/s40537-024-01015-3
collection DOAJ
description Abstract Traditional big data methods, such as cluster creation and query-based data extraction, fail to yield accurate results on massive networks. To address this, the proposed approach uses the Hadoop Distributed File System (HDFS) for distributed storage, the MapReduce programming paradigm for data processing, and query optimization techniques to extract accurate results quickly and efficiently at high processing capacity. The methodology uses a Gaussian Mixture Model (GMM) for data clustering and the Salp Swarm Algorithm (SSA) for optimization. Secure Hash Algorithms (SHA) are used to protect the integrity of the preprocessed data stored across the interconnected cluster nodes. The model's experimental performance is then evaluated against the key parameters. Input file sizes in this work range from 60 to 100 MB. On 100 MB of input, the method achieved an accuracy of 96%, a specificity of 90%, and a sensitivity of 93%. Compared with the well-known fuzzy C-means and K-means approaches, the proposed method distributes data processing across cluster nodes accurately, with low latency and low memory consumption when running on CPUs. On these measures, the proposed approach outperforms the compared techniques.
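The abstract names the Salp Swarm Algorithm (SSA) as its optimizer but does not describe how it works. As a minimal sketch, the commonly published single-leader SSA can be written in Python as below: a leader salp moves around the best solution found so far (the "food source") with a step size c1 = 2·exp(-(4t/T)²) that shrinks over iterations, and each follower moves to the midpoint between itself and the salp ahead of it. This is not the authors' implementation; how the paper couples SSA with the GMM is not stated here, so the sketch minimizes a generic objective, and the function name `salp_swarm_minimize` and its parameters (`n_salps`, `max_iter`, `seed`) are hypothetical.

```python
import math
import random


def salp_swarm_minimize(objective, lb, ub, n_salps=30, max_iter=200, seed=0):
    """Hypothetical helper: textbook single-leader SSA over the box [lb, ub].

    Not the paper's hybrid GMM-SSA; it only illustrates the SSA update rules.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(lb)
    # Random initial salp positions inside the bounds.
    salps = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_salps)]
    # Food source = best position seen so far.
    food = min(salps, key=objective)[:]
    food_fit = objective(food)

    for t in range(1, max_iter + 1):
        # c1 decays from about 2 toward 0: exploration first, exploitation later.
        c1 = 2 * math.exp(-((4 * t / max_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: random step around the food source, sign chosen by c3.
                for j in range(dim):
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
                    salps[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
            else:
                # Follower: midpoint between itself and the salp ahead in the chain.
                for j in range(dim):
                    salps[i][j] = (salps[i][j] + salps[i - 1][j]) / 2
        # Clamp every salp to the bounds, then update the food source.
        for s in salps:
            for j in range(dim):
                s[j] = min(max(s[j], lb[j]), ub[j])
            fit = objective(s)
            if fit < food_fit:
                food, food_fit = s[:], fit
    return food, food_fit
```

For example, `salp_swarm_minimize(lambda x: sum(v * v for v in x), [-5.0, -5.0], [5.0, 5.0])` should return a point close to the origin. The single-leader form follows the algorithm's usual description; some reference implementations instead treat the first half of the population as leaders.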
id doaj-art-d078bbed49be4f47b192e3010e8c29b3
institution OA Journals
issn 2196-1115
affiliation R. Saravanakumar: Department of CSE, Dayananda Sagar Academy of Technology & Management
T. TamilSelvi: Department of CSE, Panimalar Institute of Technology
Digvijay Pandey: Department of Technical Education, Uttar Pradesh
Binay Kumar Pandey: Department of Information Technology, College of Technology, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar
Darshan A. Mahajan: NICMAR University, Pune
Mesfin Esayas Lelisho: Department of Statistics, Mizan-Tepi University
topic Hadoop distributed file system (HDFS)
Map-reduce
Gaussian mixture model (GMM)
Salp swarm algorithm (SSA)
Secure hash algorithms (SHA)