Big data processing using hybrid Gaussian mixture model with salp swarm algorithm
Abstract The traditional methods used in big data, like cluster creation and query-based data extraction, fail to yield accurate results on massive networks. To address such issues, the proposed approach involves using the Hadoop Distributed File System (HDFS) for data processing, the map-reduce pro...
| Main Authors: | R. Saravanakumar, T. TamilSelvi, Digvijay Pandey, Binay Kumar Pandey, Darshan A. Mahajan, Mesfin Esayas Lelisho |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | SpringerOpen, 2024-11-01 |
| Series: | Journal of Big Data |
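| Subjects: | Hadoop distributed file system (HDFS); Map-reduce; Gaussian mixture model (GMM); Salp swarm algorithm (SSA); Secure hash algorithms (SHA) |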
| Online Access: | https://doi.org/10.1186/s40537-024-01015-3 |
| _version_ | 1850162730139385856 |
|---|---|
| author | R. Saravanakumar, T. TamilSelvi, Digvijay Pandey, Binay Kumar Pandey, Darshan A. Mahajan, Mesfin Esayas Lelisho |
| collection | DOAJ |
| description | Abstract: Traditional big data methods, such as cluster creation and query-based data extraction, fail to yield accurate results on massive networks. To address these issues, the proposed approach uses the Hadoop Distributed File System (HDFS) and the map-reduce programming paradigm for data processing, together with query optimization techniques, to extract accurate outcomes quickly and effectively from a variety of options with a high processing capacity. The proposed methodology uses a Gaussian Mixture Model (GMM) for data clustering and the Salp Swarm Algorithm (SSA) for optimization. SHA algorithms secure the preprocessed data stored on interconnected networked clusters. Finally, taking the important parameters into consideration, the experimental performance of the model is evaluated. The estimated range of input file sizes in this work is 60–100 MB. Processing 100 MB of input files yielded an accuracy of 96%, a specificity of 90%, and a sensitivity of 93%. The outcomes were compared with well-known methods such as fuzzy C-means and K-means, and the results show that the proposed method distributes accurate data processing to cluster nodes with low latency. Moreover, it uses the least amount of memory resources possible when operating on functional CPUs. As a result, the proposed approach outperforms existing techniques. |
| format | Article |
| id | doaj-art-d078bbed49be4f47b192e3010e8c29b3 |
| institution | OA Journals |
| issn | 2196-1115 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | SpringerOpen |
| record_format | Article |
| series | Journal of Big Data |
| spelling | doaj-art-d078bbed49be4f47b192e3010e8c29b3; 2025-08-20T02:22:29Z; eng; SpringerOpen; Journal of Big Data; ISSN 2196-1115; 2024-11-01; vol. 11, no. 1, pp. 1–29; https://doi.org/10.1186/s40537-024-01015-3; Big data processing using hybrid Gaussian mixture model with salp swarm algorithm. Authors and affiliations: R. Saravanakumar (Department of CSE, Dayananda Sagar Academy of Technology & Management); T. TamilSelvi (Department of CSE, Panimalar Institute of Technology); Digvijay Pandey (Department of Technical Education Uttar Pradesh); Binay Kumar Pandey (Department of Information Technology, College of Technology, Govind Ballabh Pant University of Agriculture and Technology Pantnagar); Darshan A. Mahajan (NICMAR University Pune); Mesfin Esayas Lelisho (Department of Statistics, Mizan-Tepi University). Keywords: Hadoop distributed file system (HDFS); Map-reduce; Gaussian mixture model (GMM); Salp swarm algorithm (SSA); Secure hash algorithms (SHA) |
| title | Big data processing using hybrid Gaussian mixture model with salp swarm algorithm |
| topic | Hadoop distributed file system (HDFS); Map-reduce; Gaussian mixture model (GMM); Salp swarm algorithm (SSA); Secure hash algorithms (SHA) |
| url | https://doi.org/10.1186/s40537-024-01015-3 |
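
The abstract names a Gaussian Mixture Model (GMM) as the clustering step. As a rough illustration of what GMM clustering does, the sketch below fits a one-dimensional mixture with plain expectation-maximization (EM) using only the Python standard library. It is not the authors' implementation: the article pairs GMM with the Salp Swarm Algorithm on Hadoop, which is not reproduced here. The function names (`gaussian_pdf`, `fit_gmm_1d`, `assign_clusters`), the quantile-based initialization, and the iteration count are assumptions made for this example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, variances). Initial means are taken at evenly
    spaced quantiles of the data (an assumption of this sketch; the article
    tunes its model with the Salp Swarm Algorithm instead).
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, 1e-6)] * k
    weights = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            scores = [w * gaussian_pdf(x, m, v)
                      for w, m, v in zip(weights, means, variances)]
            total = sum(scores) or 1e-300  # guard against division by zero
            resp.append([s / total for s in scores])

        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps the density finite

    return weights, means, variances


def assign_clusters(data, weights, means, variances):
    """Label each point with the index of its most probable component."""
    return [
        max(range(len(means)),
            key=lambda j: weights[j] * gaussian_pdf(x, means[j], variances[j]))
        for x in data
    ]


if __name__ == "__main__":
    # Synthetic demo data: two well-separated groups (not data from the article).
    rng = random.Random(42)
    points = [rng.gauss(0.0, 1.0) for _ in range(200)]
    points += [rng.gauss(10.0, 1.0) for _ in range(200)]
    w, m, v = fit_gmm_1d(points, k=2)
    print("weights:", w)
    print("means:", m)
    print("variances:", v)
```

Each point receives a soft responsibility per component in the E-step, and `assign_clusters` turns those into hard labels. A metaheuristic such as SSA could in principle replace or seed the parameter search, but how the article combines the two is not described in this record.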