A hybrid approach to financial big data analysis using extended ensemble learning and optimized spark streaming

The financial sector faces mounting challenges in processing vast volumes of high-velocity data to support intelligent, real-time decision-making. Traditional machine learning models often fall short in accuracy, scalability, and responsiveness when dealing with large, dynamic financial datasets. Th...

Bibliographic Details
Main Author: Muhammad Babar
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Journal of Open Innovation: Technology, Market and Complexity
Subjects: Machine learning; Big data; AI; Data analysis; Classifiers; Finance
Online Access: http://www.sciencedirect.com/science/article/pii/S2199853125001374
author Muhammad Babar
collection DOAJ
description The financial sector faces mounting challenges in processing vast volumes of high-velocity data to support intelligent, real-time decision-making. Traditional machine learning models often fall short in accuracy, scalability, and responsiveness when dealing with large, dynamic financial datasets. This study presents a hybrid architecture that integrates extended ensemble learning with an optimized big data processing pipeline based on Apache Spark Streaming to address these limitations. The core ensemble combines K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and K-Neighbors Classifier (KNC) to improve classification robustness and generalization. The system is designed for distributed and parallel execution, leveraging Spark’s map-reduce capabilities for high-throughput, low-latency data handling. Empirical evaluations using the Portuguese Bank Marketing dataset demonstrate that the proposed architecture achieves a prediction accuracy of 90.9%, outperforming individual models such as Logistic Regression, SVM, and Random Forest. The ensemble model also reports a mean absolute error (MAE) of 0.023 and a mean squared error (MSE) of 0.0018. In terms of system performance, it processes 10,000 records per second with an average latency of 150 ms and keeps memory usage at around 4 GB, making it suitable for real-time financial analytics. The proposed architecture significantly enhances precision in predicting client behaviors, such as loan subscription decisions, and supports robust, scalable financial decision-making. This research offers valuable insights for integrating ensemble learning with big data technologies in FinTech, enabling more accurate, transparent, and efficient financial systems.
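The description names an ensemble of three classifiers and reports accuracy, MAE, and MSE, but this record does not say how the individual predictions are combined. The following is a minimal sketch only, assuming simple majority voting over binary labels (1 = client subscribes, 0 = does not). The function names, the toy labels, and the voting rule are all hypothetical and are not taken from the article.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per record by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: these are NOT results from the article.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
knn_pred = [1, 0, 1, 1, 0, 0, 0, 0]  # stand-in for the KNN member
svm_pred = [1, 0, 0, 0, 0, 1, 0, 1]  # stand-in for the SVM member
knc_pred = [1, 1, 0, 1, 0, 1, 0, 1]  # stand-in for the KNC member

ensemble_pred = majority_vote([knn_pred, svm_pred, knc_pred])

for name, pred in [("KNN", knn_pred), ("SVM", svm_pred),
                   ("KNC", knc_pred), ("Ensemble", ensemble_pred)]:
    print(name,
          "accuracy:", accuracy(y_true, pred),
          "MAE:", mean_absolute_error(y_true, pred),
          "MSE:", mean_squared_error(y_true, pred))
```

With hard 0/1 labels, as in this toy example, MAE and MSE both equal the misclassification rate, so the two metrics only differ when the predictions are real-valued scores.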
format Article
id doaj-art-3eb38a815f304c8ea7b62ab9a0d5e3fc
institution Kabale University
issn 2199-8531
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Journal of Open Innovation: Technology, Market and Complexity
spelling doaj-art-3eb38a815f304c8ea7b62ab9a0d5e3fc; 2025-08-20T03:41:52Z; eng; Elsevier; Journal of Open Innovation: Technology, Market and Complexity; ISSN 2199-8531; 2025-09-01; volume 11, issue 3, article 100602; DOI 10.1016/j.joitmc.2025.100602; Muhammad Babar (Robotics and Internet of Things Lab, Prince Sultan University, Riyadh, Saudi Arabia)
title A hybrid approach to financial big data analysis using extended ensemble learning and optimized spark streaming
topic Machine learning
Big data
AI
Data analysis
Classifiers
Finance
url http://www.sciencedirect.com/science/article/pii/S2199853125001374