Big Data Analytics in IoT, social media, NLP, and information security: trends, challenges, and applications

Abstract This paper presents a comprehensive, domain-specific survey and experimental evaluation of machine learning techniques for Big Data Analytics across four critical domains: IoT, Social Media, Natural Language Processing (NLP), and Information Security. A novel taxonomic framework is introduc...

Bibliographic Details
Main Author: Kamal Taha
Format: Article
Language: English
Published: SpringerOpen 2025-06-01
Series: Journal of Big Data
Subjects: Big Data Analytics; Machine learning; IoT; Social media analysis; NLP; Information security
Online Access: https://doi.org/10.1186/s40537-025-01192-9
_version_ 1850207274189979648
author Kamal Taha
collection DOAJ
description Abstract This paper presents a comprehensive, domain-specific survey and experimental evaluation of machine learning techniques for Big Data Analytics across four critical domains: IoT, Social Media, Natural Language Processing (NLP), and Information Security. A novel taxonomic framework is introduced to classify and analyze the suitability of algorithms based on empirical, experimental, and computational perspectives. The study integrates large-scale experimental benchmarking of key techniques—including CNN, XGBoost, Self-Supervised Learning (SSL), Graph Neural Networks (GNN), ELM, KNN, and Decision Trees—using real-world and synthetic datasets, and evaluates them across five performance metrics: accuracy, F1-score, precision, recall, and computational time. Key findings reveal that: (1) GNN and Self-Supervised Learning (SSL) are top performers in terms of predictive performance and efficiency in domains such as IoT and Social Media, (2) XGBoost and CNN offer superior accuracy and robustness across structured and unstructured data tasks, though CNN incurs higher computational costs, (3) ELM and Decision Trees are better suited for lightweight or interpretable applications, and (4) KNN generally underperforms in scalability and predictive strength for large-scale tasks. The taxonomy and experiments collectively demonstrate the need for context-aware algorithm selection, particularly for real-time and scalable Big Data applications. By aligning algorithmic properties with domain-specific challenges, this study offers actionable insights for researchers and practitioners seeking effective analytic strategies in the evolving landscape of Big Data.
format Article
id doaj-art-c91f4576b97c48e19baa9964598d687f
institution OA Journals
issn 2196-1115
language English
publishDate 2025-06-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj-art-c91f4576b97c48e19baa9964598d687f | 2025-08-20T02:10:34Z | eng | SpringerOpen | Journal of Big Data | 2196-1115 | 2025-06-01 | 121191 | 10.1186/s40537-025-01192-9 | Big Data Analytics in IoT, social media, NLP, and information security: trends, challenges, and applications | Kamal Taha | Department of Computer Science, Khalifa University
title Big Data Analytics in IoT, social media, NLP, and information security: trends, challenges, and applications
topic Big Data Analytics
Machine learning
IoT
Social media analysis
NLP
Information security
url https://doi.org/10.1186/s40537-025-01192-9