Real-Time Automated Cyber Threat Classification and Emerging Threat Detection Framework

Bibliographic Details
Main Authors: Alemayehu Tilahun Haile, Surafel Lemma Abebe, Henock Mulugeta Melaku
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Open Journal of the Computer Society
Online Access: https://ieeexplore.ieee.org/document/11037544/
Collection: DOAJ

Description
Automating cyber threat intelligence (CTI) collection and analysis in real time is critical for the timely detection and mitigation of cyber threats. Cybersecurity researchers have recently recommended CTI as a proactive and robust method for automated cyber threat prediction. Such an automated solution collects and analyzes real-time data from social media, cybersecurity forums, and hacker forums, where security analysts and hackers discuss cybersecurity topics, in order to discover potential threats. In this article, we propose a comprehensive framework that automates both cyber threat classification and emerging threat detection using real-time data from surface, deep, and dark web sources. We collected real-time data from hacker and security forums for binary and multiclass cyber threat classification, and we used a labeled leaked dataset as the ground truth for classification. Machine learning and deep learning techniques were used to perform the classification. Latent Dirichlet allocation (LDA) and nonnegative matrix factorization (NMF) were used to analyze topic distributions over time; monitoring shifts in these topics allows zero-day attacks and other emerging threats to be identified. A support vector machine (SVM) with a bag-of-words (binary term weight) representation achieved the highest accuracies: 93.67% for binary and 96.35% for multiclass classification. LDA and NMF were also used to extract the top topics for varying numbers of topics. The LDA model is well suited to identifying emerging trends and is useful for real-time threat monitoring in cybersecurity.
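
The description reports that an SVM over a binary bag-of-words representation gave the best accuracy for both binary and multiclass classification. As a rough illustration only (not the authors' code, data, or solver), the sketch below builds binary term-presence features and trains a linear SVM by stochastic subgradient descent on the L2-regularized hinge loss; the multiclass case uses one-vs-rest, which is an assumption here. The forum posts, class names, and hyperparameters (`epochs`, `lr`, `lam`) are hypothetical, and the intercept is omitted for brevity.

```python
import re


def binary_bow(text):
    """Binary term weighting: each token is present (1) or absent (0); counts are ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(w, text):
    """Decision value w . x for a sparse weight dict and a binary feature set."""
    return sum(w.get(t, 0.0) for t in binary_bow(text))


def train_linear_svm(docs, labels, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM (hinge loss + L2 penalty) trained with stochastic subgradient descent.

    `labels` must be +1 (threat) or -1 (non-threat). No intercept, for brevity.
    """
    feats = [binary_bow(d) for d in docs]
    w = {}
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            margin = y * sum(w.get(t, 0.0) for t in x)
            for t in w:  # L2 term: shrink every existing weight
                w[t] *= 1.0 - lr * lam
            if margin < 1.0:  # hinge term is active only when the margin is violated
                for t in x:
                    w[t] = w.get(t, 0.0) + lr * y
    return w


def train_one_vs_rest(docs, labels, **kwargs):
    """Multiclass: one binary SVM per class, trained as 'this class' vs. 'all others'."""
    return {c: train_linear_svm(docs, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict_multiclass(models, text):
    """Pick the class whose one-vs-rest SVM gives the largest decision value."""
    return max(models, key=lambda c: score(models[c], text))


# Hypothetical forum posts and labels (not the paper's dataset).
POSTS = [
    "selling fresh ransomware builder",
    "new exploit drops for vpn appliance",
    "phishing kit with credential harvester",
    "recommend a good gaming laptop",
    "how do i reset my router password",
    "weekend football match thread",
]
BINARY_LABELS = [1, 1, 1, -1, -1, -1]
CLASSES = ["malware", "vulnerability", "phishing", "benign", "benign", "benign"]

w = train_linear_svm(POSTS, BINARY_LABELS)
print(score(w, "ransomware exploit for sale") > 0)  # prints: True
models = train_one_vs_rest(POSTS, CLASSES)
print(predict_multiclass(models, "cheap credential harvester"))  # prints: phishing
```

In practice a library solver (for example scikit-learn's `LinearSVC` fed by `CountVectorizer(binary=True)`) would replace the hand-written loop; it is spelled out here only to show what "binary term weight" means: a post contributes each term once, however often the term occurs.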
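
The description also says that topic distributions were tracked over time and that shifts in topics reveal emerging threats. The sketch below illustrates that idea with NMF only (LDA is left out to keep it short), using the standard Lee-Seung multiplicative updates in plain Python, followed by one simple shift rule: flag a topic whose average share rises by at least `min_increase` between two consecutive time windows. The vocabulary, the toy documents, the two-window split, and the threshold are all hypothetical; this record does not state the paper's actual preprocessing, topic counts, or detection rule.

```python
import random


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative doc-term matrix V (list of rows) as V ~= W H under the
    Frobenius loss, using Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (sum(WtW[a][b] * H[b][j] for b in range(k)) + eps)
              for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        W = [[W[i][a] * VHt[i][a] / (sum(W[i][b] * HHt[b][a] for b in range(k)) + eps)
              for a in range(k)] for i in range(n)]
    return W, H


def reconstruction_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    k = len(H)
    return sum((V[i][j] - sum(W[i][a] * H[a][j] for a in range(k))) ** 2
               for i in range(len(V)) for j in range(len(V[0])))


def topic_shares(W_rows):
    """Average per-document topic proportions over one time window."""
    k = len(W_rows[0])
    shares = [0.0] * k
    for row in W_rows:
        total = sum(row) or 1.0  # guard against an all-zero row
        for a in range(k):
            shares[a] += row[a] / total
    return [s / len(W_rows) for s in shares]


def emerging_topics(prev_rows, curr_rows, min_increase=0.2):
    """Hypothetical rule: flag topics whose average share grew by >= min_increase."""
    before, after = topic_shares(prev_rows), topic_shares(curr_rows)
    return [a for a in range(len(before)) if after[a] - before[a] >= min_increase]


def top_terms(H, vocab, topic, n=3):
    """Highest-weighted terms of one topic row of H."""
    return [vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -H[topic][j])[:n]]


# Hypothetical binary doc-term rows; columns follow VOCAB.
VOCAB = ["phishing", "email", "credential", "log4j", "jndi", "rce"]
WINDOW_1 = [[1, 1, 1, 0, 0, 0]] * 4                                # earlier window
WINDOW_2 = [[1, 1, 1, 0, 0, 0]] * 2 + [[0, 0, 0, 1, 1, 1]] * 2     # new chatter appears

W, H = nmf(WINDOW_1 + WINDOW_2, k=2)
for t in emerging_topics(W[:len(WINDOW_1)], W[len(WINDOW_1):]):
    print("emerging topic", t, top_terms(H, VOCAB, t))
```

A rising share is only a signal for an analyst to review; the same comparison works unchanged if the per-document topic proportions come from an LDA model instead of NMF.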

Publication Details
ISSN: 2644-1268
Volume: 6, pp. 921-930
DOI: 10.1109/OJCS.2025.3580235
Author Affiliations:
Alemayehu Tilahun Haile (https://orcid.org/0009-0004-8511-2788), School of Information Technology and Engineering, College of Technology and Built Environment, Addis Ababa University, Addis Ababa, Ethiopia
Surafel Lemma Abebe (https://orcid.org/0000-0002-2137-8673), School of Electrical and Computer Engineering, College of Technology and Built Environment, Addis Ababa University, Addis Ababa, Ethiopia
Henock Mulugeta Melaku (https://orcid.org/0000-0002-0467-8121), School of Information Technology and Engineering, College of Technology and Built Environment, Addis Ababa University, Addis Ababa, Ethiopia
Subjects: Cyber threat intelligence (CTI); emerging threat detection; Latent Dirichlet allocation (LDA); natural language processing; nonnegative matrix factorization (NMF); threat categorization