Enhancing Cyber Security: Comparing the Accuracy of the Bert Model with Other Common Deep Learning Models in Identifying Email Spam
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Bilijipub publisher, 2025-03-01 |
| Series: | Advances in Engineering and Intelligence Systems |
| Subjects: | |
| Online Access: | https://aeis.bilijipub.com/article_218015_22cff3c48c0b925871a99a009b7951b5.pdf |
| Summary: | Spam emails constitute a significant percentage of email traffic and are considered a cybersecurity threat, often leading to phishing attacks, malware infections, and financial fraud. These emails, sent in bulk for commercial and malicious purposes, can bypass traditional spam filters, necessitating the development of high-accuracy models for effective detection. A major challenge in spam filtering is reducing false positives, in which legitimate emails are incorrectly classified as spam, disrupting users' email communication. In this study, deep learning (DL) and natural language processing (NLP) methods were employed to develop a spam detection model. Five DL-based models (Dense, CNN, LSTM, CNN-LSTM, and BERT) were evaluated. Data preprocessing included stemming, lemmatization, and text vectorization using Word2Vec to enhance feature extraction. The models were trained on a real dataset, and their accuracy was assessed using multiple evaluation metrics. Among the tested models, BERT achieved the highest accuracy (99.33%), outperforming all other approaches in spam detection. Its ability to understand contextual relationships and mitigate false positives makes it highly suitable for real-world applications. Given its computational demands, future research should focus on optimizing BERT for real-time deployment through model compression and parallel execution. Additionally, further testing on larger and more diverse datasets and implementing multilingual spam filtering capabilities will enhance its practical utility. |
| ISSN: | 2821-0263 |
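
The summary stresses accuracy and false positives but does not list the exact evaluation metrics used. As a minimal sketch, assuming the usual confusion-matrix metrics for a binary spam classifier, the snippet below shows how accuracy and the false positive rate relate. The function name `spam_metrics` and the toy labels are hypothetical illustrations, not taken from the article or its dataset.

```python
def spam_metrics(y_true, y_pred):
    """Compute basic binary-classification metrics.

    Assumed label convention (not from the article): 1 = spam, 0 = legitimate.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail delivered
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed

    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Toy labels for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
metrics = spam_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can score high accuracy while still flagging legitimate mail, which is why the false positive rate is worth reporting alongside accuracy when comparing spam filters.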