Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight

The Dark Web is an internet domain that ensures user anonymity and has increasingly become a focal point for illegal activities and a repository for information on cyberattacks owing to the challenges in tracking its users. This study examined the classification of the Dark Web in relation to these...

Bibliographic Details
Main Authors: Gun-Yoon Shin, Younghoan Jang, Dong-Wook Kim, Sungjin Park, A-Ran Park, Younghwan Kim, Myung-Mook Han
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Dark web; dark web analysis; text classification; topic modeling; model explanation
Online Access: https://ieeexplore.ieee.org/document/10375503/
author Gun-Yoon Shin
Younghoan Jang
Dong-Wook Kim
Sungjin Park
A-Ran Park
Younghwan Kim
Myung-Mook Han
collection DOAJ
description The Dark Web is an internet domain that ensures user anonymity and has increasingly become a focal point for illegal activities and a repository for information on cyberattacks owing to the challenges in tracking its users. This study examined the classification of the Dark Web in relation to these cyber threats. We processed Dark Web texts to extract vector types suitable for machine learning classification. Traditional methods utilizing the entirety of Dark Web texts to generate features result in vectors including all words found on the Dark Web. However, this approach incorporates extraneous information in the vectors, diminishing learning effectiveness and extending processing duration. The research aimed to optimize the classification process by selectively focusing on keywords within each class, thereby curtailing word vector dimensions. This optimization was facilitated by leveraging the anonymity characteristic of the Dark Web and employing topic-modeling-based weight generation. These methods enabled the creation of word vectors with a constrained feature set, enhancing the distinction of Dark Web classes. To further improve classification performance, we integrated TextCNN with topic modeling weights. For validation, we employed two datasets and compared the performance of the model with other text classification algorithms, where the proposed model demonstrated superior effectiveness in Dark Web classification.
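The keyword-restriction idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which topic model or weighting formula was used, so the score below (a word's in-class relative frequency multiplied by the share of its occurrences that fall in that class) is only a simple stand-in for topic-modeling weights. The function names `class_keyword_weights` and `vectorize`, the `top_k` parameter, and the toy texts are all hypothetical, and the TextCNN stage is omitted.

```python
from collections import Counter


def class_keyword_weights(docs_by_class, top_k=3):
    """Return {class: {word: weight}}, keeping only the top_k words per class.

    ASSUMPTION: the weight is a stand-in for a topic-model word weight:
    (in-class relative frequency) * (share of the word's occurrences in this class).
    """
    # Word counts per class, using naive whitespace tokenization.
    counts = {
        label: Counter(word for doc in docs for word in doc.lower().split())
        for label, docs in docs_by_class.items()
    }
    # Word counts over the whole corpus.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)

    weights = {}
    for label, class_counts in counts.items():
        n_tokens = sum(class_counts.values())
        scored = {
            word: (k / n_tokens) * (k / total[word])
            for word, k in class_counts.items()
        }
        # Highest weight first; ties are broken alphabetically for determinism.
        ranked = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        weights[label] = dict(ranked[:top_k])
    return weights


def vectorize(text, weights):
    """Map a text to a weighted vector over the reduced keyword vocabulary."""
    vocab = sorted({word for ws in weights.values() for word in ws})
    tf = Counter(text.lower().split())
    # Each term frequency is scaled by the word's largest class weight.
    vector = [
        tf[word] * max(ws.get(word, 0.0) for ws in weights.values())
        for word in vocab
    ]
    return vocab, vector


if __name__ == "__main__":
    toy_corpus = {  # hypothetical texts, not taken from the paper's datasets
        "drugs": ["buy pills online", "pills cheap"],
        "hacking": ["buy exploit kit", "exploit leak"],
    }
    keyword_weights = class_keyword_weights(toy_corpus, top_k=2)
    print(vectorize("cheap pills pills", keyword_weights))
```

With `top_k=2`, the toy corpus of seven distinct words is reduced to a four-word feature set, and the word shared by both classes ("buy") is dropped. This is the kind of dimension reduction the description refers to, shown here on a much smaller scale.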
format Article
id doaj-art-60d307afa1304f298efbfed9af9b62ac
institution Kabale University
issn 2169-3536
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-60d307afa1304f298efbfed9af9b62ac
2025-08-20T03:47:28Z
eng
IEEE
IEEE Access
2169-3536
2024-01-01
volume 12, pages 36361-36371
DOI 10.1109/ACCESS.2023.3347737
10375503
Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight
Gun-Yoon Shin (https://orcid.org/0000-0001-9695-7613), Department of AI Software, Gachon University, Seongnam-si, Republic of Korea
Younghoan Jang (https://orcid.org/0000-0002-5907-8879), Department of AI Software, Gachon University, Seongnam-si, Republic of Korea
Dong-Wook Kim (https://orcid.org/0000-0002-4428-9323), Department of AI Software, Gachon University, Seongnam-si, Republic of Korea
Sungjin Park, Cyber Warfare, LIG Nex1, Seongnam-si, Republic of Korea
A-Ran Park, Cyber Warfare, LIG Nex1, Seongnam-si, Republic of Korea
Younghwan Kim, Cyber Warfare, LIG Nex1, Seongnam-si, Republic of Korea
Myung-Mook Han (https://orcid.org/0000-0002-0017-7944), Department of AI Software, Gachon University, Seongnam-si, Republic of Korea
title Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight
topic Dark web
dark web analysis
text classification
topic modeling
model explanation
url https://ieeexplore.ieee.org/document/10375503/