Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight
The Dark Web is an internet domain that ensures user anonymity and has increasingly become a focal point for illegal activities and a repository for information on cyberattacks owing to the challenges in tracking its users. This study examined the classification of the Dark Web in relation to these...
| Main Authors: | Gun-Yoon Shin, Younghoan Jang, Dong-Wook Kim, Sungjin Park, A-Ran Park, Younghwan Kim, Myung-Mook Han |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | Dark web; dark web analysis; text classification; topic modeling; model explanation |
| Online Access: | https://ieeexplore.ieee.org/document/10375503/ |
| _version_ | 1849328803727278080 |
|---|---|
| author | Gun-Yoon Shin; Younghoan Jang; Dong-Wook Kim; Sungjin Park; A-Ran Park; Younghwan Kim; Myung-Mook Han |
| author_sort | Gun-Yoon Shin |
| collection | DOAJ |
| description | The Dark Web is an internet domain that ensures user anonymity and has increasingly become a focal point for illegal activities and a repository for information on cyberattacks owing to the challenges in tracking its users. This study examined the classification of the Dark Web in relation to these cyber threats. We processed Dark Web texts to extract vector types suitable for machine learning classification. Traditional methods utilizing the entirety of Dark Web texts to generate features result in vectors including all words found on the Dark Web. However, this approach incorporates extraneous information in the vectors, diminishing learning effectiveness and extending processing duration. The research aimed to optimize the classification process by selectively focusing on keywords within each class, thereby curtailing word vector dimensions. This optimization was facilitated by leveraging the anonymity characteristic of the Dark Web and employing topic-modeling-based weight generation. These methods enabled the creation of word vectors with a constrained feature set, enhancing the distinction of Dark Web classes. To further improve classification performance, we integrated TextCNN with topic modeling weights. For validation, we employed two datasets and compared the performance of the model with other text classification algorithms, where the proposed model demonstrated superior effectiveness in Dark Web classification. |
| format | Article |
| id | doaj-art-60d307afa1304f298efbfed9af9b62ac |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record ID: doaj-art-60d307afa1304f298efbfed9af9b62ac; timestamp: 2025-08-20T03:47:28Z; language: eng; publisher: IEEE; journal: IEEE Access; ISSN: 2169-3536; published: 2024-01-01; vol. 12, pp. 36361-36371; DOI: 10.1109/ACCESS.2023.3347737; IEEE document: 10375503. Authors: Gun-Yoon Shin (https://orcid.org/0000-0001-9695-7613), Younghoan Jang (https://orcid.org/0000-0002-5907-8879), Dong-Wook Kim (https://orcid.org/0000-0002-4428-9323), Sungjin Park, A-Ran Park, Younghwan Kim, Myung-Mook Han (https://orcid.org/0000-0002-0017-7944). Affiliations: Department of AI Software, Gachon University, Seongnam-si, Republic of Korea (Shin, Jang, Dong-Wook Kim, Han); Cyber Warfare, LIG Nex1, Seongnam-si, Republic of Korea (Sungjin Park, A-Ran Park, Younghwan Kim). Keywords: Dark web; dark web analysis; text classification; topic modeling; model explanation. URL: https://ieeexplore.ieee.org/document/10375503/ |
| title | Dark Side of the Web: Dark Web Classification Based on TextCNN and Topic Modeling Weight |
| topic | Dark web; dark web analysis; text classification; topic modeling; model explanation |
| url | https://ieeexplore.ieee.org/document/10375503/ |
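
The abstract describes restricting word vectors to per-class keywords so that the feature set stays small. The record does not give the authors' algorithm, so the following is only a minimal sketch of that general idea. It assumes a simple per-class term score (in-class frequency scaled by how few classes contain the word) as a stand-in for topic-model weights, and it omits the TextCNN stage entirely. The function names, the scoring formula, and the toy documents are all hypothetical.

```python
from collections import Counter
import math


def class_keyword_weights(docs, labels, top_k=2):
    """Score words per class and keep the top_k highest-scoring words of each class.

    Assumption: score = P(word | class) * log(1 + n_classes / classes_containing_word).
    This is an illustrative stand-in, not the topic-model weight from the paper.
    """
    class_counts = {}
    for text, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(text.lower().split())

    n_classes = len(class_counts)
    # Number of classes in which each word occurs at least once.
    class_freq = Counter(w for counts in class_counts.values() for w in counts)

    weights = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        scored = {
            w: (c / total) * math.log(1 + n_classes / class_freq[w])
            for w, c in counts.items()
        }
        # Sort by descending score, then alphabetically so ties are deterministic.
        top = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        weights[label] = dict(top)
    return weights


def build_vocab(weights):
    """Union of all per-class keywords, sorted to give stable vector positions."""
    return sorted({w for keywords in weights.values() for w in keywords})


def vectorize(text, vocab):
    """Count vector over the restricted vocabulary only."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


# Hypothetical toy corpus with two classes.
docs = [
    "stolen card dumps card shop",
    "card cvv shop escrow",
    "exploit kit malware loader",
    "malware botnet exploit shop",
]
labels = ["fraud", "fraud", "malware", "malware"]

weights = class_keyword_weights(docs, labels, top_k=2)
vocab = build_vocab(weights)
print(vocab)
print(vectorize("new card shop with malware", vocab))
```

The resulting vectors have one dimension per retained keyword rather than one per word in the corpus, which is the dimensionality reduction the abstract refers to. A classifier such as TextCNN would then consume these restricted features.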