Fingerprint-Based Near-Duplicate Document Detection with Applications to SNS Spam Detection

Social networking is used by millions of people around the world. It has become the most popular way for people to connect and interact online with their friends. Currently, there are many social networking sites, for instance, Facebook, My Space, and Twitter, with a huge number...

Bibliographic Details
Main Authors: Phuc-Tran Ho, Sung-Ryul Kim
Format: Article
Language: English
Published: Wiley 2014-05-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1155/2014/612970
ISSN: 1550-1477
Volume: 10
Article ID: 612970
Collection: DOAJ
Record ID: doaj-art-04a8e8d83ed54c06a437f898facd2bff
Author affiliations:
Phuc-Tran Ho: Department of Advanced Technology Fusion, Konkuk University, Seoul 143-701, Republic of Korea
Sung-Ryul Kim: Department of Internet & Multimedia Engineering, Konkuk University, Seoul 143-701, Republic of Korea
Description: Social networking is used by millions of people around the world. It has become the most popular way for people to connect and interact online with their friends. Currently, there are many social networking sites, for instance, Facebook, My Space, and Twitter, each with a huge number of active users. They are therefore also attractive to spammers and cheaters who want to steal users' personal information or advertise their products. Recently, many methods using different techniques have been proposed to detect spam comments on social networks. In this paper, we propose a similarity-based method that combines a fingerprinting technique with a trie-tree data structure and a meet-in-the-middle approach in order to achieve higher accuracy in spam comment detection. Using our proposed approach, we are able to detect around 98% of the spam comments in our dataset.
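Illustrative note: the description names fingerprinting and a trie-tree as building blocks but gives no algorithmic detail, so the following Python sketch is only a generic illustration of fingerprint-based near-duplicate lookup, not the authors' method. Everything in it is an assumption made for the example: word 3-shingles, SHA-1 truncated to 32 bits, a hex-digit trie, Jaccard similarity with a 0.5 threshold, and the names `fingerprint`, `FingerprintTrie`, and `NearDuplicateIndex`. The meet-in-the-middle step mentioned in the description is not modeled.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=3, keep=64):
    """Hash each shingle to 32 bits and keep the `keep` smallest hashes.

    The shingle size, hash function, and `keep` are illustrative choices.
    """
    hashes = {
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:4], "big")
        for s in shingles(text, k)
    }
    return frozenset(sorted(hashes)[:keep])


def similarity(a, b):
    """Jaccard similarity of two fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class FingerprintTrie:
    """Trie over the 8 hex digits of each hash; leaves hold document ids."""

    def __init__(self):
        self.root = {}

    def insert(self, value, doc_id):
        node = self.root
        for ch in format(value, "08x"):
            node = node.setdefault(ch, {})
        node.setdefault("ids", set()).add(doc_id)

    def lookup(self, value):
        node = self.root
        for ch in format(value, "08x"):
            node = node.get(ch)
            if node is None:
                return set()
        return node.get("ids", set())


class NearDuplicateIndex:
    """Stores known comments and reports those similar to a query."""

    def __init__(self, threshold=0.5):
        self.trie = FingerprintTrie()
        self.prints = {}
        self.threshold = threshold

    def add(self, doc_id, text):
        fp = fingerprint(text)
        self.prints[doc_id] = fp
        for value in fp:
            self.trie.insert(value, doc_id)

    def query(self, text):
        fp = fingerprint(text)
        # Candidates are documents sharing at least one hash with the query.
        candidates = set()
        for value in fp:
            candidates |= self.trie.lookup(value)
        return sorted(
            d for d in candidates
            if similarity(fp, self.prints[d]) >= self.threshold
        )
```

In this sketch a new comment would be flagged when `query` returns any id of a known spam comment; the trie only narrows the candidates, and the Jaccard check makes the final decision.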