Efficient and Scalable Graph Similarity Joins in MapReduce

Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constrai...

Full description

Saved in:
Bibliographic Details
Main Authors: Yifan Chen, Xiang Zhao, Chuan Xiao, Weiming Zhang, Jiuyang Tang
Format: Article
Language:English
Published: Wiley 2014-01-01
Series:The Scientific World Journal
Online Access:http://dx.doi.org/10.1155/2014/749028
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832551903332925440
author Yifan Chen
Xiang Zhao
Chuan Xiao
Weiming Zhang
Jiuyang Tang
author_facet Yifan Chen
Xiang Zhao
Chuan Xiao
Weiming Zhang
Jiuyang Tang
author_sort Yifan Chen
collection DOAJ
description Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.
format Article
id doaj-art-c4df1849516f427d973c4a13fc363ffb
institution Kabale University
issn 2356-6140
1537-744X
language English
publishDate 2014-01-01
publisher Wiley
record_format Article
series The Scientific World Journal
spelling doaj-art-c4df1849516f427d973c4a13fc363ffb2025-02-03T06:00:07ZengWileyThe Scientific World Journal2356-61401537-744X2014-01-01201410.1155/2014/749028749028Efficient and Scalable Graph Similarity Joins in MapReduceYifan Chen0Xiang Zhao1Chuan Xiao2Weiming Zhang3Jiuyang Tang4College of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaNagoya University, Nagoya, JapanCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaAlong with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.http://dx.doi.org/10.1155/2014/749028
spellingShingle Yifan Chen
Xiang Zhao
Chuan Xiao
Weiming Zhang
Jiuyang Tang
Efficient and Scalable Graph Similarity Joins in MapReduce
The Scientific World Journal
title Efficient and Scalable Graph Similarity Joins in MapReduce
title_full Efficient and Scalable Graph Similarity Joins in MapReduce
title_fullStr Efficient and Scalable Graph Similarity Joins in MapReduce
title_full_unstemmed Efficient and Scalable Graph Similarity Joins in MapReduce
title_short Efficient and Scalable Graph Similarity Joins in MapReduce
title_sort efficient and scalable graph similarity joins in mapreduce
url http://dx.doi.org/10.1155/2014/749028
work_keys_str_mv AT yifanchen efficientandscalablegraphsimilarityjoinsinmapreduce
AT xiangzhao efficientandscalablegraphsimilarityjoinsinmapreduce
AT chuanxiao efficientandscalablegraphsimilarityjoinsinmapreduce
AT weimingzhang efficientandscalablegraphsimilarityjoinsinmapreduce
AT jiuyangtang efficientandscalablegraphsimilarityjoinsinmapreduce