Efficient and Scalable Graph Similarity Joins in MapReduce
Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constrai...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Wiley
2014-01-01
|
Series: | The Scientific World Journal |
Online Access: | http://dx.doi.org/10.1155/2014/749028 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
_version_ | 1832551903332925440 |
---|---|
author | Yifan Chen Xiang Zhao Chuan Xiao Weiming Zhang Jiuyang Tang |
author_facet | Yifan Chen Xiang Zhao Chuan Xiao Weiming Zhang Jiuyang Tang |
author_sort | Yifan Chen |
collection | DOAJ |
description | Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results. |
format | Article |
id | doaj-art-c4df1849516f427d973c4a13fc363ffb |
institution | Kabale University |
issn | 2356-6140 1537-744X |
language | English |
publishDate | 2014-01-01 |
publisher | Wiley |
record_format | Article |
series | The Scientific World Journal |
spelling | doaj-art-c4df1849516f427d973c4a13fc363ffb2025-02-03T06:00:07ZengWileyThe Scientific World Journal2356-61401537-744X2014-01-01201410.1155/2014/749028749028Efficient and Scalable Graph Similarity Joins in MapReduceYifan Chen0Xiang Zhao1Chuan Xiao2Weiming Zhang3Jiuyang Tang4College of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaNagoya University, Nagoya, JapanCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaCollege of Information System and Management, National University of Defense Technology, Changsha 410073, ChinaAlong with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm following the filtering-verification framework for efficient graph similarity joins. It relies on counting overlapping graph signatures for filtering out nonpromising candidates. With the potential issue of too many key-value pairs in the filtering phase, spectral Bloom filters are introduced to reduce the number of key-value pairs. Furthermore, we integrate the multiway join strategy to boost the verification, where a MapReduce-based method is proposed for GED calculation. The superior efficiency and scalability of the proposed algorithms are demonstrated by extensive experimental results.http://dx.doi.org/10.1155/2014/749028 |
spellingShingle | Yifan Chen Xiang Zhao Chuan Xiao Weiming Zhang Jiuyang Tang Efficient and Scalable Graph Similarity Joins in MapReduce The Scientific World Journal |
title | Efficient and Scalable Graph Similarity Joins in MapReduce |
title_full | Efficient and Scalable Graph Similarity Joins in MapReduce |
title_fullStr | Efficient and Scalable Graph Similarity Joins in MapReduce |
title_full_unstemmed | Efficient and Scalable Graph Similarity Joins in MapReduce |
title_short | Efficient and Scalable Graph Similarity Joins in MapReduce |
title_sort | efficient and scalable graph similarity joins in mapreduce |
url | http://dx.doi.org/10.1155/2014/749028 |
work_keys_str_mv | AT yifanchen efficientandscalablegraphsimilarityjoinsinmapreduce AT xiangzhao efficientandscalablegraphsimilarityjoinsinmapreduce AT chuanxiao efficientandscalablegraphsimilarityjoinsinmapreduce AT weimingzhang efficientandscalablegraphsimilarityjoinsinmapreduce AT jiuyangtang efficientandscalablegraphsimilarityjoinsinmapreduce |