High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison

Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph...

Full description

Saved in:
Bibliographic Details
Main Authors: Bismita S. Jena, Cynthia Khan, Rajshekhar Sunderraman
Format: Article
Language:English
Published: Tsinghua University Press 2019-09-01
Series:Big Data Mining and Analytics
Subjects:
Online Access:https://www.sciopen.com/article/10.26599/BDMA.2019.9020006
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832568910323384320
author Bismita S. Jena
Cynthia Khan
Rajshekhar Sunderraman
author_facet Bismita S. Jena
Cynthia Khan
Rajshekhar Sunderraman
author_sort Bismita S. Jena
collection DOAJ
description Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph complexities. Frequent subgraph mining is one such area where the task is to find overly recurring patterns/subgraphs. To tackle this problem, many main memory-based methods were proposed, which proved to be inefficient as the data size grew exponentially over time. In the past few years, several research groups have attempted to handle the Frequent Subgraph Mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphic Processing Units (GPUs) which has multi-fold improvement over in-memory while dealing with large datasets. Later, Google’s MapReduce model with the Hadoop framework proved to be a major breakthrough in high performance large batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative style model could not help much for FSM domain since subgraph mining process is an iterative approach. In recent years, Spark has emerged to be the De Facto industry standard with its distributed in-memory computing capability. This is a right fit solution for iterative style of programming as well. In this survey, we cover how high-performance computing has helped in improving the performance tremendously in the transactional directed and undirected aspect of graphs and performance comparisons of various FSM techniques are done based on experimental results.
format Article
id doaj-art-5fc6b66827c744a78b0c6ce79b7cdf5f
institution Kabale University
issn 2096-0654
language English
publishDate 2019-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-5fc6b66827c744a78b0c6ce79b7cdf5f2025-02-02T23:47:57ZengTsinghua University PressBig Data Mining and Analytics2096-06542019-09-012315918010.26599/BDMA.2019.9020006High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance ComparisonBismita S. Jena0Cynthia Khan1Rajshekhar Sunderraman2<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph complexities. Frequent subgraph mining is one such area where the task is to find overly recurring patterns/subgraphs. To tackle this problem, many main memory-based methods were proposed, which proved to be inefficient as the data size grew exponentially over time. In the past few years, several research groups have attempted to handle the Frequent Subgraph Mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphic Processing Units (GPUs) which has multi-fold improvement over in-memory while dealing with large datasets. Later, Google’s MapReduce model with the Hadoop framework proved to be a major breakthrough in high performance large batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative style model could not help much for FSM domain since subgraph mining process is an iterative approach. In recent years, Spark has emerged to be the De Facto industry standard with its distributed in-memory computing capability. This is a right fit solution for iterative style of programming as well. In this survey, we cover how high-performance computing has helped in improving the performance tremendously in the transactional directed and undirected aspect of graphs and performance comparisons of various FSM techniques are done based on experimental results.https://www.sciopen.com/article/10.26599/BDMA.2019.9020006frequent subgraphsisomorphismspark
spellingShingle Bismita S. Jena
Cynthia Khan
Rajshekhar Sunderraman
High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
Big Data Mining and Analytics
frequent subgraphs
isomorphism
spark
title High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_full High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_fullStr High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_full_unstemmed High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_short High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_sort high performance frequent subgraph mining on transaction datasets a survey and performance comparison
topic frequent subgraphs
isomorphism
spark
url https://www.sciopen.com/article/10.26599/BDMA.2019.9020006
work_keys_str_mv AT bismitasjena highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison
AT cynthiakhan highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison
AT rajshekharsunderraman highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison