High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison

Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph...

Full description

Saved in:

Bibliographic Details
Main Authors:	Bismita S. Jena, Cynthia Khan, Rajshekhar Sunderraman
Format:	Article
Language:	English
Published:	Tsinghua University Press 2019-09-01
Series:	Big Data Mining and Analytics
Subjects:	frequent subgraphs isomorphism spark
Online Access:	https://www.sciopen.com/article/10.26599/BDMA.2019.9020006
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1832568910323384320
author	Bismita S. Jena Cynthia Khan Rajshekhar Sunderraman
author_facet	Bismita S. Jena Cynthia Khan Rajshekhar Sunderraman
author_sort	Bismita S. Jena
collection	DOAJ
description	Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph complexities. Frequent subgraph mining is one such area where the task is to find overly recurring patterns/subgraphs. To tackle this problem, many main memory-based methods were proposed, which proved to be inefficient as the data size grew exponentially over time. In the past few years, several research groups have attempted to handle the Frequent Subgraph Mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphic Processing Units (GPUs) which has multi-fold improvement over in-memory while dealing with large datasets. Later, Google’s MapReduce model with the Hadoop framework proved to be a major breakthrough in high performance large batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative style model could not help much for FSM domain since subgraph mining process is an iterative approach. In recent years, Spark has emerged to be the De Facto industry standard with its distributed in-memory computing capability. This is a right fit solution for iterative style of programming as well. In this survey, we cover how high-performance computing has helped in improving the performance tremendously in the transactional directed and undirected aspect of graphs and performance comparisons of various FSM techniques are done based on experimental results.
format	Article
id	doaj-art-5fc6b66827c744a78b0c6ce79b7cdf5f
institution	Kabale University
issn	2096-0654
language	English
publishDate	2019-09-01
publisher	Tsinghua University Press
record_format	Article
series	Big Data Mining and Analytics
spelling	doaj-art-5fc6b66827c744a78b0c6ce79b7cdf5f2025-02-02T23:47:57ZengTsinghua University PressBig Data Mining and Analytics2096-06542019-09-012315918010.26599/BDMA.2019.9020006High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance ComparisonBismita S. Jena0Cynthia Khan1Rajshekhar Sunderraman2<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.<institution content-type="dept">Department of Computer Science</institution>, <institution>Georgia State University</institution>, <city>Atlanta</city>, <state>GA</state> <postal-code>30302</postal-code>, <country>USA</country>.Graph data mining has been a crucial as well as inevitable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, Social Networks, etc. Scalable graph data mining methods are getting increasingly popular and necessary due to increased graph complexities. Frequent subgraph mining is one such area where the task is to find overly recurring patterns/subgraphs. To tackle this problem, many main memory-based methods were proposed, which proved to be inefficient as the data size grew exponentially over time. In the past few years, several research groups have attempted to handle the Frequent Subgraph Mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphic Processing Units (GPUs) which has multi-fold improvement over in-memory while dealing with large datasets. Later, Google’s MapReduce model with the Hadoop framework proved to be a major breakthrough in high performance large batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative style model could not help much for FSM domain since subgraph mining process is an iterative approach. In recent years, Spark has emerged to be the De Facto industry standard with its distributed in-memory computing capability. This is a right fit solution for iterative style of programming as well. In this survey, we cover how high-performance computing has helped in improving the performance tremendously in the transactional directed and undirected aspect of graphs and performance comparisons of various FSM techniques are done based on experimental results.https://www.sciopen.com/article/10.26599/BDMA.2019.9020006frequent subgraphsisomorphismspark
spellingShingle	Bismita S. Jena Cynthia Khan Rajshekhar Sunderraman High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison Big Data Mining and Analytics frequent subgraphs isomorphism spark
title	High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_full	High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_fullStr	High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_full_unstemmed	High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_short	High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison
title_sort	high performance frequent subgraph mining on transaction datasets a survey and performance comparison
topic	frequent subgraphs isomorphism spark
url	https://www.sciopen.com/article/10.26599/BDMA.2019.9020006
work_keys_str_mv	AT bismitasjena highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison AT cynthiakhan highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison AT rajshekharsunderraman highperformancefrequentsubgraphminingontransactiondatasetsasurveyandperformancecomparison

High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and Performance Comparison

Similar Items