Accelerating Streaming Subgraph Matching via Vector Databases

Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud de...

Full description

Saved in:

Bibliographic Details
Main Authors:	Liuyi Chen, Yi Ding, Xushuo Tang, Fangyue Chen, Siyuan Gong, Xu Zhou, Zhengyi Yang
Format:	Article
Language:	English
Published:	American Association for the Advancement of Science (AAAS) 2025-01-01
Series:	Intelligent Computing
Online Access:	https://spj.science.org/doi/10.34133/icomputing.0131
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1849705705586556928
author	Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang
author_facet	Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang
author_sort	Liuyi Chen
collection	DOAJ
description	Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.
format	Article
id	doaj-art-bed58f34a05941bf857c9bfcdecde0a7
institution	DOAJ
issn	2771-5892
language	English
publishDate	2025-01-01
publisher	American Association for the Advancement of Science (AAAS)
record_format	Article
series	Intelligent Computing
spelling	doaj-art-bed58f34a05941bf857c9bfcdecde0a72025-08-20T03:16:24ZengAmerican Association for the Advancement of Science (AAAS)Intelligent Computing2771-58922025-01-01410.34133/icomputing.0131Accelerating Streaming Subgraph Matching via Vector DatabasesLiuyi Chen0Yi Ding1Xushuo Tang2Fangyue Chen3Siyuan Gong4Xu Zhou5Zhengyi Yang6College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.Euler AI, Sydney, Australia.Euler AI, Sydney, Australia.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.https://spj.science.org/doi/10.34133/icomputing.0131
spellingShingle	Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang Accelerating Streaming Subgraph Matching via Vector Databases Intelligent Computing
title	Accelerating Streaming Subgraph Matching via Vector Databases
title_full	Accelerating Streaming Subgraph Matching via Vector Databases
title_fullStr	Accelerating Streaming Subgraph Matching via Vector Databases
title_full_unstemmed	Accelerating Streaming Subgraph Matching via Vector Databases
title_short	Accelerating Streaming Subgraph Matching via Vector Databases
title_sort	accelerating streaming subgraph matching via vector databases
url	https://spj.science.org/doi/10.34133/icomputing.0131
work_keys_str_mv	AT liuyichen acceleratingstreamingsubgraphmatchingviavectordatabases AT yiding acceleratingstreamingsubgraphmatchingviavectordatabases AT xushuotang acceleratingstreamingsubgraphmatchingviavectordatabases AT fangyuechen acceleratingstreamingsubgraphmatchingviavectordatabases AT siyuangong acceleratingstreamingsubgraphmatchingviavectordatabases AT xuzhou acceleratingstreamingsubgraphmatchingviavectordatabases AT zhengyiyang acceleratingstreamingsubgraphmatchingviavectordatabases

Accelerating Streaming Subgraph Matching via Vector Databases

Similar Items