Accelerating Streaming Subgraph Matching via Vector Databases

Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud de...

Full description

Saved in:
Bibliographic Details
Main Authors: Liuyi Chen, Yi Ding, Xushuo Tang, Fangyue Chen, Siyuan Gong, Xu Zhou, Zhengyi Yang
Format: Article
Language:English
Published: American Association for the Advancement of Science (AAAS) 2025-01-01
Series:Intelligent Computing
Online Access:https://spj.science.org/doi/10.34133/icomputing.0131
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1849705705586556928
author Liuyi Chen
Yi Ding
Xushuo Tang
Fangyue Chen
Siyuan Gong
Xu Zhou
Zhengyi Yang
author_facet Liuyi Chen
Yi Ding
Xushuo Tang
Fangyue Chen
Siyuan Gong
Xu Zhou
Zhengyi Yang
author_sort Liuyi Chen
collection DOAJ
description Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.
format Article
id doaj-art-bed58f34a05941bf857c9bfcdecde0a7
institution DOAJ
issn 2771-5892
language English
publishDate 2025-01-01
publisher American Association for the Advancement of Science (AAAS)
record_format Article
series Intelligent Computing
spelling doaj-art-bed58f34a05941bf857c9bfcdecde0a72025-08-20T03:16:24ZengAmerican Association for the Advancement of Science (AAAS)Intelligent Computing2771-58922025-01-01410.34133/icomputing.0131Accelerating Streaming Subgraph Matching via Vector DatabasesLiuyi Chen0Yi Ding1Xushuo Tang2Fangyue Chen3Siyuan Gong4Xu Zhou5Zhengyi Yang6College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.Euler AI, Sydney, Australia.Euler AI, Sydney, Australia.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.https://spj.science.org/doi/10.34133/icomputing.0131
spellingShingle Liuyi Chen
Yi Ding
Xushuo Tang
Fangyue Chen
Siyuan Gong
Xu Zhou
Zhengyi Yang
Accelerating Streaming Subgraph Matching via Vector Databases
Intelligent Computing
title Accelerating Streaming Subgraph Matching via Vector Databases
title_full Accelerating Streaming Subgraph Matching via Vector Databases
title_fullStr Accelerating Streaming Subgraph Matching via Vector Databases
title_full_unstemmed Accelerating Streaming Subgraph Matching via Vector Databases
title_short Accelerating Streaming Subgraph Matching via Vector Databases
title_sort accelerating streaming subgraph matching via vector databases
url https://spj.science.org/doi/10.34133/icomputing.0131
work_keys_str_mv AT liuyichen acceleratingstreamingsubgraphmatchingviavectordatabases
AT yiding acceleratingstreamingsubgraphmatchingviavectordatabases
AT xushuotang acceleratingstreamingsubgraphmatchingviavectordatabases
AT fangyuechen acceleratingstreamingsubgraphmatchingviavectordatabases
AT siyuangong acceleratingstreamingsubgraphmatchingviavectordatabases
AT xuzhou acceleratingstreamingsubgraphmatchingviavectordatabases
AT zhengyiyang acceleratingstreamingsubgraphmatchingviavectordatabases