Accelerating Streaming Subgraph Matching via Vector Databases

Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud de...

Full description

Saved in:
Bibliographic Details
Main Authors: Liuyi Chen, Yi Ding, Xushuo Tang, Fangyue Chen, Siyuan Gong, Xu Zhou, Zhengyi Yang
Format: Article
Language:English
Published: American Association for the Advancement of Science (AAAS) 2025-01-01
Series:Intelligent Computing
Online Access:https://spj.science.org/doi/10.34133/icomputing.0131
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.
ISSN:2771-5892