Accelerating Streaming Subgraph Matching via Vector Databases
Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud de...
Saved in:
| Main Authors: | , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
American Association for the Advancement of Science (AAAS)
2025-01-01
|
| Series: | Intelligent Computing |
| Online Access: | https://spj.science.org/doi/10.34133/icomputing.0131 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1849705705586556928 |
|---|---|
| author | Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang |
| author_facet | Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang |
| author_sort | Liuyi Chen |
| collection | DOAJ |
| description | Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets. |
| format | Article |
| id | doaj-art-bed58f34a05941bf857c9bfcdecde0a7 |
| institution | DOAJ |
| issn | 2771-5892 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | American Association for the Advancement of Science (AAAS) |
| record_format | Article |
| series | Intelligent Computing |
| spelling | doaj-art-bed58f34a05941bf857c9bfcdecde0a72025-08-20T03:16:24ZengAmerican Association for the Advancement of Science (AAAS)Intelligent Computing2771-58922025-01-01410.34133/icomputing.0131Accelerating Streaming Subgraph Matching via Vector DatabasesLiuyi Chen0Yi Ding1Xushuo Tang2Fangyue Chen3Siyuan Gong4Xu Zhou5Zhengyi Yang6College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.Euler AI, Sydney, Australia.Euler AI, Sydney, Australia.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.College of Computer Science and Engineering, The University of New South Wales, Sydney, Australia.Graphs are widely used in applications such as social network analysis, bioinformatics, and recommendation systems to represent relationships and complex dependencies. Subgraph matching, which involves finding instances of a query subgraph within a larger graph, is crucial for tasks such as fraud detection, pattern recognition, and semantic search. The streaming subgraph matching problem, an extension of this task, aims to efficiently process queries in a stream with minimal latency. This is particularly important in real-time applications such as dynamic monitoring and network anomaly detection, where quick query responses are essential. To address streaming subgraph matching, existing methods incorporate precomputed indices, such as tree structures. However, these approaches often fail to scale efficiently under high query arrival rates or for large graphs due to limitations in caching, query reuse, and indexing performance. In this paper, we adopt a framework that leverages a subgraph index based on graph embeddings, enabling effective caching and reuse of query results. Building on this foundation, we perform k-nearest neighbor search on high-dimensional vectors by using a vector database for indexing. Inverted file and product quantization techniques within the vector database were employed to accelerate the process. Experimental evaluations on 16 diverse real-world datasets show that our approach reduces processing time by an average of 87.7% compared to the state-of-the-art method, achieves cache hit rates ranging from 70% to 90%, and demonstrates robustness and consistency across varying batch sizes and datasets.https://spj.science.org/doi/10.34133/icomputing.0131 |
| spellingShingle | Liuyi Chen Yi Ding Xushuo Tang Fangyue Chen Siyuan Gong Xu Zhou Zhengyi Yang Accelerating Streaming Subgraph Matching via Vector Databases Intelligent Computing |
| title | Accelerating Streaming Subgraph Matching via Vector Databases |
| title_full | Accelerating Streaming Subgraph Matching via Vector Databases |
| title_fullStr | Accelerating Streaming Subgraph Matching via Vector Databases |
| title_full_unstemmed | Accelerating Streaming Subgraph Matching via Vector Databases |
| title_short | Accelerating Streaming Subgraph Matching via Vector Databases |
| title_sort | accelerating streaming subgraph matching via vector databases |
| url | https://spj.science.org/doi/10.34133/icomputing.0131 |
| work_keys_str_mv | AT liuyichen acceleratingstreamingsubgraphmatchingviavectordatabases AT yiding acceleratingstreamingsubgraphmatchingviavectordatabases AT xushuotang acceleratingstreamingsubgraphmatchingviavectordatabases AT fangyuechen acceleratingstreamingsubgraphmatchingviavectordatabases AT siyuangong acceleratingstreamingsubgraphmatchingviavectordatabases AT xuzhou acceleratingstreamingsubgraphmatchingviavectordatabases AT zhengyiyang acceleratingstreamingsubgraphmatchingviavectordatabases |