PIDQA—Question Answering on Piping and Instrumentation Diagrams

Bibliographic Details
Main Authors:	Mohit Gupta, Chialing Wei, Thomas Czerniawski, Ricardo Eiris
Format:	Article
Language:	English
Published:	MDPI AG, 2025-04-01
Series:	Machine Learning and Knowledge Extraction
Subjects:	P&ID; information retrieval; knowledge graphs; question answering; RAG; Neo4j
Online Access:	https://www.mdpi.com/2504-4990/7/2/39

collection	DOAJ
description	This paper introduces a novel framework enabling natural language question answering on Piping and Instrumentation Diagrams (P&IDs), addressing a critical gap between engineering design documentation and intuitive information retrieval. Our approach transforms static P&IDs into queryable knowledge bases through a three-stage pipeline. First, we recognize entities in a P&ID image and organize their relationships to form a base entity graph. Second, this entity graph is converted into a Labeled Property Graph (LPG), enriched with semantic attributes for nodes and edges. Third, a Large Language Model (LLM)-based information retrieval system translates a user query into a graph query language (Cypher) and retrieves the answer by executing the query on the LPG. For our experiments, we augmented a publicly available P&ID image dataset with our novel PIDQA dataset, which comprises 64,000 question–answer pairs spanning four categories: (I) simple counting, (II) spatial counting, (III) spatial connections, and (IV) value-based questions. Our experiments (using gpt-3.5-turbo) demonstrate that grounding the LLM with dynamic few-shot sampling robustly improves accuracy by 10.6–43.5% over schema contextualization alone, even under high lexical diversity conditions (e.g., paraphrasing, ambiguity). By reducing barriers in retrieving P&ID data, this work advances human–AI collaboration for industrial workflows in design validation and safety audits.
id	doaj-art-3384a1280c04463bbaa3e5fe7b757bff
issn	2504-4990
affiliation	School of Sustainable Engineering and the Built Environment, Arizona State University, Tempe, AZ 85287-1404, USA (all four authors)
doi	10.3390/make7020039
citation	Machine Learning and Knowledge Extraction, vol. 7, no. 2, article 39 (2025)
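
The retrieval stage summarized in the description field above has an LLM translate a question into Cypher, grounded with dynamic few-shot sampling. A minimal sketch of that prompt-construction idea follows. It is not the authors' implementation. The token-overlap (Jaccard) similarity, the example pool, the schema string, and the labels `Valve`, `Pump`, `Tank`, `Line`, and `CONNECTED_TO` are all hypothetical assumptions chosen for illustration. The LLM call and the Neo4j execution step are omitted. The point it shows is that the demonstrations placed in the prompt are chosen per query by similarity, instead of being fixed in advance.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One hypothetical (question, Cypher) demonstration pair."""
    question: str
    cypher: str


# Hypothetical example pool and schema; labels and properties are illustrative only.
POOL = [
    Example("How many valves are in the diagram?",
            "MATCH (v:Valve) RETURN count(v)"),
    Example("How many pumps are connected to tank T-101?",
            "MATCH (p:Pump)-[:CONNECTED_TO]-(t:Tank {tag: 'T-101'}) RETURN count(p)"),
    Example("What is the pipe size of line L-12?",
            "MATCH (l:Line {tag: 'L-12'}) RETURN l.size"),
]
SCHEMA = "(:Valve), (:Pump), (:Tank {tag}), (:Line {tag, size}); [:CONNECTED_TO]"


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a crude, dependency-free representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; an assumed stand-in for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: str, pool: list[Example], k: int = 2) -> list[Example]:
    """Dynamic few-shot: pick the k pool entries most similar to this query."""
    q = tokens(query)
    # sorted() is stable even with reverse=True, so ties keep their pool order.
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex.question)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, schema: str, pool: list[Example], k: int = 2) -> str:
    """Assemble schema context, the selected demonstrations, and the new question."""
    parts = [
        "Translate the question into a Cypher query over this graph schema.",
        f"Schema: {schema}",
    ]
    for ex in select_examples(query, pool, k):
        parts.append(f"Question: {ex.question}\nCypher: {ex.cypher}")
    parts.append(f"Question: {query}\nCypher:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # The returned string would be sent to the LLM, and the Cypher it produces
    # would then be executed against the graph database (both steps omitted here).
    print(build_prompt("How many valves are connected to tank T-200?", SCHEMA, POOL))
```

For the sample question, the pump/tank connection example is ranked first and the valve-counting example second. The pipe-size example shares no tokens with the question and is left out. Setting `k=0` leaves only the schema in the prompt, which roughly resembles the "schema contextualization alone" baseline named in the abstract. A production system might rank examples by embedding similarity instead. Token overlap is used here only to keep the sketch dependency-free.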

Similar Items