Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization


Bibliographic Details
Main Authors: Wooyoung Kim, Haemin Jung, Wooju Kim
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Large language model, knowledge graph, multi-hop reasoning, question-answering
Online Access: https://ieeexplore.ieee.org/document/10817607/
author Wooyoung Kim
Haemin Jung
Wooju Kim
collection DOAJ
description Large language models have demonstrated exceptional performance across various natural language processing tasks. However, their reliance on unstructured text corpora for pre-training limits their effectiveness in tasks that require structured reasoning, such as multi-hop question-answering. Knowledge graphs provide a rich, structured source of relational data, offering an opportunity to enhance the reasoning capabilities of large language models. In this paper, we propose a novel framework, Knowledge Graph as Pre-training Corpus (KGPC), which transforms knowledge graphs into text using a multi-hop linearization process. Unlike existing approaches that linearize individual triples in isolation, our method captures the interconnected nature of knowledge graphs by linking multiple triples across multiple hops, preserving their relational structure during the pre-training phase. This structured knowledge injection improves the ability of language models to perform complex reasoning tasks. We evaluate our approach on multi-hop reasoning benchmarks, demonstrating significant performance gains over existing models, particularly in question-answering tasks. Our results highlight the potential of multi-hop linearization for enhancing the structural reasoning capacity of language models, reducing error propagation, and improving the integration of structured knowledge into language models.
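To make the multi-hop linearization idea concrete, the following is a minimal illustrative sketch in Python. It is not the authors' KGPC implementation: it only assumes a knowledge graph given as (head, relation, tail) triples, verbalizes relations with a naive template, and chains triples along short walks so that each generated sentence spans multiple hops rather than a single triple. All function names, entities, and templates here are hypothetical.

```python
# Illustrative sketch of multi-hop linearization (not the authors' exact KGPC procedure).
# Assumptions: the KG is a list of (head, relation, tail) triples, relations verbalize by
# replacing underscores with spaces, and paths are sampled as fixed-length walks.

import random
from collections import defaultdict


def build_index(triples):
    """Index outgoing edges per head entity for fast walking."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def linearize_path(start, index, hops=2, rng=random):
    """Walk up to `hops` edges from `start` and verbalize the chained triples as one sentence."""
    clauses, current = [], start
    for _ in range(hops):
        if not index[current]:
            break  # dead end: no outgoing edges from the current entity
        relation, tail = rng.choice(index[current])
        clauses.append(f"{current} {relation.replace('_', ' ')} {tail}")
        current = tail
    return ", and ".join(clauses) + "." if clauses else ""


if __name__ == "__main__":
    triples = [
        ("Barack Obama", "born_in", "Honolulu"),
        ("Honolulu", "located_in", "Hawaii"),
        ("Hawaii", "part_of", "United States"),
    ]
    index = build_index(triples)
    random.seed(0)
    # Prints a two-hop sentence such as:
    # "Barack Obama born in Honolulu, and Honolulu located in Hawaii."
    print(linearize_path("Barack Obama", index, hops=2))
```

In a setup like this, sentences produced for many start entities and hop lengths would form the structure-preserving pre-training corpus the abstract describes; the verbalization templates and path-sampling strategy actually used by KGPC may differ.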
format Article
id doaj-art-d3858ab822064a9794bed59e9ca9bc81
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling IEEE Access, vol. 13, pp. 7273-7283, published 2025-01-01. DOI: 10.1109/ACCESS.2024.3523579. IEEE Xplore document 10817607. Record last updated 2025-08-20T03:08:35Z.
Wooyoung Kim (https://orcid.org/0009-0001-5930-9892), Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea
Haemin Jung (https://orcid.org/0000-0002-1182-9303), Department of Industrial Management Engineering, Korea National University of Transportation, Chungju-si, Chungcheongbuk-do, Republic of Korea
Wooju Kim (https://orcid.org/0000-0001-5828-178X), Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea
title Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization
topic Large language model
knowledge graph
multi-hop reasoning
question-answering
url https://ieeexplore.ieee.org/document/10817607/