Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization
Large language models have demonstrated exceptional performance across various natural language processing tasks. However, their reliance on unstructured text corpora for pre-training limits their effectiveness in tasks requiring structured reasoning, such as multi-hop question-answering. Knowledge graphs provide a rich, structured source of relational data, offering an opportunity to enhance the reasoning capabilities of large language models. In this paper, we propose a novel framework, Knowledge Graph as Pre-training Corpus (KGPC), which transforms knowledge graphs into text using a multi-hop linearization process. Unlike existing approaches that linearize individual triples in isolation, our method captures the interconnected nature of knowledge graphs by linking multiple triples across multiple hops, preserving their relational structure during the pre-training phase. This structured knowledge injection improves the ability of language models to perform complex reasoning tasks. We evaluate our approach on multi-hop reasoning benchmarks, demonstrating significant performance gains over existing models, particularly in question-answering tasks. Our results highlight the potential of multi-hop linearization for enhancing the structural reasoning capacity of language models, reducing error propagation, and improving the integration of structured knowledge into language models.
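To make the idea of multi-hop linearization more concrete, the following is a minimal Python sketch that turns chains of connected knowledge-graph triples into pre-training sentences. It is an illustration only, assuming a toy graph and a simple verbalization template; the paper's actual templates, hop-selection strategy, and corpus construction are not described here, and all function, variable, and entity names below are invented for the example.

```python
# Illustrative sketch of multi-hop linearization: verbalize chains of
# connected triples as single sentences, so the link between hops stays
# visible in the training text. Not the authors' implementation.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def linearize_path(path: List[Triple]) -> str:
    """Verbalize a chain of triples as one sentence using a naive template."""
    clauses = [f"{h} {r.replace('_', ' ')} {t}" for h, r, t in path]
    return ", and ".join(clauses) + "."


def two_hop_paths(triples: List[Triple]) -> List[List[Triple]]:
    """Collect 2-hop paths: pairs of triples where the tail of the first
    triple is the head of the second."""
    by_head: Dict[str, List[Triple]] = {}
    for t in triples:
        by_head.setdefault(t[0], []).append(t)
    return [[t1, t2] for t1 in triples for t2 in by_head.get(t1[2], [])]


if __name__ == "__main__":
    # Toy knowledge graph used only for this example.
    kg = [
        ("Barack Obama", "born_in", "Honolulu"),
        ("Honolulu", "located_in", "Hawaii"),
        ("Hawaii", "part_of", "United States"),
    ]
    for path in two_hop_paths(kg):
        print(linearize_path(path))
    # -> "Barack Obama born in Honolulu, and Honolulu located in Hawaii."
    # -> "Honolulu located in Hawaii, and Hawaii part of United States."
```

The point of chaining two triples into one sentence, rather than emitting each triple as its own sentence, is that the shared entity (here, Honolulu or Hawaii) keeps the relational connection between hops explicit in the resulting text, which is the property the abstract attributes to multi-hop linearization.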
| Main Authors: | Wooyoung Kim, Haemin Jung, Wooju Kim |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Large language model; knowledge graph; multi-hop reasoning; question-answering |
| Online Access: | https://ieeexplore.ieee.org/document/10817607/ |
| author | Wooyoung Kim; Haemin Jung; Wooju Kim |
|---|---|
| collection | DOAJ |
| format | Article |
| id | doaj-art-d3858ab822064a9794bed59e9ca9bc81 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Wooyoung Kim (ORCID: https://orcid.org/0009-0001-5930-9892), Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea; Haemin Jung (ORCID: https://orcid.org/0000-0002-1182-9303), Department of Industrial Management Engineering, Korea National University of Transportation, Chungju-si, Chungcheongbuk-do, Republic of Korea; Wooju Kim (ORCID: https://orcid.org/0000-0001-5828-178X), Department of Industrial Engineering, Yonsei University, Seoul, Republic of Korea. "Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization." IEEE Access, vol. 13, pp. 7273-7283, 2025-01-01. ISSN 2169-3536. DOI: 10.1109/ACCESS.2024.3523579. IEEE Xplore document 10817607: https://ieeexplore.ieee.org/document/10817607/. DOAJ record doaj-art-d3858ab822064a9794bed59e9ca9bc81 (2025-08-20T03:08:35Z). |
| title | Knowledge Graph as Pre-Training Corpus for Structural Reasoning via Multi-Hop Linearization |
| topic | Large language model; knowledge graph; multi-hop reasoning; question-answering |
| url | https://ieeexplore.ieee.org/document/10817607/ |