Extraction-Augmented Generation of Scientific Abstracts Using Knowledge Graphs

Bibliographic Details
Main Authors: Haotong Wang, Yves Lepage
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Extraction-augmented generation; scientific abstract; knowledge graphs; datasets
Online Access: https://ieeexplore.ieee.org/document/10929048/
author Haotong Wang
Yves Lepage
collection DOAJ
description Graph-to-text generation for specialized tasks, such as scientific abstract generation, is challenging due to the limited availability of structured knowledge graphs and the need to balance semantic accuracy with paragraph coherence. This motivates our proposal of an Extraction-Augmented Scientific Abstract Generation (EASAG) model which includes the processes of self-extraction, graph fusion, and abstract generation. The model performs self-determination of entities, followed by fine-grained extraction for each entity, predicting the target entity by specifying relations to construct semantic triples. The accumulated triples are then represented more logically through knowledge fusion using two proposed methods: Multi-hop Longest Subchain (MLS) and Label Ordering (LO). The former focuses on uncovering the core logical chain of the content, while the latter functionally segments sequences within the knowledge graph. Experimental results indicate that our model improves the quality of generated scientific abstracts through knowledge richness and the integration of discrete information. The two knowledge fusion methods are designed to enhance specific aspects, with one focusing on semantic accuracy and the other on maintaining paragraph structure integrity. Through fine-grained extraction, we reconstructed the Abstract Generation Dataset (AGENDA) and the newly developed ACL Abstract Graph Dataset (ACL-AGD) containing the latest Natural Language Processing (NLP) research, both datasets composed of graph-abstract pairs. Analysis reveals that these datasets exhibit richer relations, enhanced graph connectivity, and a more uniform distribution of relations.
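The description names Multi-hop Longest Subchain (MLS) but does not say how the chain is computed. As a rough illustration only, the sketch below shows one plausible reading: treat the extracted triples as a directed graph and search for the longest simple chain of connected triples. The function name `longest_chain`, the example entities, and the relation labels are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: NOT the authors' MLS implementation.
# Assumption: a triple is (head entity, relation, tail entity), and the
# "longest subchain" is the longest simple path through the triple graph.

def longest_chain(triples):
    """Return the longest simple multi-hop chain of triples (first found on ties)."""
    outgoing = {}
    for head, relation, tail in triples:
        outgoing.setdefault(head, []).append((relation, tail))

    best = []

    def extend(node, visited, chain):
        nonlocal best
        if len(chain) > len(best):
            best = list(chain)
        for relation, tail in outgoing.get(node, ()):
            if tail not in visited:  # keep the path simple, so cycles terminate
                visited.add(tail)
                chain.append((node, relation, tail))
                extend(tail, visited, chain)
                chain.pop()
                visited.remove(tail)

    # Try every entity that has at least one outgoing relation as a start point.
    for start in outgoing:
        extend(start, {start}, [])
    return best


# Hypothetical triples, invented for this example.
example_triples = [
    ("EASAG", "USED-FOR", "abstract generation"),
    ("MLS", "PART-OF", "EASAG"),
    ("LO", "PART-OF", "EASAG"),
    ("abstract generation", "EVALUATE-FOR", "AGENDA"),
]

for head, relation, tail in longest_chain(example_triples):
    print(f"{head} --{relation}--> {tail}")
```

The exhaustive depth-first search is exponential in the worst case, which is acceptable only for the small graphs typical of a single abstract; a real system would likely need a more careful strategy.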
format Article
id doaj-art-c7b4b3d7c18f417d8780d9a1e78eb7af
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c7b4b3d7c18f417d8780d9a1e78eb7af; 2025-08-20T03:40:51Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 48775-48791; DOI 10.1109/ACCESS.2025.3551756; document 10929048; Extraction-Augmented Generation of Scientific Abstracts Using Knowledge Graphs; Haotong Wang (https://orcid.org/0000-0003-3209-5932), Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Japan; Yves Lepage, Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Japan; https://ieeexplore.ieee.org/document/10929048/; Extraction-augmented generation; scientific abstract; knowledge graphs; datasets
title Extraction-Augmented Generation of Scientific Abstracts Using Knowledge Graphs
topic Extraction-augmented generation
scientific abstract
knowledge graphs
datasets
url https://ieeexplore.ieee.org/document/10929048/