Extraction-Augmented Generation of Scientific Abstracts Using Knowledge Graphs
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10929048/ |
| Summary: | Graph-to-text generation for specialized tasks such as scientific abstract generation is challenging because structured knowledge graphs are scarce and semantic accuracy must be balanced against paragraph coherence. To address this, we propose an Extraction-Augmented Scientific Abstract Generation (EASAG) model comprising three processes: self-extraction, graph fusion, and abstract generation. The model determines entities on its own, then performs fine-grained extraction for each entity, predicting target entities by specifying relations to construct semantic triples. The accumulated triples are then organized more logically through knowledge fusion using two proposed methods: Multi-hop Longest Subchain (MLS) and Label Ordering (LO). MLS uncovers the core logical chain of the content, while LO functionally segments sequences within the knowledge graph. Experimental results indicate that the model improves the quality of generated scientific abstracts through knowledge richness and the integration of discrete information. The two fusion methods target different aspects: one semantic accuracy, the other paragraph structure integrity. Through fine-grained extraction, we reconstructed the Abstract Generation Dataset (AGENDA) and developed the new ACL Abstract Graph Dataset (ACL-AGD), which covers the latest Natural Language Processing (NLP) research; both datasets consist of graph-abstract pairs. Analysis shows that these datasets exhibit richer relations, enhanced graph connectivity, and a more uniform distribution of relations. |
| ISSN: | 2169-3536 |
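
The summary says only that Multi-hop Longest Subchain (MLS) uncovers the core logical chain among the extracted triples; this record does not describe how the authors compute it. As a loose illustration of that general idea, and not the paper's algorithm, the sketch below finds the longest multi-hop chain of (head, relation, tail) triples by exhaustive depth-first search. The function name `longest_chain`, the example triples, and the relation labels are all hypothetical.

```python
from collections import defaultdict


def longest_chain(triples):
    """Return the longest head -> tail chain of triples.

    Illustrative only: a brute-force search over simple paths (no entity
    is visited twice), which is practical for small abstract-sized graphs.
    """
    # Adjacency list: head entity -> list of (relation, tail entity).
    out_edges = defaultdict(list)
    for head, rel, tail in triples:
        out_edges[head].append((rel, tail))

    best = []

    def dfs(node, visited, path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        # .get avoids inserting new keys into the defaultdict while searching.
        for rel, tail in out_edges.get(node, []):
            if tail not in visited:
                visited.add(tail)
                path.append((node, rel, tail))
                dfs(tail, visited, path)
                path.pop()
                visited.remove(tail)

    for start in list(out_edges):
        dfs(start, {start}, [])
    return best


# Hypothetical triples, loosely modeled on the entities named in the summary.
triples = [
    ("EASAG", "USED-FOR", "abstract generation"),
    ("graph fusion", "PART-OF", "EASAG"),
    ("MLS", "USED-FOR", "graph fusion"),
    ("LO", "USED-FOR", "graph fusion"),
]
chain = longest_chain(triples)
for head, rel, tail in chain:
    print(f"{head} --{rel}--> {tail}")
```

When several chains tie for the longest, this sketch keeps the first one it finds; a real system would need an explicit tie-breaking rule, which the record does not specify.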