Improving Data-to-Text Generation via Preserving High-Frequency Phrases and Fact-Checking

Bibliographic Details
Main Authors: Ethan Joseph, Julian Lioanag, Mei Si
Format: Article
Language: English
Published: Accademia University Press 2021-12-01
Series: IJCoL
Online Access: https://journals.openedition.org/ijcol/909
Description: Transforming numerical data into natural language descriptions (data-to-text) requires presenting the data in the correct context, supplementing plausible details, and creating an overall coherent and non-conflicting narrative. In this work, we propose a generate-extract-correct pipeline for the task. We use transfer learning with an auxiliary task of keeping high-frequency word sequences from the training data for text generation. We then apply information extraction to the generated text to check its accuracy, followed by correction, and thus ensure the coherence of the generated narrative. We demonstrate the effectiveness of this approach with both objective and subjective evaluations. Using an empirical evaluation, we show that people rated our system’s outputs similarly to human-written text regarding its coherence, conciseness, and grammar.
ISSN: 2499-4553
DOI: 10.4000/ijcol.909