CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue

GPT (Generative Pre-trained Transformer) is a generative language model that demonstrates outstanding performance in text generation. In general, the attention mechanism of a transformer behaves similarly to a copy distribution. However, because GPT has no dedicated encoder, it is difficult to guarantee that the input is retained during generation. We propose a model that strengthens the copy mechanism in GPT: we generate masks for the input words to initialize the copy distribution and explicitly encourage copying during training. To demonstrate the effectiveness of our approach, we conducted experiments on restoring ellipsis and anaphora in dialogue. In a single domain we achieved 0.4319 (BLEU), 0.6408 (ROUGE-L), 0.9040 (SimCSE), and 0.9070 (BERTScore), while in multi-domain settings we obtained 0.4611 (BLEU), 0.6379 (ROUGE-L), 0.8902 (SimCSE), and 0.8999 (BERTScore). We also evaluated the copy mechanism on out-of-domain data, with excellent results. We anticipate that applying the copy mechanism to GPT will be useful for deploying language models in constrained situations.
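The abstract describes the technique only at a high level. As a rough illustration of the pointer/copy idea it builds on, the sketch below mixes a decoder-only LM's next-token vocabulary distribution with an attention-derived copy distribution restricted to input positions by a mask. This is a minimal sketch, not the authors' implementation: the function name, tensor shapes, and the learned gate p_gen are illustrative assumptions, and the paper's mask-based initialization and curriculum training are not reproduced here.

    # Minimal sketch (illustrative, not the paper's code): blend a decoder-only
    # LM's vocabulary distribution with a copy distribution over input tokens.
    import torch
    import torch.nn.functional as F

    def copy_augmented_distribution(vocab_logits, attn_weights, input_ids,
                                    input_mask, p_gen):
        # vocab_logits: (batch, vocab_size) LM logits for the next token
        # attn_weights: (batch, src_len)    attention over context tokens
        # input_ids:    (batch, src_len)    token ids of the context (LongTensor)
        # input_mask:   (batch, src_len)    1.0 where copying is allowed, 0.0 elsewhere
        # p_gen:        (batch, 1)          learned gate between generating and copying
        vocab_dist = F.softmax(vocab_logits, dim=-1)

        # Zero attention on non-copyable positions, then renormalize.
        masked_attn = attn_weights * input_mask
        masked_attn = masked_attn / masked_attn.sum(dim=-1, keepdim=True).clamp(min=1e-9)

        # Scatter the copy probabilities onto the vocabulary axis.
        copy_dist = torch.zeros_like(vocab_dist)
        copy_dist.scatter_add_(1, input_ids, masked_attn)

        # Final next-token distribution: generate-vs-copy mixture.
        return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist

This generate-vs-copy mixture is the standard pointer-generator formulation; per the abstract, the paper's contribution is adapting it to a decoder-only GPT, using input-word masks to initialize the distribution and training that explicitly encourages copying.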

Bibliographic Details
Main Authors: Ji-Won Cho, Jinyoung Oh, Jeong-Won Cha (Department of Computer Engineering, Changwon National University, Changwon 51140, Republic of Korea)
Format: Article
Language: English
Published: MDPI AG, 2024-12-01
Series: Applied Sciences, Vol. 15, Iss. 1, Art. 5
ISSN: 2076-3417
DOI: 10.3390/app15010005
Subjects: copy mechanism; curriculum learning; pre-trained models
Online Access: https://www.mdpi.com/2076-3417/15/1/5