CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue
GPT (Generative Pre-trained Transformer) is a generative language model that demonstrates outstanding performance in the field of text generation. Generally, the attention mechanism of the transformer model behaves similarly to a copy distribution. However, due to the absence of a dedicated encoder, it is challenging to ensure that the input is retained for generation. We propose a model that emphasizes the copy mechanism in GPT. We generate masks for the input words to initialize the distribution and explicitly encourage copying through training. To demonstrate the effectiveness of our approach, we conducted experiments to restore ellipsis and anaphora in dialogue. In a single domain, we achieved 0.4319 (BLEU), 0.6408 (Rouge-L), 0.9040 (simCSE), and 0.9070 (BERTScore), while in multi-domain settings we obtained 0.4611 (BLEU), 0.6379 (Rouge-L), 0.8902 (simCSE), and 0.8999 (BERTScore). Additionally, we evaluated the operation of the copy mechanism on out-of-domain data, yielding excellent results. We anticipate that applying the copy mechanism to GPT will be useful for utilizing language models in constrained situations.
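The abstract describes biasing GPT's generation toward tokens that appear in the dialogue context by masking input words and encouraging copying during training. As a rough illustration of that general idea (not the authors' published implementation), the sketch below blends a pointer-style copy distribution, computed from attention over context tokens and restricted by an input mask, with the model's ordinary next-token distribution. The function name, tensor shapes, and the `copy_gate` input are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def copy_augmented_distribution(gen_logits, attn_weights, input_ids,
                                copy_mask, copy_gate):
    """Pointer-generator-style blend of generation and copy distributions.

    gen_logits:   (batch, vocab)   next-token logits from the LM head
    attn_weights: (batch, src_len) attention scores over the dialogue context
    input_ids:    (batch, src_len) token ids of the context
    copy_mask:    (batch, src_len) 1 for copyable input words, 0 otherwise
    copy_gate:    (batch, 1)       p(copy) in [0, 1], e.g. from a linear layer

    Assumes every example has at least one copyable position; otherwise the
    masked softmax below would produce NaNs.
    """
    p_gen = F.softmax(gen_logits, dim=-1)                     # (batch, vocab)

    # Restrict the copy distribution to the masked input positions.
    masked_attn = attn_weights.masked_fill(copy_mask == 0, float("-inf"))
    p_copy_src = F.softmax(masked_attn, dim=-1)               # (batch, src_len)

    # Scatter the per-position copy probabilities onto the vocabulary,
    # summing probability mass for tokens that occur more than once.
    p_copy = torch.zeros_like(p_gen)
    p_copy.scatter_add_(1, input_ids, p_copy_src)

    # Mix the two distributions with the copy gate.
    return copy_gate * p_copy + (1.0 - copy_gate) * p_gen
```

Training against references that restore elided or anaphoric words would then, in this sketch, push `copy_gate` up wherever the target token is present in the context, which is one plausible way to "explicitly encourage copying" as the abstract puts it.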
Saved in:
| Main Authors: | Ji-Won Cho, Jinyoung Oh, Jeong-Won Cha |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-12-01 |
| Series: | Applied Sciences |
| Subjects: | copy mechanism; curriculum learning; pre-trained models |
| Online Access: | https://www.mdpi.com/2076-3417/15/1/5 |
| _version_ | 1850113433041633280 |
|---|---|
| author | Ji-Won Cho; Jinyoung Oh; Jeong-Won Cha |
| author_facet | Ji-Won Cho; Jinyoung Oh; Jeong-Won Cha |
| author_sort | Ji-Won Cho |
| collection | DOAJ |
| description | GPT (Generative Pre-trained Transformer) is a generative language model that demonstrates outstanding performance in the field of text generation. Generally, the attention mechanism of the transformer model behaves similarly to a copy distribution. However, due to the absence of a dedicated encoder, it is challenging to ensure that the input is retained for generation. We propose a model that emphasizes the copy mechanism in GPT. We generate masks for the input words to initialize the distribution and explicitly encourage copying through training. To demonstrate the effectiveness of our approach, we conducted experiments to restore ellipsis and anaphora in dialogue. In a single domain, we achieved 0.4319 (BLEU), 0.6408 (Rouge-L), 0.9040 (simCSE), and 0.9070 (BERTScore), while in multi-domain settings we obtained 0.4611 (BLEU), 0.6379 (Rouge-L), 0.8902 (simCSE), and 0.8999 (BERTScore). Additionally, we evaluated the operation of the copy mechanism on out-of-domain data, yielding excellent results. We anticipate that applying the copy mechanism to GPT will be useful for utilizing language models in constrained situations. |
| format | Article |
| id | doaj-art-c2f968d2f78747f78f2d66051484a689 |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2024-12-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-c2f968d2f78747f78f2d66051484a689; 2025-08-20T02:37:09Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2024-12-01; vol. 15, no. 1, article 5; DOI 10.3390/app15010005; CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue; Ji-Won Cho, Jinyoung Oh, Jeong-Won Cha (Department of Computer Engineering, Changwon National University, Changwon 51140, Republic of Korea); https://www.mdpi.com/2076-3417/15/1/5; copy mechanism; curriculum learning; pre-trained models |
| spellingShingle | Ji-Won Cho; Jinyoung Oh; Jeong-Won Cha; CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue; Applied Sciences; copy mechanism; curriculum learning; pre-trained models |
| title | CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue |
| title_full | CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue |
| title_fullStr | CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue |
| title_full_unstemmed | CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue |
| title_short | CGM: Copy Mechanism GPT with Mask for Ellipsis and Anaphora Resolution in Dialogue |
| title_sort | cgm copy mechanism gpt with mask for ellipsis and anaphora resolution in dialogue |
| topic | copy mechanism; curriculum learning; pre-trained models |
| url | https://www.mdpi.com/2076-3417/15/1/5 |
| work_keys_str_mv | AT jiwoncho cgmcopymechanismgptwithmaskforellipsisandanaphoraresolutionindialogue AT jinyoungoh cgmcopymechanismgptwithmaskforellipsisandanaphoraresolutionindialogue AT jeongwoncha cgmcopymechanismgptwithmaskforellipsisandanaphoraresolutionindialogue |