Automatic text generation system for endangered languages based on conditional generative adversarial networks

This paper explores the application of Conditional Generative Adversarial Networks (CGANs) in the field of endangered language text generation. The focus is on overcoming challenges associated with discrete data handling in natural language generation by utilizing an improved CGAN model. We introduc...

Bibliographic Details
Main Author: Zhong Luo
Format: Article
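Language: English
Published: Elsevier 2025-12-01
Series: Systems and Soft Computing
Subjects: Conditional generative adversarial networks; Natural language generation; Endangered languages; Text generation
Online Access: http://www.sciencedirect.com/science/article/pii/S2772941925001243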
_version_ 1850121770537844736
author Zhong Luo
author_facet Zhong Luo
author_sort Zhong Luo
collection DOAJ
description This paper explores the application of Conditional Generative Adversarial Networks (CGANs) to text generation for endangered languages. The focus is on overcoming the challenges of handling discrete data in natural language generation by using an improved CGAN model. We introduce a specialized loss function, based on the MaliGAN model, with which the discriminator guides the generator towards text that is accurate at the level of individual words and also maintains overall semantic coherence. Additionally, a beam search decoding strategy is implemented to enhance the global semantic information and diversity of the text output. Our experimental evaluations across multiple datasets, including the Tujia language, Image_COCO, and EMNLP2017 WMT News, demonstrate significant improvements. The LFMGAN model, a variant of CGANs, increased BLEU-4 scores by up to 50.7% for the Tujia language and ROUGE-L scores by up to 86.3% on the Image_COCO dataset. These results underscore the model's robustness and its potential for preserving linguistic diversity. We discuss integrating advanced models such as GPT-2 and RoBERTa to address training instability and gradient explosion. Future research directions include optimizing CGAN parameters with algorithms such as particle swarm optimization, refining discriminator outputs in loss calculations, and incorporating cultural and linguistic features specific to endangered languages to improve the quality of the generated texts.
format Article
id doaj-art-e86a67bad98b4c149e5ef6a472c63fc8
institution OA Journals
issn 2772-9419
language English
publishDate 2025-12-01
publisher Elsevier
record_format Article
series Systems and Soft Computing
spelling doaj-art-e86a67bad98b4c149e5ef6a472c63fc8 | 2025-08-20T02:35:00Z | eng | Elsevier | Systems and Soft Computing | 2772-9419 | 2025-12-01 | 7 | 200306 | 10.1016/j.sasc.2025.200306 | Automatic text generation system for endangered languages based on conditional generative adversarial networks | Zhong Luo (Corresponding author; Faculty of Language and Literature, Anhui Sanlian University, Hefei 230601, China) | http://www.sciencedirect.com/science/article/pii/S2772941925001243 | Conditional generative adversarial networks | Natural language generation | Endangered languages | Text generation
spellingShingle Zhong Luo
Automatic text generation system for endangered languages based on conditional generative adversarial networks
Systems and Soft Computing
Conditional generative adversarial networks
Natural language generation
Endangered languages
Text generation
title Automatic text generation system for endangered languages based on conditional generative adversarial networks
title_full Automatic text generation system for endangered languages based on conditional generative adversarial networks
title_fullStr Automatic text generation system for endangered languages based on conditional generative adversarial networks
title_full_unstemmed Automatic text generation system for endangered languages based on conditional generative adversarial networks
title_short Automatic text generation system for endangered languages based on conditional generative adversarial networks
title_sort automatic text generation system for endangered languages based on conditional generative adversarial networks
topic Conditional generative adversarial networks
Natural language generation
Endangered languages
Text generation
url http://www.sciencedirect.com/science/article/pii/S2772941925001243
work_keys_str_mv AT zhongluo automatictextgenerationsystemforendangeredlanguagesbasedonconditionalgenerativeadversarialnetworks
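
Note on the description field: the abstract mentions a beam search decoding strategy but this record gives no implementation details. The following is a minimal, generic sketch of textbook beam search in Python, not the article's method. The names `beam_search`, `toy_model`, and `TOY_BIGRAMS`, the `<s>` and `</s>` markers, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]
# Hypothetical interface: maps a token prefix to next-token probabilities.
NextTokenFn = Callable[[Prefix], Dict[str, float]]


def beam_search(next_token_probs: NextTokenFn, beam_width: int = 3,
                max_len: int = 10, eos: str = "</s>") -> List[Tuple[Prefix, float]]:
    """Return hypotheses as (tokens, log-probability), best first."""
    beams: List[Tuple[Prefix, float]] = [((), 0.0)]
    finished: List[Tuple[Prefix, float]] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for prefix, score in beams:
            for token, prob in next_token_probs(prefix).items():
                if prob > 0.0:
                    candidates.append((prefix + (token,), score + math.log(prob)))
        if not candidates:
            break
        # Keep only the beam_width highest-scoring candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Toy bigram "model" (assumed data, for illustration only).
TOY_BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"river": 0.55, "mountain": 0.45},
    "a": {"river": 0.9, "mountain": 0.1},
    "river": {"</s>": 1.0},
    "mountain": {"</s>": 1.0},
}


def toy_model(prefix: Prefix) -> Dict[str, float]:
    last = prefix[-1] if prefix else "<s>"
    return TOY_BIGRAMS[last]


if __name__ == "__main__":
    for width in (1, 2):
        tokens, logp = beam_search(toy_model, beam_width=width)[0]
        print(width, " ".join(tokens), round(math.exp(logp), 2))
```

With this toy table, a beam width of 1 (greedy decoding) commits to "the" first and ends with "the river" at probability 0.33, while a beam width of 2 keeps "a" alive and recovers "a river" at probability 0.36. This is the usual motivation for beam search: a locally weaker first token can lead to a better overall sequence.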