Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks

An annotated dataset is essential for developing an artificial intelligence (AI) system that generalizes well and avoids overfitting. A lack of training data is a major barrier to extending AI systems to domains that have no...


Bibliographic Details
Main Authors:	Huu-Thanh Duong, Tram-Anh Nguyen-Thi, Vinh Truong Hoang
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2022/3188449

_version_	1832565683632734208
author	Huu-Thanh Duong, Tram-Anh Nguyen-Thi, Vinh Truong Hoang
collection	DOAJ
description	An annotated dataset is essential for developing an artificial intelligence (AI) system that generalizes well and avoids overfitting. A lack of training data is a major barrier to extending AI systems to domains that have no or incomplete training data. Building these datasets is tedious and expensive, and the effort depends on the domain and language. This is especially challenging for low-resource languages. In this paper, we experiment with and evaluate a variety of approaches to sentiment analysis so that they can still achieve high performance under limited training data. We apply preprocessing techniques to clean and normalize the data, and generate new samples from the limited training dataset using text augmentation techniques such as lexicon substitution, sentence shuffling, back translation, syntax-tree transformation, and embedding mixup. Experiments are performed with both well-known machine learning classifiers and deep learning models. We compare, analyze, and evaluate the results to identify the advantages and disadvantages of the techniques for each approach. The experimental results show that the data augmentation techniques improve the accuracy of the predictive models, which suggests that smart systems can be applied widely across domains with limited training data.
format	Article
id	doaj-art-e24bd9189fac427c939c8d17e12c4950
institution	Kabale University
issn	1099-0526
language	English
publishDate	2022-01-01
publisher	Wiley
record_format	Article
series	Complexity
spelling	doaj-art-e24bd9189fac427c939c8d17e12c4950 | 2025-02-03T01:07:05Z | eng | Wiley | Complexity | 1099-0526 | 2022-01-01 | 2022 | 10.1155/2022/3188449 | Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks | Huu-Thanh Duong (Faculty of Information Technology) | Tram-Anh Nguyen-Thi (Department of Fundamental Studies) | Vinh Truong Hoang (Faculty of Information Technology) | http://dx.doi.org/10.1155/2022/3188449
title	Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks
url	http://dx.doi.org/10.1155/2022/3188449

Similar Items