Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks

Bibliographic Details
Main Authors: Huu-Thanh Duong, Tram-Anh Nguyen-Thi, Vinh Truong Hoang
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2022/3188449
collection DOAJ
description An annotated dataset is essential for developing an artificial intelligence (AI) system effectively, for ensuring that predictive models generalize, and for avoiding overfitting. A lack of training data is a major barrier to extending AI systems to domains that have little or no training data. Building such datasets is tedious and expensive, and the effort depends on the domain and the language. This is especially challenging for low-resource languages. In this paper, we experiment with and evaluate a variety of approaches to sentiment analysis so that they can still achieve high performance under limited training data. We use preprocessing techniques to clean and normalize the data, and we generate new samples from the limited training dataset using text augmentation techniques such as lexicon substitution, sentence shuffling, back translation, syntax-tree transformation, and embedding mixup. Experiments were performed with both well-known machine learning classifiers and deep learning models. We compare, analyze, and evaluate the results to identify the advantages and disadvantages of the techniques for each approach. The experimental results show that data augmentation improves the accuracy of the predictive models, which suggests that smart systems can be applied widely across domains under limited training data.
id doaj-art-e24bd9189fac427c939c8d17e12c4950
institution Kabale University
issn 1099-0526
spelling doaj-art-e24bd9189fac427c939c8d17e12c4950; 2025-02-03T01:07:05Z; eng; Wiley; Complexity; 1099-0526; 2022-01-01; 2022; 10.1155/2022/3188449; Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks; Huu-Thanh Duong (Faculty of Information Technology); Tram-Anh Nguyen-Thi (Department of Fundamental Studies); Vinh Truong Hoang (Faculty of Information Technology)
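
Two of the augmentation techniques named in the description, sentence shuffling and embedding mixup, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `shuffle_sentences` and `mixup_embeddings`, the naive sentence-splitting rule, and the mixing weight `lam` are all assumptions chosen for illustration.

```python
import random
import re


def shuffle_sentences(text, seed=None):
    """Return a new sample with the sentences of `text` in random order."""
    # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    rng = random.Random(seed)  # local generator, so a seed gives repeatable output
    rng.shuffle(sentences)
    return " ".join(sentences)


def mixup_embeddings(vec_a, vec_b, label_a, label_b, lam=0.5):
    """Linearly interpolate two embedding vectors and their one-hot labels.

    `lam` is a hypothetical mixing weight in [0, 1]; lam=1.0 returns sample A.
    """
    if len(vec_a) != len(vec_b) or len(label_a) != len(label_b):
        raise ValueError("inputs must have matching lengths")
    mixed_vec = [lam * a + (1.0 - lam) * b for a, b in zip(vec_a, vec_b)]
    mixed_label = [lam * a + (1.0 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_vec, mixed_label
```

Shuffling keeps every sentence intact and changes only their order, so the sentiment label of the review can plausibly be reused for the new sample. Mixup instead produces a soft label, so it would typically be paired with a loss that accepts label distributions.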