Research on Word Vector Training Method Based on Improved Skip-Gram Algorithm

Bibliographic Details
Main Author: Yachun Tang
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Advances in Multimedia
Online Access: http://dx.doi.org/10.1155/2022/4414207
Collection: DOAJ
Description: An effective word vector training method yields semantically rich word vectors, which in turn give better results on the same downstream task. To address the shortcomings of the traditional skip-gram model in encoding and modelling context words, this study proposes an improved word vector training method based on the skip-gram algorithm. Starting from an analysis of the existing skip-gram model, the study introduces the distributional hypothesis: each word is represented by its distribution over the contexts in which it occurs, placed in a semantic space, and then modelled there, with smoothing over words and over the semantic space improving the modelling. During training, stochastic gradient descent is used to solve for the vector representation of each word and each Chinese character. In the experiments, the proposed training method is compared with skip-gram, CWE+P, and SEING on a word sense similarity task and a text classification task. The experimental results showed that the proposed method had significant advantages in the Chinese-word segmentation task, with a performance gain of about 30%. The method proposed in this study provides a reference for further research on word vectors and text mining.
Record ID: doaj-art-c0143057d7a645b0a84914d1ebaa2ab2
Institution: Kabale University
ISSN: 1687-5699
Author affiliation: College of Information Engineering
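
For readers unfamiliar with the baseline that the description refers to, the following is a minimal sketch of a standard skip-gram model with negative sampling, trained by stochastic gradient descent. It is not the improved method proposed in the article, whose details are not given in this record. The function name `train_skipgram`, all hyperparameter values, and the use of uniform negative sampling are assumptions made for illustration only.

```python
import numpy as np


def train_skipgram(corpus, dim=16, window=2, negatives=3, lr=0.05, epochs=50, seed=0):
    """Baseline skip-gram with negative sampling, trained by plain SGD.

    Hypothetical helper for illustration; not the article's improved method.
    `corpus` is a list of tokenised sentences (lists of strings).
    """
    rng = np.random.default_rng(seed)
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    W_in = rng.normal(0.0, 0.1, (V, dim))  # centre-word vectors
    W_out = np.zeros((V, dim))             # context-word vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for sent in corpus:
            ids = [index[w] for w in sent]
            for pos, centre in enumerate(ids):
                lo = max(0, pos - window)
                hi = min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed context word (label 1) plus negatives
                    # drawn uniformly from the vocabulary (label 0).
                    # Assumption: uniform sampling, collisions not filtered.
                    targets = [ids[ctx_pos]] + [int(t) for t in rng.integers(0, V, negatives)]
                    labels = np.array([1.0] + [0.0] * negatives)
                    v = W_in[centre]
                    u = W_out[targets]             # copy, shape (1 + negatives, dim)
                    err = sigmoid(u @ v) - labels  # d(logistic loss)/d(score)
                    grad_v = err @ u
                    # subtract.at accumulates correctly if a target repeats
                    np.subtract.at(W_out, targets, lr * np.outer(err, v))
                    W_in[centre] -= lr * grad_v
    return index, W_in
```

A call such as `index, vectors = train_skipgram([["the", "cat", "sat"], ["the", "dog", "sat"]])` returns a word-to-row mapping and the matrix of centre-word vectors. Character-level vectors for Chinese, which the description mentions, would need an additional embedding table and are left out of this sketch.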