Web News Data Extraction Technology Based on Text Keywords

To shorten the time users spend searching for news on the Internet, this paper studies and designs a web news data extraction technique that obtains the main news information by extracting keywords from the news text. First, the TF-IDF keyword extraction algorithm, TextRank keyword...
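
The description of this record names TF-IDF keyword extraction and two ways of fusing extractors (waterfall and parallel combination) without giving details. The following is a minimal Python sketch of those general ideas, not the paper's method: the smoothed IDF formula, the function names (`tfidf_scores`, `parallel_fusion`, `waterfall_fusion`, `top_keywords`), the weights, and the toy corpus are all assumptions made for illustration, and the Zipf's-law optimization and the five schemes from the paper are not reproduced here.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus):
    """Score each term of one tokenized document against a tokenized corpus."""
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    return {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }


def normalize(scores):
    """Scale scores so the best term gets 1.0, making extractors comparable."""
    top = max(scores.values(), default=0.0)
    return {t: (s / top if top else 0.0) for t, s in scores.items()}


def parallel_fusion(score_maps, weights):
    """Assumed reading of parallel fusion: weighted sum of normalized scores."""
    fused = Counter()
    for scores, weight in zip(score_maps, weights):
        for term, s in normalize(scores).items():
            fused[term] += weight * s
    return dict(fused)


def waterfall_fusion(first, second, keep):
    """Assumed reading of waterfall fusion: shortlist with one extractor,
    then re-rank the shortlist with a second extractor."""
    shortlist = sorted(first, key=first.get, reverse=True)[:keep]
    return sorted(shortlist, key=lambda t: second.get(t, 0.0), reverse=True)


def top_keywords(scores, k):
    """Return the k highest-scoring terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy corpus of pre-tokenized headlines (hypothetical data).
    corpus = [
        ["market", "stocks", "fall", "market", "fears"],
        ["team", "wins", "final", "match"],
        ["stocks", "rally", "after", "match"],
    ]
    tfidf = tfidf_scores(corpus[0], corpus)
    raw_tf = dict(Counter(corpus[0]))  # stand-in for a second extractor
    print(top_keywords(tfidf, 2))
    print(top_keywords(parallel_fusion([tfidf, raw_tf], [0.5, 0.5]), 2))
    print(waterfall_fusion(tfidf, raw_tf, keep=2))
```

In this sketch the second extractor is just raw term frequency; in practice it could be any scorer that returns a term-to-score mapping, such as a TextRank or LDA-based scorer.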

Bibliographic Details
Main Author: Kun Zhang
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2021/5529447
collection DOAJ
description To shorten the time users spend searching for news on the Internet, this paper studies and designs a web news data extraction technique that obtains the main news information by extracting keywords from the news text. First, the TF-IDF, TextRank, and LDA keyword extraction algorithms are analyzed to understand the keyword extraction process, and the TF-IDF algorithm is optimized using Zipf’s law. Then, drawing on the idea of model fusion, five schemes based on waterfall fusion and parallel combination fusion are designed, and their effectiveness is verified experimentally. The experiments indicate that the designed technique is effective for web news data extraction. News keyword extraction has broad application prospects and can provide a basis for research on news key phrases, news abstracts, and related areas.
id doaj-art-c6b7f6cfb9a24ef8b49e70b03deb1802
institution OA Journals
issn 1076-2787, 1099-0526
spelling Kun Zhang, School of Communication, Xi’an Peihua University, Xi’an City, China. Web News Data Extraction Technology Based on Text Keywords. Complexity (Wiley), 2021, article 5529447. http://dx.doi.org/10.1155/2021/5529447