Information extraction from massive Web pages based on node property and text content

To address the problem of extracting valuable information from massive Web pages in big data environments, a novel information extraction method based on node property and text content for massive Web pages was put forward. Web pages were converted into a document object model (DOM) tree, and a pruning...

Bibliographic Details
Main Authors:	Hai-yan WANG, Pan CAO
Format:	Article
Language:	zho
Published:	Editorial Department of Journal on Communications 2016-10-01
Series:	Tongxin xuebao
Subjects:	Web information extraction; MapReduce; DOM tree
Online Access:	http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2016190/

Similar Items

MapReduce based big data framework using associative Kruskal poly Kernel classifier for diabetic disease prediction
by: R. Ramani, et al.
Published: (2025-06-01)

An efficient parallel DCNN algorithm in big data environment
by: Yimin Mao, et al.
Published: (2025-05-01)

A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
by: Bin Wu, et al.
Published: (2013-12-01)

MP-SPILDL: A Massively Parallel Inductive Logic Learner in Description Logic
by: Eyad Algahtani
Published: (2024-01-01)

CC-MRSJ: Cache Conscious Star Join Algorithm on Hadoop Platform
by: Guoliang Zhou, et al.
Published: (2013-10-01)

Temperature aware energy-efficient task scheduling strategies for mapreduce
by: Bin LIAO, et al.
Published: (2016-01-01)

Diffluent Internet Traffic and Characteristics Computation Based on Hadoop
by: Yong Liu, et al.
Published: (2014-12-01)

A distributed high efficiency similarity matrix computation method based on users’ mobile network access location
by: Yuan WANG, et al.
Published: (2018-05-01)

A MapReduce-Based Decision-Making Approach for Multiple Criteria Sorting
by: Xiaoxin Mao, et al.
Published: (2025-04-01)

Continuous Skyline Queries Based on MapReduce
by: Guanmin Shan, et al.
Published: (2014-05-01)

MapReduce teaching learning based optimization algorithm for solving CEC-2013 LSGO benchmark Testsuit
by: A.J. Umbarkar, et al.
Published: (2024-12-01)

A Comprehensive Survey of MapReduce Models for Processing Big Data
by: Hemn Barzan Abdalla, et al.
Published: (2025-03-01)

Design and application research on data service platform for big data
by: Yun-feng LIU, et al.
Published: (2013-09-01)

Securely redundant scheduling policy for MapReduce based on dynamic domains partition
by: Qing-ni SHEN, et al.
Published: (2014-01-01)

Stochastic algorithm for HDFS data theft detection based on MapReduce
by: Yuanzhao GAO, et al.
Published: (2018-10-01)

Research on association analysis between electricity consumption behaviors and weather factors based on mapreduce
by: Yuehua Yang, et al.
Published: (2025-05-01)

Stochastic gradient descent algorithm preserving differential privacy in MapReduce framework
by: Yihan YU, et al.
Published: (2018-01-01)

Research on real-time fusion method of multi-source heterogeneous flight trajectory data stream
by: Zhuxi ZHANG, et al.
Published: (2020-09-01)

An Adaptive Subspace Similarity Search Approach
by: Jianxin Ren, et al.
Published: (2015-07-01)

Hadoop Çatısının Bulut Ortamında Gerçeklenmesi Ve Terabyte Sort Deneyleri
by: G. Ozen, et al.
Published: (2015-05-01)

Title-Based Extraction of News Contents for Text Mining
by: Zhen Tan, et al.
Published: (2018-01-01)

Метод збільшення продуктивності Apache Spark на основі сегментування даних і налаштувань конфігураційних параметрів
by: Serhii Minukhin, et al.
Published: (2024-03-01)

HATAY İLİ ANTAKYA İLÇESİNDE YAŞAYAN DOMLARIN MÜZİK KÜLTÜRLERİ
by: Timur Vural, et al.
Published: (2018-08-01)

Summary of Large-Scale Graph Partitioning Algorithms
by: Jinfeng Xu, et al.
Published: (2014-07-01)

Improved method of targeted user interface updates for enhancing the efficiency of web applications based on reactive streams and virtual DOM
by: M.V. Havatiuk, et al.
Published: (2025-07-01)

Problems of long-term preservation of web pages
by: Mitja Dečman
Published: (2011-01-01)

Study of high-speed malicious Web page detection system based on two-step classifier
by: Zheng-qi WANG, et al.
Published: (2017-08-01)

UML Profile to Model Accessible Web Pages
by: Karla Ordonez-Briceno, et al.
Published: (2024-01-01)

Comparison of Hadoop Mapreduce and Apache Spark in Big Data Processing with Hgrid247-DE
by: Firmania Dwi Utami, et al.
Published: (2024-11-01)

Examining the effects of texting, web surfing, and navigating apps on urban driving behavior and crash risk
by: Maria G. Oikonomou, et al.
Published: (2025-03-01)

Mathematical Model and Algorithm for Accurate Main Content Extraction From News Websites
by: Hamza Salem, et al.
Published: (2025-01-01)

WEB CONTENT MINING
by: CLAUDIA ELENA DINUCĂ, et al.
Published: (2012-01-01)

Behandling af betinget dømte alkoholmisbrugere.
by: Hellmut Sørensen, et al.
Published: (1954-06-01)

Towards Efficient Serverless MapReduce Computing on Cloud-Native Platforms
by: Xu Huang, et al.
Published: (2025-05-01)

Understanding the Carbon Footprint of Tile Transfer for Web Maps
by: Guillaume Touya, et al.
Published: (2025-03-01)

Slow task scheduling algorithm based on node identification
by: Yun-fei CUI, et al.
Published: (2014-07-01)