Research and design of distributed high-performance network reptiles based on cloud platform

With the arrival of the big data era, data has become the most valuable resource, and web crawler technology, as an important means of external data collection, has become a standard tool for data analysis. A high-performance, convenient cloud-based crawler architecture is introduced. The overall str...

Bibliographic Details
Main Authors: Enming SHI, Xiaojun XIAO, Yu LU
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2017-08-01
Series: Dianxin kexue
Subjects: distributed system architecture; web crawler; Docker; high-performance
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017234/
author Enming SHI
Xiaojun XIAO
Yu LU
collection DOAJ
description With the arrival of the big data era, data has become the most valuable resource, and web crawler technology, as an important means of external data collection, has become a standard tool for data analysis. A high-performance, convenient cloud-based crawler architecture is introduced. The design is described in detail, from the overall architecture of the crawler to its distributed design and the design of each sub-module. Each module of the crawler is encapsulated in a Docker container, and Kubernetes is used for resource scheduling and management of the cluster. For performance optimization, an MD5 deduplication tree algorithm, DNS optimization, and asynchronous I/O are adopted. Experimental results show that the crawler has a clear performance advantage over the unoptimized scheme.
format Article
id doaj-art-eeae6bd5e8cf4f6dbe22882ee5079412
institution Kabale University
issn 1000-0801
language zho
publishDate 2017-08-01
publisher Beijing Xintong Media Co., Ltd
record_format Article
series Dianxin kexue
title Research and design of distributed high-performance network reptiles based on cloud platform
topic distributed system architecture
web crawler
Docker
high-performance
url http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017234/