Research and design of distributed high-performance network reptiles based on cloud platform
With the arrival of the big data era, data has become the most valuable resource, and web crawler technology, as an important means of external data collection, has become a standard tool for data analysis. A high-performance, convenient cloud-based crawler architecture design is introduced. The overall str...
Main Authors: | Enming SHI, Xiaojun XIAO, Yu LU |
---|---|
Format: | Article |
Language: | zho |
Published: | Beijing Xintong Media Co., Ltd, 2017-08-01 |
Series: | Dianxin kexue |
Subjects: | distributed system architecture; web crawler; Docker; high-performance |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017234/ |
_version_ | 1841530177121681408 |
---|---|
author | Enming SHI, Xiaojun XIAO, Yu LU |
collection | DOAJ |
description | With the arrival of the big data era, data has become the most valuable resource, and web crawler technology, as an important means of external data collection, has become a standard tool for data analysis. A high-performance, convenient cloud-based crawler architecture design is introduced. The overall distributed structure of the crawler and the design of its sub-modules are described in detail. Each module of the crawler is encapsulated in Docker, and Kubernetes is used for resource scheduling and management of the cluster. For performance optimization, the MD5 reset tree algorithm, DNS optimization and asynchronous I/O are adopted. Experimental results show that the crawler has a clear performance advantage over the unoptimized scheme. |
format | Article |
id | doaj-art-eeae6bd5e8cf4f6dbe22882ee5079412 |
institution | Kabale University |
issn | 1000-0801 |
language | zho |
publishDate | 2017-08-01 |
publisher | Beijing Xintong Media Co., Ltd |
record_format | Article |
series | Dianxin kexue |
title | Research and design of distributed high-performance network reptiles based on cloud platform |
topic | distributed system architecture; web crawler; Docker; high-performance |
url | http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2017234/ |