A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework
The related work in parallel ETL and common methods to deal with multiple MapReduce jobs were introduced. Then an improved chain-MapReduce framework was presented, based on this framework,a parallel ETL tool was designed. Several optimization rules on ETL which will make the ETL process generate les...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | zho |
Published: |
Beijing Xintong Media Co., Ltd
2013-12-01
|
Series: | Dianxin kexue |
Subjects: | |
Online Access: | http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/ |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The related work in parallel ETL and common methods to deal with multiple MapReduce jobs were introduced. Then an improved chain-MapReduce framework was presented, based on this framework,a parallel ETL tool was designed. Several optimization rules on ETL which will make the ETL process generate less MapReduce jobs to avoid unnecessary I/O and network cost were presented. The ETL tool on real queries and real big datasets were evaluated. Compared with Hive, the tool reduces time on average by 10% to 20%. |
---|---|
ISSN: | 1000-0801 |