A Parallel ETL Tool Based on an Improved Chain-MapReduce Framework

Bibliographic Details
Main Authors: Bin Wu, Xinguang Liu
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2013-12-01
Series: Dianxin kexue
Subjects:
Online Access: http://www.telecomsci.com/zh/article/doi/10.3969/j.issn.1000-0801.2013.12.001/
Description
Summary: Related work on parallel ETL and common methods for handling multiple MapReduce jobs are reviewed. An improved chain-MapReduce framework is then presented, and a parallel ETL tool is designed on top of it. Several ETL optimization rules are proposed that make the ETL process generate fewer MapReduce jobs, avoiding unnecessary I/O and network cost. The tool is evaluated on real queries and real large-scale datasets. Compared with Hive, it reduces running time by 10% to 20% on average.
ISSN: 1000-0801