Design and Implementation of File Deduplication Framework on HDFS

Bibliographic Details
Main Authors: Ruey-Kai Sheu, Shyan-Ming Yuan, Win-Tsung Lo, Chan-I Ku
Format: Article
Language: English
Published: Wiley 2014-04-01
Series: International Journal of Distributed Sensor Networks
Online Access: https://doi.org/10.1155/2014/561340
collection DOAJ
description File systems are designed to control how files are stored and retrieved. Because they have no knowledge of the context or semantics of file contents, file systems often hold duplicate copies, which wastes storage space and network bandwidth. Finding deduplication technologies that reduce cost and increase storage efficiency has been a complex and challenging problem for enterprises. To address it, researchers have proposed in-line and offline solutions for primary storage and backup systems, operating at the subfile or whole-file level. Some of these technologies are used in file servers and database systems. Few studies, however, focus on application-level deduplication for cloud file systems, especially the Hadoop Distributed File System (HDFS). The goal of this paper is to design a file deduplication framework on HDFS for cloud application developers. The paper also shares the architecture, interface, and implementation experience.
id doaj-art-724a93027ad74c0fa99e79dc6c0d01a6
institution OA Journals
issn 1550-1477
spelling Design and Implementation of File Deduplication Framework on HDFS. International Journal of Distributed Sensor Networks, vol. 10, article 561340. Wiley, 2014-04-01. ISSN 1550-1477. https://doi.org/10.1155/2014/561340
Ruey-Kai Sheu (Department of Computer Science, Tung Hai University, Taichung, Taiwan)
Shyan-Ming Yuan (Institute of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan)
Win-Tsung Lo (Department of Computer Science, Tung Hai University, Taichung, Taiwan)
Chan-I Ku (Computational Intelligence Technology Center, Industrial Technology Research Institute, Hsinchu, Taiwan)