Hybrid deduplication system with content-based cache for cloud environment

Bibliographic Details
Main Authors:	Amdewar Godavari, Chapram Sudhakar, T. Ramesh
Format:	Article
Language:	English
Published:	Springer 2024-06-01
Series:	Journal of King Saud University: Computer and Information Sciences
Subjects:	Deduplication, Content based cache, Disk bottleneck
Online Access:	http://www.sciencedirect.com/science/article/pii/S1319157824001198

_version_	1849307056682565632
author	Amdewar Godavari, Chapram Sudhakar, T. Ramesh
collection	DOAJ
description	Primary storage deduplication systems are performance sensitive. Their performance depends on two factors: metadata access for duplicate detection and the strategy for eliminating duplicate data. Various approaches to duplicate detection through suitable caching mechanisms have been proposed in the literature. Most of these approaches assume that primary workloads exhibit strong temporal locality. This cannot be assumed in the cloud, however, because interference among different workloads on the same system destroys workload locality. Duplicate content among data blocks with different addresses leads to inefficient utilization of the data cache. In this context, applying deduplication causes data blocks to be shared among clients with different access patterns and frequencies. In this situation an LRU cache, which considers only the recency of references, is not appropriate. This paper proposes a Hybrid Deduplication System (HDS) containing a content-based cache with a new replacement policy, Modified Adaptive Replacement Cache (ARC). The proposed system is simulated in a Linux environment using three different types of FIU traces, and its effectiveness is compared with that of a full deduplication system. Experimental results show that the system consistently outperforms the full deduplication system in reducing metadata overhead on all three data sets.
format	Article
id	doaj-art-20264cfe13264f208a1167693ac1192c
institution	Kabale University
issn	1319-1578
language	English
publishDate	2024-06-01
publisher	Springer
record_format	Article
series	Journal of King Saud University: Computer and Information Sciences
spelling	doaj-art-20264cfe13264f208a1167693ac1192c | 2025-08-20T03:54:52Z | eng | Springer | Journal of King Saud University: Computer and Information Sciences | 1319-1578 | 2024-06-01 | 36 | 5 | 102030 | 10.1016/j.jksuci.2024.102030 | Hybrid deduplication system with content-based cache for cloud environment | Amdewar Godavari (Department of Computer Science and Engineering (Networks), Kakatiya Institute of Technology & Science Warangal, India; corresponding author) | Chapram Sudhakar (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India) | T. Ramesh (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India) | http://www.sciencedirect.com/science/article/pii/S1319157824001198 | Deduplication | Content based cache | Disk bottleneck
title	Hybrid deduplication system with content-based cache for cloud environment
topic	Deduplication, Content based cache, Disk bottleneck
url	http://www.sciencedirect.com/science/article/pii/S1319157824001198

Similar Items