Hybrid deduplication system with content-based cache for cloud environment

Primary storage deduplication systems are performance sensitive. Their performance depends upon two factors — metadata access for duplicate detection and strategy for elimination of duplicate data. Various approaches for duplicate detection through suitable caching mechanisms have been proposed in t...

Bibliographic Details
Main Authors: Amdewar Godavari, Chapram Sudhakar, T. Ramesh
Format: Article
Language: English
Published: Springer 2024-06-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Deduplication; Content based cache; Disk bottleneck
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157824001198
author Amdewar Godavari
Chapram Sudhakar
T. Ramesh
collection DOAJ
description Primary storage deduplication systems are performance sensitive. Their performance depends on two factors: metadata access for duplicate detection and the strategy for eliminating duplicate data. Various approaches to duplicate detection through suitable caching mechanisms have been proposed in the literature. Most of these approaches assume that primary workloads exhibit strong temporal locality. This cannot be assumed in the cloud, however, because interference among different workloads on the same system destroys workload locality. Duplicate content among data blocks with different addresses leads to inefficient utilization of the data cache. In this context, applying deduplication causes data blocks to be shared among clients with different access patterns and frequencies. In this situation an LRU cache, which considers only the recency of references, is not appropriate. This paper proposes a Hybrid Deduplication System (HDS) containing a content-based cache with a new replacement policy, Modified Adaptive Replacement Cache (ARC). The proposed system is simulated in a Linux environment using three different types of FIU traces. The effectiveness of the system is compared with that of a full deduplication system. Experimental results show that the system performed consistently better than the full deduplication system in reducing metadata overhead for all three data sets.
format Article
id doaj-art-20264cfe13264f208a1167693ac1192c
institution Kabale University
issn 1319-1578
language English
publishDate 2024-06-01
publisher Springer
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj-art-20264cfe13264f208a1167693ac1192c
2025-08-20T03:54:52Z
eng
Springer
Journal of King Saud University: Computer and Information Sciences
1319-1578
2024-06-01
36
5
102030
10.1016/j.jksuci.2024.102030
Hybrid deduplication system with content-based cache for cloud environment
Amdewar Godavari (Department of Computer Science and Engineering (Networks), Kakatiya Institute of Technology & Science Warangal, India; corresponding author)
Chapram Sudhakar (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India)
T. Ramesh (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India)
http://www.sciencedirect.com/science/article/pii/S1319157824001198
Deduplication
Content based cache
Disk bottleneck
title Hybrid deduplication system with content-based cache for cloud environment
topic Deduplication
Content based cache
Disk bottleneck
url http://www.sciencedirect.com/science/article/pii/S1319157824001198