Hybrid deduplication system with content-based cache for cloud environment
Primary storage deduplication systems are performance sensitive. Their performance depends upon two factors — metadata access for duplicate detection and strategy for elimination of duplicate data. Various approaches for duplicate detection through suitable caching mechanisms have been proposed in t...
| Main Authors: | Amdewar Godavari, Chapram Sudhakar, T. Ramesh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2024-06-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | Deduplication; Content based cache; Disk bottleneck |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157824001198 |
| _version_ | 1849307056682565632 |
|---|---|
| author | Amdewar Godavari; Chapram Sudhakar; T. Ramesh |
| author_facet | Amdewar Godavari; Chapram Sudhakar; T. Ramesh |
| author_sort | Amdewar Godavari |
| collection | DOAJ |
| description | Primary storage deduplication systems are performance sensitive. Their performance depends on two factors: metadata access for duplicate detection and the strategy for eliminating duplicate data. Various approaches to duplicate detection through suitable caching mechanisms have been proposed in the literature. Most of these approaches assume that primary workloads exhibit strong temporal locality. This cannot be assumed in the cloud, where workload locality is lost owing to interference among different workloads on the same system. Duplicate content among data blocks with different addresses leads to inefficient utilization of the data cache. In this context, applying deduplication causes data blocks to be shared among clients with different access patterns and frequencies. In this situation an LRU cache, which considers only the recency of references, is not appropriate. This paper proposes a Hybrid Deduplication System (HDS) containing a content-based cache with a new replacement policy, a Modified Adaptive Replacement Cache (ARC). The proposed system is simulated in a Linux environment using three different types of FIU traces. The effectiveness of the system is compared with that of a full deduplication system. Experimental results show that the system performs consistently better than the full deduplication system in reducing metadata overhead for all three data sets. |
| format | Article |
| id | doaj-art-20264cfe13264f208a1167693ac1192c |
| institution | Kabale University |
| issn | 1319-1578 |
| language | English |
| publishDate | 2024-06-01 |
| publisher | Springer |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | doaj-art-20264cfe13264f208a1167693ac1192c; 2025-08-20T03:54:52Z; eng; Springer; Journal of King Saud University: Computer and Information Sciences; 1319-1578; 2024-06-01; 36(5), 102030; doi:10.1016/j.jksuci.2024.102030; Hybrid deduplication system with content-based cache for cloud environment; Amdewar Godavari (Department of Computer Science and Engineering (Networks), Kakatiya Institute of Technology & Science Warangal, India; corresponding author); Chapram Sudhakar (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India); T. Ramesh (Department of Computer Science and Engineering, National Institute of Technology Warangal, 506004, India); Primary storage deduplication systems are performance sensitive. Their performance depends on two factors: metadata access for duplicate detection and the strategy for eliminating duplicate data. Various approaches to duplicate detection through suitable caching mechanisms have been proposed in the literature. Most of these approaches assume that primary workloads exhibit strong temporal locality. This cannot be assumed in the cloud, where workload locality is lost owing to interference among different workloads on the same system. Duplicate content among data blocks with different addresses leads to inefficient utilization of the data cache. In this context, applying deduplication causes data blocks to be shared among clients with different access patterns and frequencies. In this situation an LRU cache, which considers only the recency of references, is not appropriate. This paper proposes a Hybrid Deduplication System (HDS) containing a content-based cache with a new replacement policy, a Modified Adaptive Replacement Cache (ARC). The proposed system is simulated in a Linux environment using three different types of FIU traces. The effectiveness of the system is compared with that of a full deduplication system. Experimental results show that the system performs consistently better than the full deduplication system in reducing metadata overhead for all three data sets.; http://www.sciencedirect.com/science/article/pii/S1319157824001198; Deduplication; Content based cache; Disk bottleneck |
| title | Hybrid deduplication system with content-based cache for cloud environment |
| topic | Deduplication Content based cache Disk bottleneck |
| url | http://www.sciencedirect.com/science/article/pii/S1319157824001198 |
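
The abstract's point that duplicate content stored under different addresses wastes data-cache space can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `ContentBasedCache`, the SHA-1 fingerprint, and the plain LRU eviction are assumptions made for illustration only, and the record does not describe the Modified ARC policy in enough detail to reproduce it.

```python
import hashlib
from collections import OrderedDict


class ContentBasedCache:
    """Toy cache keyed by content fingerprint rather than block address.

    Hypothetical illustration; names and policy are not from the paper.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # max number of distinct blocks held
        self.addr_to_fp = {}          # block address -> content fingerprint
        self.blocks = OrderedDict()   # fingerprint -> data, in recency order

    @staticmethod
    def fingerprint(data):
        # SHA-1 is an assumed choice of fingerprint function.
        return hashlib.sha1(data).hexdigest()

    def put(self, addr, data):
        fp = self.fingerprint(data)
        self.addr_to_fp[addr] = fp
        if fp in self.blocks:
            # Duplicate content: reuse the existing entry, no extra slot.
            self.blocks.move_to_end(fp)
            return
        self.blocks[fp] = data
        if len(self.blocks) > self.capacity:
            # Placeholder eviction (plain LRU), NOT the paper's Modified ARC.
            self.blocks.popitem(last=False)

    def get(self, addr):
        fp = self.addr_to_fp.get(addr)
        if fp is None or fp not in self.blocks:
            return None               # miss: unknown address or evicted block
        self.blocks.move_to_end(fp)
        return self.blocks[fp]


cache = ContentBasedCache(capacity=2)
cache.put(100, b"same payload")
cache.put(200, b"same payload")   # different address, identical content
cache.put(300, b"other payload")
distinct_blocks = len(cache.blocks)
```

In this sketch three addresses occupy only two cache slots, because addresses 100 and 200 resolve to the same fingerprint. An address-keyed cache of the same capacity would have had to evict one of the three blocks.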