Data deduplication
Computer science
Metadata
Journaling file system
Operating system
DRAM
Block (permutation group theory)
Database
Computer file
Computer hardware
Mathematics
Geometry
Authors
Chunlin Song,Xianzhang Chen,Duo Liu,Jiali Li,Yujuan Tan,Ao Ren
Identifier
DOI:10.1109/tcad.2023.3347305
Abstract
Block-level data deduplication is a widely used technology for saving storage space by filtering out data blocks with the same hash value. However, existing block-level data deduplication approaches either ignore the data consistency of deduplication or suffer severe performance degradation when providing consistency guarantees. In this paper, we propose Consistency-Aware Deduplication (CADedup+) to achieve high-performance block-level data deduplication with data consistency. The main idea of CADedup+ is to build an efficient journaling mechanism for deduplication by taking advantage of the properties of persistent memory (PM), such as byte-addressability and near-DRAM access latency. To balance the trade-offs between performance and consistency requirements in data deduplication, we carefully design three modes of the journaling mechanism for CADedup+: writeback mode, ordered mode, and journal mode. We place the deduplication metadata of CADedup+ on the DRAM-PM hybrid memory architecture according to the features of metadata updates, so as to minimize PM costs. The deduplication metadata on PM is managed by a set of metadata transactions and updated with the help of the efficient hardware atomic operations provided by the CPU. We implement CADedup+ in the generic block layer of Linux kernel 4.9.0. We conduct extensive experiments on Intel Optane PMEM to evaluate CADedup+ with typical benchmarks. Experimental results show that CADedup+ reduces write volume by 63%-70% and I/O latency by 50%-60% compared with Dmdedup, a widely used open-source block-level data deduplication system, while ensuring deduplication consistency.
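The basic idea of block-level deduplication mentioned in the abstract (storing a block only once per distinct hash value) can be illustrated with a minimal in-memory sketch. This is not the CADedup+ implementation: it has no journaling, no persistent memory, and no kernel integration. The class name `DedupStore`, the 4096-byte block size, and the use of SHA-256 are assumptions made only for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not specify one


class DedupStore:
    """Toy in-memory block store that keeps one copy per unique block hash.

    Hypothetical illustration only; not the paper's design.
    """

    def __init__(self):
        self.blocks = {}      # fingerprint -> block data (stands in for the device)
        self.refcount = {}    # fingerprint -> number of logical blocks using it
        self.lba_map = {}     # logical block address -> fingerprint
        self.logical_writes = 0
        self.physical_writes = 0

    def write(self, lba, data):
        """Write one block; store the data only if its hash is new."""
        fp = hashlib.sha256(data).hexdigest()
        self.logical_writes += 1
        old = self.lba_map.get(lba)
        if old == fp:
            return  # same content already mapped at this address
        if fp not in self.blocks:
            # First time this content is seen: one physical write.
            self.blocks[fp] = data
            self.refcount[fp] = 0
            self.physical_writes += 1
        self.refcount[fp] += 1
        self.lba_map[lba] = fp
        if old is not None:
            self._release(old)

    def _release(self, fp):
        """Drop one reference; free the block when nothing points to it."""
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.refcount[fp]
            del self.blocks[fp]

    def read(self, lba):
        return self.blocks[self.lba_map[lba]]
```

In this sketch, each write touches several pieces of metadata (the address map and the reference counts). If a real system crashed between those updates, the metadata could become inconsistent, which is the kind of problem the journaling mechanism described in the abstract is meant to address.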