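Lossy compression
Computer science
Compression ratio
Huffman coding
Encoder
Data compression
Encoding (memory)
Gas compressor
Volume (thermodynamics)
Lossless compression
Compression (physics)
Series (stratigraphy)
Database
Algorithm
Data mining
Computer hardware
Real-time computing
Artificial intelligence
Operating system
Engineering
Paleontology
Mechanical engineering
Physics
Materials science
Quantum mechanics
Automotive engineering
Composite material
Biology
Internal combustion engine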
Authors
Yang Shi, Xiangyu Zou, Xinyu Chen, Sian Jin, Dingwen Tao, Cai Deng, Yu‐Fan Chen, Wen Xia
Identifier
DOI:10.1109/dcc58796.2024.00061
Abstract
As time series data become popular, their volume increases rapidly. Time series databases are designed for such data, and they process data in short slices, meaning that the compression units for compressors are small. How to compress short slices of floating-point values while preserving a high compression ratio and a high decompression speed remains a problem. To solve the problem, we propose a lossy compressor, Machete. It uses an efficient hybrid encoder of Huffman encoding and variable length quantity (VLQ). Adaptive encoding selection makes it excel in compression ratio on short-slice data, while the simple framework ensures fast decompression. We also find a limitation in VLQ and propose the optimal VLQ to further improve the compression ratio. Our evaluation on four real-world datasets shows that Machete outperforms state-of-the-art compressors by 32% to 80% on compression ratio and achieves the fastest decompression speed on two datasets. When applied to InfluxDB, a well-known time series database, Machete reduces disk usage by up to 79% and improves query performance by saving I/O.
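To make the VLQ idea in the abstract concrete, here is a minimal Python sketch of textbook variable length quantity encoding applied to a short slice of floating-point values. It is not Machete's encoder: the abstract does not describe the hybrid Huffman/VLQ selection or the "optimal VLQ" in detail, so those are left out. The error-bounded quantization step, the delta and zigzag transforms, and all function names (`vlq_encode`, `compress_slice`, and so on) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def vlq_encode(n: int) -> bytes:
    """Standard VLQ: 7 payload bits per byte, most significant group first.
    The high bit is set on every byte except the last one."""
    if n < 0:
        raise ValueError("VLQ encodes non-negative integers only")
    groups = [n & 0x7F]              # last byte: continuation bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def vlq_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one VLQ integer starting at pos; return (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def zigzag(n: int) -> int:
    """Map signed integers to non-negative ones: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def compress_slice(values: List[float], eps: float) -> bytes:
    """Assumed lossy front end: quantize with step 2*eps, so each restored
    value is within roughly eps of the original, then delta + zigzag + VLQ."""
    step = 2 * eps
    out = bytearray()
    prev = 0
    for x in values:
        q = round(x / step)
        out += vlq_encode(zigzag(q - prev))
        prev = q
    return bytes(out)


def decompress_slice(data: bytes, eps: float) -> List[float]:
    step = 2 * eps
    values, pos, prev = [], 0, 0
    while pos < len(data):
        z, pos = vlq_decode(data, pos)
        prev += unzigzag(z)
        values.append(prev * step)
    return values
```

With an error bound of `eps = 0.05`, the slice `[20.01, 20.03, 19.98, 20.26]` quantizes to small integer deltas, so most values fit in a single byte. Small deltas are what make a byte-oriented code like VLQ attractive for short slices, and decoding needs only shifts and masks.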