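Lossless compression
Column (typography)
Table (database)
Categories
Computer science
Data compression
Algorithm
Permutation (music)
Compression (physics)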
Data compression ratio
Lossy compression
Parallel computing
Authors
Xizhe CHENG, Sian-Jheng LIN, Jie SUN
Identifier
DOI:10.1109/dcc52660.2022.00046
Abstract
A universal scheme is proposed for the lossless compression of two-dimensional tables and matrices. Instead of standard row- or column-based compression, we propose to first sort each column and record both the sorted table and the corresponding permutation table of the sorting permutations. These two tables are then compressed separately. In this new scheme, both intra- and inter-column correlations can be captured efficiently, which improves the compression ratio, particularly when column-wise and row-wise dependencies co-occur. The scheme reduces the compression of an arbitrary two-dimensional table to that of a ‘permutation table’ together with a ‘sorted table’, where the former depends only on the table dimensions and the latter can be compressed effectively column by column using predictive methods. Based on this scheme, we propose a new algorithm, SortComp (sort-and-compress). For correlated columns, we estimate the asymptotic bit rate of the algorithm and compare it with that of column-oriented compression schemes. Numerical experiments on real-life CSV datasets validate the advantages of SortComp over existing row- and column-oriented compression algorithms.
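The abstract does not spell out SortComp's encoder, so the following is only a minimal sketch of the sort-and-split transform it describes, under stated assumptions: the table is a non-empty rectangular list of integer rows, each column is sorted with a stable argsort, and simple previous-value (delta) prediction stands in for the unspecified "predictive methods". The function names `sort_columns`, `unsort_columns`, `delta_encode_column` and `delta_decode_column` are hypothetical, and entropy coding of the two resulting tables is omitted.

```python
from itertools import accumulate
from typing import List, Tuple

Table = List[List[int]]  # row-major: table[row][col]; assumed non-empty and rectangular


def sort_columns(table: Table) -> Tuple[Table, Table]:
    """Split a table into a column-wise sorted table and a permutation table.

    perm[r][c] is the original row index of the value that ends up at row r
    of column c after a stable sort of that column (hypothetical layout).
    """
    n_rows, n_cols = len(table), len(table[0])
    sorted_tab = [[0] * n_cols for _ in range(n_rows)]
    perm = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        # Stable argsort of column c (Python's sorted is stable).
        order = sorted(range(n_rows), key=lambda r: table[r][c])
        for r, src in enumerate(order):
            sorted_tab[r][c] = table[src][c]
            perm[r][c] = src
    return sorted_tab, perm


def unsort_columns(sorted_tab: Table, perm: Table) -> Table:
    """Invert sort_columns: scatter each sorted value back to its original row."""
    n_rows, n_cols = len(sorted_tab), len(sorted_tab[0])
    table = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for r in range(n_rows):
            table[perm[r][c]][c] = sorted_tab[r][c]
    return table


def delta_encode_column(col: List[int]) -> List[int]:
    """Predict each value by its predecessor and keep the residual.

    For a sorted integer column every residual after the first is >= 0,
    which is what makes sorted columns cheap to code (assumed predictor).
    """
    if not col:
        return []
    return [col[0]] + [b - a for a, b in zip(col, col[1:])]


def delta_decode_column(residuals: List[int]) -> List[int]:
    """Undo delta_encode_column with a running sum."""
    return list(accumulate(residuals))


if __name__ == "__main__":
    t = [[3, 30], [1, 10], [2, 20]]
    s, p = sort_columns(t)
    print(s)  # [[1, 10], [2, 20], [3, 30]]
    print(p)  # [[1, 1], [2, 2], [0, 0]]
    assert unsort_columns(s, p) == t
```

In this toy example the two columns are perfectly correlated, so their permutation columns coincide; this hints at how inter-column correlation can surface as redundancy in the permutation table, although how SortComp actually codes that table is not described in the abstract.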