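Computer science
Column (typography)
Cloud computing
Workload
Pixel
Latency (audio)
Scheduling (production processes)
Database
Cloud storage
Real-time computing
Distributed computing
Operating system
Computer network
Engineering
Artificial intelligence
Telecommunications
Operations management
Frame (networking)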
Authors
Haoqiong Bian, Anastasia Ailamaki
Identifier
DOI:10.1109/icde53745.2022.00276
Abstract
To benefit from the cloud's higher elasticity and price-efficiency, most modern data-lake engines support S3-like cloud object storage (COS) services as their optional or preferred underlying storage. Meanwhile, widespread column stores such as Parquet are used in these data lakes to improve analytical performance. However, because these column stores were designed for on-premise HDFS, they often suffer from the high latency of COS and deliver sub-optimal query performance. We observe that by optimizing the storage layout and data access pattern, we can effectively hide and mitigate the high latency. In this paper, we present Pixels, a column store optimized for the cloud that solves the problem through (1) workload-driven storage layout optimization within and across row group boundaries, and (2) I/O scheduling that takes into account the optimized storage layout and the performance characteristics of COS. Together, these techniques improve analytical performance transparently, without affecting data ingestion and query execution in data lakes. Evaluations show that Pixels outperforms the state-of-the-art column store on COS by more than one order of magnitude on a real-world workload and by 1.93x on TPC-H. Moreover, the performance of Pixels is also portable to HDFS.