Computer Science
Cloud Computing
Multi-tenancy
Operating Systems
Software
Software as a Service
Software Development
Authors
Hamidreza Moradi, Wei Wang, Dakai Zhu
Identifier
DOI:10.1109/tcc.2021.3078690
Abstract
Clouds have been widely adopted by many organizations for their support of flexible resource demands and low cost, which is normally achieved by sharing the underlying hardware among multiple cloud tenants. However, such sharing, together with changing resource contention in virtual machines (VMs), can cause large variations in the performance of cloud applications, making it difficult for ordinary cloud users to estimate the run-time performance of their applications. In this article, we propose online learning methodologies for performance modeling and prediction of applications that run repetitively on multi-tenant clouds (such as online data-analytic tasks). Here, a few micro-benchmarks are utilized to probe the in-situ perceivable performance of the CPU, memory, and I/O components of the target VM. Then, based on such profiling information and in-place measurements of the application's performance, predictive models are derived with either regression or neural-network techniques. In particular, to address changes in the intensity of a VM's resource contention over time and their effects on the target application, we propose periodic model retraining, where a sliding-window technique is exploited to control the retraining frequency and the historical data used for retraining. Moreover, a progressive modeling approach is devised in which the regression and neural-network models are gradually updated to better adapt to recent changes in resource contention. Considering 17 representative applications from the PARSEC, NAS Parallel, and CloudSuite benchmarks, we extensively evaluated the proposed online schemes for the prediction accuracy of the resulting models and the associated overheads on both a private and a public cloud. The evaluation results show that, even on the private cloud with high and rapidly changing resource contention, the average prediction errors of the considered models can be less than 20 percent with periodic retraining. The prediction errors generally decrease with higher retraining frequencies and more historical data points, but at the cost of higher run-time overheads. Furthermore, with the progressive neural-network models, the average prediction errors can be reduced by about 7 percent with much lower run-time overheads (up to 265×) on the private cloud. For public clouds with less resource contention, the average prediction errors can be less than 4 percent for the considered models with our proposed online schemes.
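The sliding-window retraining idea described in the abstract can be illustrated with a minimal sketch. The class below is a hypothetical simplification (not the authors' implementation): it keeps only the most recent `window` pairs of (micro-benchmark probe score, measured application runtime) and refits a one-feature linear regression by ordinary least squares after each new measurement, so older contention regimes age out of the model automatically.

```python
from collections import deque

class SlidingWindowRegressor:
    """Toy sketch of sliding-window model retraining: predict application
    runtime from a single in-situ probe score, refitting on only the most
    recent `window` observations."""

    def __init__(self, window=8):
        self.samples = deque(maxlen=window)  # (probe_score, runtime) pairs
        self.slope = 0.0
        self.intercept = 0.0

    def add_sample(self, probe_score, runtime):
        # New measurement evicts the oldest one once the window is full.
        self.samples.append((probe_score, runtime))
        self._refit()

    def _refit(self):
        # Ordinary least squares over the current window.
        n = len(self.samples)
        if n < 2:
            return
        xs = [x for x, _ in self.samples]
        ys = [y for _, y in self.samples]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            return  # degenerate window: all probe scores identical
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = my - self.slope * mx

    def predict(self, probe_score):
        return self.slope * probe_score + self.intercept
```

A larger window (more historical points) yields a smoother fit but adapts more slowly and refits over more data, mirroring the accuracy/overhead trade-off the abstract reports; the paper's actual models also use multiple probe features (CPU, memory, I/O) and neural-network variants.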