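Computer science
Microservices
Trace (psycholinguistics)
Anomaly detection
Testbed
Artificial intelligence
Machine learning
Data mining
World Wide Web
Operating system
Cloud computing
Philosophy
Linguistics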
Authors
Ping Liu,Haowen Xu,Qianyu Ouyang,Rui Jiao,Zhekang Chen,Shenglin Zhang,Jiahai Yang,Linlin Mo,Jice Zeng,Wenman Xue,Dan Pei
Identifier
DOI:10.1109/issre5003.2020.00014
Abstract
Anomalies in microservice invocation traces (traces) often indicate that the quality of a large microservice-based software service is being impaired. However, detecting trace anomalies in a timely and accurate manner is very challenging due to: 1) the large number of underlying microservices, 2) the complex call relationships between them, and 3) the interdependency between response times and invocation paths. Our core idea is to use machine learning to automatically learn the overall normal patterns of traces during periodic offline training. In online anomaly detection, a new trace with a small anomaly score (computed based on the learned normal pattern) is considered anomalous. With our novel trace representation and the design of deep Bayesian networks with posterior flow, our unsupervised anomaly detection system, called TraceAnomaly, can accurately and robustly detect trace anomalies in a unified fashion. TraceAnomaly has been deployed on 18 online services in a company S. Detailed evaluations on four large online services, which contain hundreds of microservices, and on a testbed, which contains 41 microservices, show that the recall and precision of TraceAnomaly are both above 0.97, outperforming the existing approach in S (hard-coded rule) by 19.6% and 7.1%, and seven other baselines by 57.0% and 41.6% on average.
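The offline-training and online-scoring loop described in the abstract can be illustrated with a deliberately simplified sketch. This is not TraceAnomaly's model: the deep Bayesian network with posterior flow is replaced here by a per-path Gaussian over response time, and the function names, the `(path, response_time)` trace format, and the threshold are all assumptions made for illustration. It keeps only the convention stated above, that a small score (here a log-likelihood under the learned normal pattern) marks a trace as anomalous.

```python
import math
from collections import defaultdict

def fit_normal_patterns(traces):
    """Offline step: learn (mean, std) of response time per invocation path.

    `traces` is a list of (path, response_time_ms) pairs; this format is an
    assumption for the sketch, not the paper's trace representation.
    """
    by_path = defaultdict(list)
    for path, rt in traces:
        by_path[path].append(rt)
    model = {}
    for path, rts in by_path.items():
        mean = sum(rts) / len(rts)
        var = sum((x - mean) ** 2 for x in rts) / len(rts)
        # Floor the std so a constant response time does not divide by zero.
        model[path] = (mean, max(math.sqrt(var), 1e-6))
    return model

def anomaly_score(model, path, rt):
    """Online step: Gaussian log-likelihood of the trace under the model.

    An invocation path never seen in training scores -inf, so an unusual
    path and an unusual response time are handled by the same score.
    """
    if path not in model:
        return float("-inf")
    mean, std = model[path]
    z = (rt - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))

def is_anomalous(model, path, rt, threshold=-10.0):
    """A small score means anomalous; the threshold is a hypothetical value."""
    return anomaly_score(model, path, rt) < threshold

# Usage: train on normal traces, then score new ones.
path = ("gateway", "orders", "db")
model = fit_normal_patterns([(path, t) for t in (100, 102, 98, 101, 99)])
print(is_anomalous(model, path, 100.0))
print(is_anomalous(model, path, 500.0))
print(is_anomalous(model, ("gateway", "unknown"), 100.0))
```

Scoring path and response time together, rather than with separate rules, is what the abstract means by detecting anomalies "in a unified fashion"; the sketch only gestures at that idea with a single likelihood value.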