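Computer science
Context (archaeology)
Raw data
Scale (ratio)
Data collection
Data science
Information systems
Data mining
Engineering
Programming language
Quantum mechanics
Biology
Statistics
Electrical engineering
Physics
Paleontology
Mathematics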
Authors
Adam J. Oliner, Jon Stearley
Source
Journal: Dependable Systems and Networks
Date: 2007-06-01
Volume/Issue (pages): 575-584
Citations: 565
Abstract
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampered by the inaccessibility of empirical data. This paper addresses that dearth by examining system logs from five supercomputers, with the aim of providing useful insight and direction for future research into the use of such logs. We present details about the systems, methods of log collection, and how alerts were identified; propose a simpler and more effective filtering algorithm; and define operational context to encompass the crucial information that we found to be currently missing from most logs. The machines we consider (and the number of processors) are: Blue Gene/L (131072), Red Storm (10880), Thunderbird (9024), Spirit (1028), and Liberty (512). This is the first study of raw system logs from multiple supercomputers.