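Malware
Computer science
Search engine indexing
Scalability
Nearest neighbor search
Call graph
System call
Graph
Pruning
Exploit
Theoretical computer science
Cryptovirology
Data mining
Artificial intelligence
Database
Programming language
Computer security
Biology
Agronomy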
Authors
Xin Hu, Tzi-cker Chiueh, Kang G. Shin
Identifiers
DOI:10.1145/1653662.1653736
Abstract
A major challenge for the anti-virus (AV) industry is how to effectively process the huge influx of malware samples it receives every day. One possible solution is to quickly determine whether a new malware sample is similar to any previously seen malware program. In this paper, we design, implement, and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make this determination based on malware's function-call graphs, a structural representation known to be less susceptible to the instruction-level obfuscations that malware writers commonly employ to evade detection by AV software. Because each malware program is represented as a graph, searching a database for the malware program most similar to a given sample becomes a nearest-neighbor search problem in a graph database. To speed up this search, we have developed an efficient method for computing graph similarity that exploits both structural and instruction-level information in the underlying malware programs, and a multi-resolution indexing scheme that uses a computationally economical feature vector for early pruning and resorts to a more accurate but computationally more expensive graph similarity function only when it needs to pinpoint the most similar neighbors. A comprehensive performance study of the SMIT prototype, using a database of more than 100,000 malware samples, demonstrates the effective pruning power and scalability of its nearest-neighbor search mechanisms.
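The multi-resolution idea in the abstract (prune with a cheap feature vector, then fall back to an expensive graph similarity only for surviving candidates) can be illustrated with a small filter-and-refine nearest-neighbor search. The Python sketch below is not SMIT's similarity function or index structure. It rests on assumptions that do not come from the paper: each function is identified by a unique label (for example, a hash of its instructions), the "expensive" measure is the node-and-edge edit distance of such labeled graphs, and the feature vector is just (function count, call count). All names (`CallGraph`, `feature_vector`, `graph_distance`, `nearest_neighbors`) are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class CallGraph:
    # Hypothetical representation (assumption, not SMIT's): each function is a
    # unique label, e.g. a hash of its instructions; edges are (caller, callee).
    name: str
    nodes: frozenset
    edges: frozenset


def feature_vector(g):
    # Cheap, coarse summary of a graph: (number of functions, number of calls).
    return (len(g.nodes), len(g.edges))


def feature_distance(f1, f2):
    # L1 distance between feature vectors. Since |A ^ B| >= abs(|A| - |B|),
    # this never exceeds graph_distance, so it is a safe pruning bound.
    return abs(f1[0] - f2[0]) + abs(f1[1] - f2[1])


def graph_distance(g1, g2):
    # Stand-in for the expensive measure (assumption): node and edge
    # insertions/deletions turning g1 into g2; exact only for unique labels.
    return len(g1.nodes ^ g2.nodes) + len(g1.edges ^ g2.edges)


def nearest_neighbors(query, database, k=1):
    """Filter-and-refine k-NN search: visit candidates in order of the cheap
    bound and compute the exact distance only while the bound could still
    beat the current k-th best result."""
    qf = feature_vector(query)
    scored = [(feature_distance(qf, feature_vector(g)), g) for g in database]
    scored.sort(key=lambda pair: pair[0])  # stable: ties keep database order
    best = []  # max-heap of the k closest so far, stored as (-distance, name)
    exact_computations = 0
    for bound, g in scored:
        if len(best) == k and bound >= -best[0][0]:
            break  # remaining bounds are >= this one, so none can do better
        d = graph_distance(query, g)
        exact_computations += 1
        if len(best) < k:
            heapq.heappush(best, (-d, g.name))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, g.name))
    return sorted((-neg_d, name) for neg_d, name in best), exact_computations


db = [
    CallGraph("A", frozenset({"f1", "f2", "f3"}), frozenset({("f1", "f2"), ("f2", "f3")})),
    CallGraph("B", frozenset({"f1", "f2"}), frozenset({("f1", "f2")})),
    CallGraph("C", frozenset(f"g{i}" for i in range(20)), frozenset()),
]
query = CallGraph("Q", frozenset({"f1", "f2", "f3"}), frozenset({("f1", "f2")}))
print(nearest_neighbors(query, db, k=1))  # ([(1, 'A')], 1)
```

In this toy run only one exact distance is computed: after sample A is scored, B's bound already ties the best distance and C's bound is far larger, so the scan stops. Pruning is safe only because the cheap distance is a lower bound on the expensive one; a production system would also keep the feature vectors in a tree-structured index rather than sorting the whole database for every query.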