Authors
Sareena Karapoola, Nikhilesh Singh, Chester Rebeiro, V. Kamakoti
Abstract
Malware programs are diverse, with varying objectives, functionalities, and threat levels ranging from mere pop-ups to significant financial losses. Consequently, their run-time footprints across the system differ, which affects both the optimal data source (network, operating system (OS), or hardware) and the features that are instrumental to malware detection. Further, the variations in threat levels of malware classes affect the user policies for detection. Thus, the optimal tuple of $\langle \texttt{data-source}, \texttt{features}, \texttt{user-policies} \rangle$, determined experimentally, is different for each malware class, which limits state-of-the-art detection solutions that are agnostic to these subtle differences. This paper presents ${\sf SUNDEW}$, a framework to detect malware classes using the corresponding optimal tuple of $\langle \texttt{data-source}, \texttt{features}, \texttt{user-policies} \rangle$. ${\sf SUNDEW}$ uses an ensemble of specialized predictors, each trained with a particular data source (network, OS, or hardware) and tuned for the features and policies of a specific class. While the specialized ensemble, with its holistic view across the system, improves detection, aggregating the independent and possibly conflicting inferences from the different predictors is challenging. ${\sf SUNDEW}$ resolves such conflicts with a hierarchical aggregation that considers the threat level, noise in the data sources, and prior domain knowledge. We evaluate ${\sf SUNDEW}$ on a real-world dataset of over 10,000 malware samples from 8 classes. It achieves an F1-score of 1.0 for most classes, with an average of 0.93, and has a limited performance overhead of 1.5%. Our experiments on a common multi-featured dataset show that ${\sf SUNDEW}$ is 10% more accurate, with 89% lower false positives, than prior state-of-the-art predictors.
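To make the aggregation idea concrete, the following is a minimal sketch of how conflicting inferences from specialized predictors might be resolved by threat level and data-source noise. It is not the authors' implementation: the class labels, threat ranks, thresholds, source weights, and the names `Inference` and `aggregate` are all hypothetical, and the choice of a lower threshold for higher-threat classes is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    malware_class: str  # class the specialized predictor was tuned for
    data_source: str    # "network", "os", or "hardware"
    score: float        # predictor confidence in [0, 1]


# Hypothetical values, chosen only for illustration (not from the paper).
THREAT_LEVEL = {"ransomware": 3, "spyware": 2, "adware": 1}
# Assumed user policy: higher-threat classes get a lower decision threshold.
THRESHOLD = {"ransomware": 0.5, "spyware": 0.6, "adware": 0.8}
# Assumed reliability of each data source (a noisier source gets a lower weight).
SOURCE_WEIGHT = {"network": 0.9, "os": 1.0, "hardware": 0.7}


def aggregate(inferences):
    """Return the highest-threat class whose weighted score meets its
    policy threshold, or "benign" if no class qualifies."""
    best = {}
    for inf in inferences:
        # Discount each score by the assumed reliability of its data source.
        weighted = inf.score * SOURCE_WEIGHT[inf.data_source]
        best[inf.malware_class] = max(best.get(inf.malware_class, 0.0), weighted)
    # Walk the classes from highest to lowest threat level.
    for cls in sorted(best, key=lambda c: THREAT_LEVEL[c], reverse=True):
        if best[cls] >= THRESHOLD[cls]:
            return cls
    return "benign"


verdict = aggregate([
    Inference("adware", "os", 0.95),
    Inference("ransomware", "hardware", 0.8),
])
```

In this sketch both predictors clear their thresholds, and the conflict is resolved in favor of the higher-threat class, so `verdict` is `"ransomware"`. A real system would derive the thresholds and weights experimentally for each class rather than fixing them by hand.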