Authors
Arwa M. Eldhai, Mosab Hamdan, Ahmed Abdelaziz, Mohamed Hashem, Sharief F. Babiker, Muhammad Nadzir Marsono, Muzaffar Hamzah, N. Z. Jhanjhi
Abstract
Traffic classification (TC) in software-defined networks (SDN) based on machine learning (ML) is a viable option for improving network management. TC assists SDN, and SDN in turn facilitates the feature selection (FS) process, especially when ML is used as the classification mechanism to extract measurements and related information from the data arriving at the SDN controller. Despite these advantages, support for TC and FS tasks remains inadequate because traffic profiles are often very similar, making classification difficult. Moreover, stream learning (SL) applied to TC poses many challenges. Robust statistical flow features are therefore needed to reduce the overhead on the SDN control plane; such features should support online feature extraction, handle concept drift, and process an infinite data stream with finite resources (time and memory). This paper aims to improve the overall performance of SL-based TC by selecting the relevant flow features and thereby relieving the SDN control plane, as follows. First, an FS mechanism based on Boruta is proposed. Second, we propose streaming-based traffic classification methods for SDN using Hoeffding adaptive trees (HAT), adaptive random forest (ARF), and k-nearest neighbours with an adaptive sliding window drift detector (KNN-ADWIN). These techniques dynamically handle concept drift and limit memory and time consumption, reducing the SDN controller's overhead. Third, real and synthetic traffic traces are used to assess the performance of the proposed FS and stream TC methods. According to the simulation results, the Boruta FS technique achieves up to 95% average accuracy and up to 87% average per-application precision, recall, and F-score, outperforming other works in the literature. Furthermore, the SL techniques retain up to 85% average accuracy, 78% kappa, and average precision, recall, and F-score between 62% and 88%. In addition, HAT has lower time and memory consumption, down to 15 s and 105 KB, compared with ARF and KNN-ADWIN.
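The abstract does not specify an implementation, but a minimal sketch of Boruta-style feature selection over statistical flow features might look as follows, assuming the Python `boruta` package (BorutaPy) with a random-forest base estimator; the file name `flow_features.csv` and the `app` label column are hypothetical placeholders, not artifacts of the paper.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Hypothetical flow-feature table: one row per flow, statistical flow
# features as columns, and an "app" column holding the application label.
flows = pd.read_csv("flow_features.csv")
X = flows.drop(columns=["app"]).values   # BorutaPy expects NumPy arrays
y = flows["app"].values

# A random forest serves as the base estimator for Boruta's shadow-feature test.
rf = RandomForestClassifier(n_jobs=-1, max_depth=5, class_weight="balanced")
selector = BorutaPy(rf, n_estimators="auto", random_state=42)
selector.fit(X, y)

# Columns confirmed as relevant by Boruta
selected = flows.drop(columns=["app"]).columns[selector.support_]
print("Relevant flow features:", list(selected))
```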
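Similarly, a hedged sketch of prequential (test-then-train) evaluation for the three stream learners named in the abstract could be built on scikit-multiflow, which ships HAT, ARF, and KNN-ADWIN implementations; the input file, column layout, and evaluation parameters below are illustrative assumptions rather than the paper's actual setup.

```python
import pandas as pd
from skmultiflow.data import DataStream
from skmultiflow.trees import HoeffdingAdaptiveTreeClassifier
from skmultiflow.meta import AdaptiveRandomForestClassifier
from skmultiflow.lazy import KNNADWINClassifier
from skmultiflow.evaluation import EvaluatePrequential

# Hypothetical table of Boruta-selected flow features; the last column
# is assumed to hold an integer-encoded application label.
flows = pd.read_csv("selected_flow_features.csv")
stream = DataStream(flows, target_idx=-1)

models = [HoeffdingAdaptiveTreeClassifier(),   # HAT
          AdaptiveRandomForestClassifier(),    # ARF
          KNNADWINClassifier(n_neighbors=5)]   # KNN with ADWIN drift detection

# Prequential evaluation reporting accuracy, kappa, running time, and
# model size, mirroring the metrics summarised in the abstract.
evaluator = EvaluatePrequential(pretrain_size=200,
                                max_samples=50000,
                                metrics=['accuracy', 'kappa',
                                         'running_time', 'model_size'])
evaluator.evaluate(stream=stream, model=models,
                   model_names=['HAT', 'ARF', 'KNN-ADWIN'])
```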