Research Article
Year 2023, Volume 6, Issue 1, 67 - 79, 30.04.2023
https://doi.org/10.35377/saucis...1223054


Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection

Abstract

Machine learning (ML) is frequently used to build intelligent systems in many problem domains, including cybersecurity. For detecting malicious network activity, ML-based intrusion detection systems (IDSs) are promising because they can classify attacks autonomously after a learning process. However, building such a system is challenging because of the vast number of methods available in the current literature, including ML classification algorithms and preprocessing techniques. To analyse the impact of preprocessing techniques on the ML algorithm, this study conducted extensive experiments using support vector machines (SVM) as both the classifier and the feature selection (FS) technique, several normalisation techniques, and a grid-search classifier optimisation algorithm. These methods were tested sequentially on three publicly available network intrusion datasets: NSL-KDD, UNSW-NB15, and CICIDS2017. The results were then analysed to investigate the impact of each model and to extract insights for building intelligent and efficient IDSs. The results showed that data preprocessing significantly improves classification performance and that log-scaling normalisation outperformed the other techniques on intrusion detection datasets. Additionally, the results suggested that the embedded SVM-FS is accurate and that classifier optimisation can improve the performance of classifier-dependent FS techniques. However, feature selection in classifier optimisation is a critical problem that must be addressed. In conclusion, this study provides insights for building ML-based network IDSs (NIDSs) by revealing important information about data preprocessing.
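
The abstract reports that log-scaling normalisation outperformed the other techniques but does not give its formula. As a rough illustration only, the sketch below assumes the common form log(1 + x) and compares it with plain min-max scaling on a heavy-tailed feature. The feature values and the function names (`min_max_scale`, `log_scale`) are hypothetical and are not taken from the study.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """Apply log(1 + x) to non-negative values (assumed form of log-scaling)."""
    if any(v < 0 for v in values):
        raise ValueError("log_scale expects non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical per-flow byte counts with one very large flow.
src_bytes = [0, 40, 300, 1500, 2_000_000]

min_max = min_max_scale(src_bytes)
log_then_min_max = min_max_scale(log_scale(src_bytes))

# Print the raw value next to both scaled versions for comparison.
for raw, a, b in zip(src_bytes, min_max, log_then_min_max):
    print(f"{raw:>9}  min-max={a:.6f}  log+min-max={b:.6f}")
```

With min-max scaling alone, the single large flow pushes every other value close to zero, whereas applying the logarithm first spreads the smaller values across the range. This may be one reason a log transform suits traffic features such as byte and packet counts, although the abstract itself does not state the cause.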

References

  • [1] Ham, Jeroen Van Der. “Toward a Better Understanding of “Cybersecurity”.” Digital Threats: Research and Practice 2.3 (2021): 1-3.
  • [2] Khraisat, Ansam, et al. “Survey of intrusion detection systems: techniques, datasets and challenges.” Cybersecurity 2.1 (2019): 1-22.
  • [3] Ahmad, Zeeshan, et al. “Network intrusion detection system: A systematic study of machine learning and deep learning approaches.” Transactions on Emerging Telecommunications Technologies 32.1 (2021): e4150.
  • [4] Singh, Dalwinder, and Birmohan Singh. “Investigating the impact of data normalisation on classification performance.” Applied Soft Computing 97 (2020): 105524.
  • [5] Guyon, Isabelle, et al. “Gene selection for cancer classification using support vector machines.” Machine Learning 46.1 (2002): 389-422.
  • [6] Tavallaee, Mahbod, et al. “A detailed analysis of the KDD CUP 99 data set.” 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications. IEEE, 2009.
  • [7] Moustafa, Nour, and Jill Slay. “The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set.” Information Security Journal: A Global Perspective 25.1-3 (2016): 18-31.
  • [8] Sharafaldin, Iman, Arash Habibi Lashkari, and Ali A. Ghorbani. “Toward generating a new intrusion detection dataset and intrusion traffic characterisation.” ICISSP 1 (2018): 108-116.
  • [9] Zhang, Xiaoyuan, Daoyin Qiu, and Fuan Chen. “Support vector machine with parameter optimisation by a novel hybrid method and its application to fault diagnosis.” Neurocomputing 149 (2015): 641-651.
  • [10] Yin C, Zhu Y, Fei J, He X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access. 2017;5:21954-21961. doi:10.1109/access.2017.2762418.
  • [11] Tang, Chaofei, Nurbol Luktarhan, and Yuxin Zhao. “SAAE-DNN: Deep learning method on intrusion detection.” Symmetry 12.10 (2020): 1695.
  • [12] Pervez, Muhammad Shakil, and Dewan Md Farid. “Feature selection and intrusion classification in NSL-KDD cup 99 dataset employing SVMs.” The 8th International Conference on Software, Knowledge, Information Management and Applications (SKIMA 2014). IEEE, 2014.
  • [13] Janarthanan, Tharmini, and Shahrzad Zargari. “Feature selection in UNSW-NB15 and KDDCUP’99 datasets.” 2017 IEEE 26th international symposium on industrial electronics (ISIE). IEEE, 2017.
  • [14] Malik, Arif Jamal, Waseem Shahzad, and Farrukh Aslam Khan. “Network intrusion detection using hybrid binary PSO and random forests algorithm.” Security and Communication Networks 8.16 (2015): 2646-2660.
  • [15] Kanakarajan, Navaneeth Kumar, and Kandasamy Muniasamy. “Improving the accuracy of intrusion detection using gar-forest with feature selection.” Proceedings of the 4th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA) 2015. Springer, New Delhi, 2016.
  • [16] Khammassi, Chaouki, and Saoussen Krichen. “A GA-LR wrapper approach for feature selection in network intrusion detection.” Computers & Security 70 (2017): 255-277.
  • [17] Packet Preprocessing in CNN-Based Network Intrusion Detection System
  • [18] An Effective Comparative Analysis of Data Preprocessing Techniques in Network Intrusion Detection System Using Deep Neural Networks
  • [19] Data Preprocessing and feature selection for machine learning intrusion detection systems
  • [20] Feature selection for intrusion detection system in Internet-of-Things (IoT)
  • [21] Pajouh HH, Dastghaibyfard GH, Hashemi S. Two-tier network anomaly detection model: A machine learning approach. Journal of Intelligent Information Systems. 2015;48(1):61-74. doi:10.1007/s10844-015-0388-x.
  • [22] Jabbar AF, Mohammed IJ. Development of an optimised botnet detection framework based on filters of features and machine learning classifiers using CICIDS2017 dataset. IOP Conference Series: Materials Science and Engineering. 2020;928(3):032027. doi:10.1088/1757-899x/928/3/032027.
  • [23] Krishna KV, Swathi K, Rao BB. A novel framework for nids through fast knn classifier on CICIDS 2017 dataset. International Journal of Recent Technology and Engineering (IJRTE). 2020;8(5):3669-3675. doi:10.35940/ijrte.e6580.018520.
  • [24] Kshirsagar D, Kumar S. An efficient feature reduction method for the detection of Dos Attack. ICT Express. 2021;7(3):371-375. doi:10.1016/j.icte.2020.12.006.
  • [25] Azzaoui H, Boukhamla AZ, Arroyo D, Bensayah A. Developing new deep-learning model to enhance network intrusion classification. Evolving Systems. 2021;13(1):17-25. doi:10.1007/s12530-020-09364-z.
  • [26] Prajapati, Gend Lal, and Arti Patle. “On performing classification using SVM with radial basis and polynomial kernel functions.” 2010 3rd International Conference on Emerging Trends in Engineering and Technology. IEEE, 2010.
  • [27] Zhang, Xiaoyuan, Daoyin Qiu, and Fuan Chen. “Support vector machine with parameter optimisation by a novel hybrid method and its application to fault diagnosis.” Neurocomputing 149 (2015): 641-651.
  • [28] Hsu, Chih-Wei, Chih-Chung Chang, and Chih-Jen Lin. “A practical guide to support vector classification.” (2003): 1396-1400.
  • [29] R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
  • [30] RStudio Team (2019). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/.
  • [31] Meyer, David, et al. “Package ‘e1071’.” The R Journal (2019).
There are 31 citations in total.

Details

Primary Language English
Subjects Computer Software
Journal Section Articles
Authors

Hüseyin Güney 0000-0001-7924-1904

Early Pub Date April 28, 2023
Publication Date April 30, 2023
Submission Date December 22, 2022
Acceptance Date April 3, 2023
Published in Issue Year 2023, Volume 6, Issue 1

Cite

IEEE H. Güney, “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”, SAUCIS, vol. 6, no. 1, pp. 67–79, 2023, doi: 10.35377/saucis...1223054.

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.