Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection

Hüseyin Güney

doi:10.35377/saucis...1223054

Research Article

Year 2023, Volume: 6 Issue: 1, 67 - 79, 30.04.2023

Cited By: 3

Abstract

Machine learning (ML) has frequently been used to build intelligent systems in many problem domains, including cybersecurity. For detecting malicious network activity, ML-based intrusion detection systems (IDSs) are promising because they can classify attacks autonomously after a learning process. However, building them is challenging because of the vast number of methods available in the current literature, including ML classification algorithms and preprocessing techniques. To analyse the impact of preprocessing techniques on the ML algorithm, this study conducted extensive experiments using support vector machines (SVM) as both the classifier and the feature selection (FS) technique, several normalisation techniques, and a grid-search classifier optimisation algorithm. These methods were tested sequentially on three publicly available network intrusion datasets: NSL-KDD, UNSW-NB15, and CICIDS2017. The results were then analysed to investigate the impact of each model and to extract insights for building intelligent and efficient IDSs. The results showed that data preprocessing significantly improves classification performance and that log-scaling normalisation outperformed the other techniques on intrusion detection datasets. Additionally, the results suggested that the embedded SVM-FS is accurate and that classifier optimisation can improve the performance of classifier-dependent FS techniques. However, feature selection in classifier optimisation is a critical problem that must be addressed. In conclusion, this study provides insights for building ML-based network intrusion detection systems (NIDSs) by revealing important information about data preprocessing.
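
As a rough illustration of why log-scaling can help on heavy-tailed traffic features, the sketch below compares plain min-max normalisation with a log transform followed by min-max. The abstract does not give the exact formula used in the study, so the `log(1 + x)` form, the function names, and the sample byte counts are all assumptions made for illustration only.

```python
import math


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """Apply log(1 + x) to each value (assumed form of log-scaling)."""
    if any(v < 0 for v in values):
        raise ValueError("log-scaling as sketched here needs non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical byte counts for five connections; one is an extreme outlier.
src_bytes = [0, 12, 150, 2_000, 1_000_000]

plain = min_max(src_bytes)              # outlier squeezes the rest towards 0
logged = min_max(log_scale(src_bytes))  # values are spread more evenly

for raw, p, l in zip(src_bytes, plain, logged):
    print(f"{raw:>9}  min-max={p:.4f}  log+min-max={l:.4f}")
```

With plain min-max, the single large value pushes every other connection close to zero, whereas the log transform keeps the smaller values distinguishable. This is one plausible reason for the reported advantage, not a result taken from the paper.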

Keywords

Data Preprocessing, Classifier Optimisation, Feature Selection, Network Intrusion Detection System, Support Vector Machines

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Articles
Authors	Hüseyin Güney 0000-0001-7924-1904
Early Pub Date	April 28, 2023
Publication Date	April 30, 2023
Submission Date	December 22, 2022
Acceptance Date	April 3, 2023
Published in Issue	Year 2023 Volume: 6 Issue: 1

Cite

APA	Güney, H. (2023). Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. Sakarya University Journal of Computer and Information Sciences, 6(1), 67-79. https://doi.org/10.35377/saucis...1223054
AMA	Güney H. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. April 2023;6(1):67-79. doi:10.35377/saucis.1223054
Chicago	Güney, Hüseyin. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences 6, no. 1 (April 2023): 67-79. https://doi.org/10.35377/saucis.1223054.
EndNote	Güney H (April 1, 2023) Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. Sakarya University Journal of Computer and Information Sciences 6 1 67–79.
IEEE	H. Güney, “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”, SAUCIS, vol. 6, no. 1, pp. 67–79, 2023, doi: 10.35377/saucis...1223054.
ISNAD	Güney, Hüseyin. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences 6/1 (April 2023), 67-79. https://doi.org/10.35377/saucis.1223054.
JAMA	Güney H. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. 2023;6:67–79.
MLA	Güney, Hüseyin. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences, vol. 6, no. 1, 2023, pp. 67-79, doi:10.35377/saucis.1223054.
Vancouver	Güney H. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. 2023;6(1):67-79.