Research Article

Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection

Volume: 6 Number: 1 April 30, 2023

Abstract

Machine learning (ML) is frequently used to build intelligent systems in many problem domains, including cybersecurity. For detecting malicious network activity, ML-based intrusion detection systems (IDSs) are promising because they can classify attacks autonomously after a learning process. However, building such a system is challenging because of the vast number of methods available in the current literature, including ML classification algorithms and preprocessing techniques. To analyse the impact of preprocessing techniques on the ML algorithm, this study conducted extensive experiments using support vector machines (SVM) as both the classifier and the feature selection (FS) technique, several normalisation techniques, and a grid-search classifier optimisation algorithm. These methods were tested sequentially on three publicly available network intrusion datasets: NSL-KDD, UNSW-NB15, and CICIDS2017. The results were then analysed to investigate the impact of each model and to extract insights for building intelligent and efficient IDSs. The results showed that data preprocessing significantly improves classification performance and that log-scaling normalisation outperformed the other techniques on intrusion detection datasets. Additionally, the results suggested that the embedded SVM-FS is accurate and that classifier optimisation can improve the performance of classifier-dependent FS techniques. However, feature selection in classifier optimisation is a critical problem that must be addressed. In conclusion, this study provides insights for building ML-based network IDSs (NIDS) by revealing important information about data preprocessing.
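
To illustrate why a log-based scaling might suit intrusion detection features, the following is a minimal sketch, not the study's implementation. It assumes log-scaling means applying log(1 + x) before rescaling to [0, 1]; the exact formula used in the study is not given in the abstract. The function names and the byte counts are hypothetical.

```python
import math


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def log_scale(values):
    """Apply log(1 + x) to non-negative values, then rescale to [0, 1].

    Assumption: this is one common form of log-scaling, chosen for illustration.
    """
    return min_max([math.log1p(v) for v in values])


# Hypothetical byte counts for five flows; one flow is far larger than the rest.
src_bytes = [0, 120, 450, 3_000, 5_000_000]

plain = min_max(src_bytes)     # small flows are squeezed close to 0
logged = log_scale(src_bytes)  # small flows stay distinguishable

for raw, p, l in zip(src_bytes, plain, logged):
    print(f"{raw:>9}  min-max={p:.6f}  log-scaled={l:.6f}")
```

With plain min-max scaling, the single very large flow pushes every other value close to zero, whereas the log-scaled values remain spread across the range. This is one plausible reason a margin-based classifier such as SVM could benefit from log-scaling on heavy-tailed traffic features.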

Details

Primary Language

English

Subjects

Computer Software

Journal Section

Research Article

Authors

Hüseyin Güney *
0000-0001-7924-1904
Turkish Republic of Northern Cyprus

Early Pub Date

April 28, 2023

Publication Date

April 30, 2023

Submission Date

December 22, 2022

Acceptance Date

April 3, 2023

Published in Issue

Year 2023 Volume: 6 Number: 1

APA
Güney, H. (2023). Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. Sakarya University Journal of Computer and Information Sciences, 6(1), 67-79. https://doi.org/10.35377/saucis...1223054
AMA
1. Güney H. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. 2023;6(1):67-79. doi:10.35377/saucis.1223054
Chicago
Güney, Hüseyin. 2023. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences 6 (1): 67-79. https://doi.org/10.35377/saucis.1223054.
EndNote
Güney H (April 1, 2023) Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. Sakarya University Journal of Computer and Information Sciences 6 1 67–79.
IEEE
[1] H. Güney, “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”, SAUCIS, vol. 6, no. 1, pp. 67–79, Apr. 2023, doi: 10.35377/saucis...1223054.
ISNAD
Güney, Hüseyin. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences 6/1 (April 1, 2023): 67-79. https://doi.org/10.35377/saucis.1223054.
JAMA
1. Güney H. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. 2023;6:67–79.
MLA
Güney, Hüseyin. “Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection”. Sakarya University Journal of Computer and Information Sciences, vol. 6, no. 1, Apr. 2023, pp. 67-79, doi:10.35377/saucis.1223054.
Vancouver
1. Hüseyin Güney. Preprocessing Impact Analysis for Machine Learning-Based Network Intrusion Detection. SAUCIS. 2023 Apr. 1;6(1):67-79. doi:10.35377/saucis.1223054

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.