Machine learning (ML) has been frequently used to build intelligent systems in many problem domains, including cybersecurity. For malicious network activity detection, ML-based intrusion detection systems (IDSs) are promising due to their ability to classify attacks autonomously after learning process. However, this is a challenging task due to the vast number of available methods in the current literature, including ML classification algorithms and preprocessing techniques. For analysis the impact of preprocessing techniques on the ML algorithm, this study has conducted extensive experiments, using support vector machines (SVM), the classifier and the FS technique, several normalisation techniques, and a grid-search classifier optimisation algorithm. These methods were sequentially tested on three publicly available network intrusion datasets, NSL-KDD, UNSW-NB15, and CICIDS2017. Subsequently, the results were analysed to investigate the impact of each model and to extract the insights for building intelligent and efficient IDS. The results exhibited that data preprocessing significantly improves classification performance and log-scaling normalisation outperformed other techniques for intrusion detection datasets. Additionally, the results suggested that the embedded SVM-FS is accurate and classifier optimisation can improve performance of classifier-dependent FS techniques. However, feature selection in classifier optimisation is a critical problem that must be addressed. In conclusion, this study provides insights for building ML-based NIDS by revealing important information about data preprocessing.
Data Preprocessing Classifier Optimisation Feature Selection Network Intrusion Detection System Support Vector Machines.
Primary Language | English |
---|---|
Subjects | Computer Software |
Journal Section | Articles |
Authors | |
Early Pub Date | April 28, 2023 |
Publication Date | April 30, 2023 |
Submission Date | December 22, 2022 |
Acceptance Date | April 3, 2023 |
Published in Issue | Year 2023 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License