Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm

Murat Koca; İsa Avcı; Mohammed Abdulkareem Shakir Al-hayani

doi:10.35377/saucis...1273536

Research Article

Year 2023, Volume 6, Issue 2, 80 - 90, 31.08.2023

Cited By: 1

Abstract

Financial losses caused by vulnerable and insecure websites are increasing day by day. This research proposes a system that classifies websites as safe or dangerous, and protects users from the latter, using a strategy based on factor analysis of website categories and accurate identification of unknown information. Probability calculations based on Naive Bayes, along with other powerful approaches, are used throughout the classification procedure to train and evaluate the website classification model. In our study, the Naive Bayes approach showed successful results compared with the other approaches tested. This strategy is well suited to the problem of distinguishing secure websites from unsafe ones. The training model for vulnerability data categorization presented in this paper achieved a better degree of precision. The best accuracy, 96%, was achieved by Naive Bayes in classifying the NSL-KDD data set.
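To illustrate the kind of probability calculation a Naive Bayes classifier performs, the following is a minimal sketch only, not the authors' implementation. The four lexical features, the toy URLs, and the names `extract_features` and `BernoulliNaiveBayes` are all assumptions made for this example; the abstract does not specify the feature set, and the study reports results on the NSL-KDD data set rather than on data like this.

```python
import math


def extract_features(url):
    """Hypothetical binary lexical features (assumed, not from the paper)."""
    return {
        "long": len(url) > 30,            # unusually long URL
        "has_at": "@" in url,             # '@' can hide the real host
        "many_dots": url.count(".") > 3,  # many dots in the URL
        "https": url.startswith("https://"),
    }


class BernoulliNaiveBayes:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.features = sorted(rows[0])
        self.log_prior = {}
        self.p_true = {}
        for c in self.classes:
            members = [r for r, y in zip(rows, labels) if y == c]
            # Class prior P(c), stored as a log to avoid underflow later.
            self.log_prior[c] = math.log(len(members) / len(labels))
            # Smoothed estimate of P(feature is True | c).
            self.p_true[c] = {
                f: (sum(r[f] for r in members) + 1) / (len(members) + 2)
                for f in self.features
            }
        return self

    def log_posteriors(self, row):
        # log P(c) + sum over features of log P(x_f | c),
        # treating features as conditionally independent given the class.
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for f in self.features:
                p = self.p_true[c][f]
                score += math.log(p if row[f] else 1.0 - p)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy, made-up training data for illustration only.
TRAIN = [
    ("https://example.com/home", "benign"),
    ("https://docs.example.org/guide", "benign"),
    ("https://shop.example.net/cart", "benign"),
    ("http://login.secure.update.example.test/verify@account", "malicious"),
    ("http://a.b.c.d.example.test/free-prize-claim-now", "malicious"),
    ("http://example.test/user@confirm-password-reset-now", "malicious"),
]

model = BernoulliNaiveBayes().fit(
    [extract_features(url) for url, _ in TRAIN],
    [label for _, label in TRAIN],
)


def classify(url):
    return model.predict(extract_features(url))


if __name__ == "__main__":
    for url in ("https://example.org/about",
                "http://x.y.z.w.example.test/claim-your-reward@now"):
        print(url, "->", classify(url))
```

The classifier picks the class with the highest log posterior score. Working in log space keeps the product of many small probabilities numerically stable, and the smoothing prevents a feature value unseen in one class from forcing that class's probability to zero.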

Keywords

HTML, Malicious, Naive Bayes, Machine learning, URL, Neural Network

There are 43 citations in total.

Details

Primary Language	English
Subjects	Computer Software
Journal Section	Articles
Authors	Murat Koca 0000-0002-6048-7645 İsa Avcı 0000-0001-7032-8018 Mohammed Abdulkareem Shakir Al-hayani 0009-0002-3847-9603
Early Pub Date	August 27, 2023
Publication Date	August 31, 2023
Submission Date	March 30, 2023
Acceptance Date	May 27, 2023
Published in Issue	Year 2023

Cite

APA	Koca, M., Avcı, İ., & Al-hayani, M. A. S. (2023). Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm. Sakarya University Journal of Computer and Information Sciences, 6(2), 80-90. https://doi.org/10.35377/saucis...1273536
AMA	Koca M, Avcı İ, Al-hayani MAS. Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm. SAUCIS. August 2023;6(2):80-90. doi:10.35377/saucis.1273536
Chicago	Koca, Murat, İsa Avcı, and Mohammed Abdulkareem Shakir Al-hayani. “Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm”. Sakarya University Journal of Computer and Information Sciences 6, no. 2 (August 2023): 80-90. https://doi.org/10.35377/saucis. 1273536.
EndNote	Koca M, Avcı İ, Al-hayani MAS (August 1, 2023) Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm. Sakarya University Journal of Computer and Information Sciences 6 2 80–90.
IEEE	M. Koca, İ. Avcı, and M. A. S. Al-hayani, “Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm”, SAUCIS, vol. 6, no. 2, pp. 80–90, 2023, doi: 10.35377/saucis...1273536.
ISNAD	Koca, Murat et al. “Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm”. Sakarya University Journal of Computer and Information Sciences 6/2 (August 2023), 80-90. https://doi.org/10.35377/saucis. 1273536.
JAMA	Koca M, Avcı İ, Al-hayani MAS. Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm. SAUCIS. 2023;6:80–90.
MLA	Koca, Murat et al. “Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm”. Sakarya University Journal of Computer and Information Sciences, vol. 6, no. 2, 2023, pp. 80-90, doi:10.35377/saucis. 1273536.
Vancouver	Koca M, Avcı İ, Al-hayani MAS. Classification of Malicious URLs Using Naive Bayes and Genetic Algorithm. SAUCIS. 2023;6(2):80-9.

Cited By

Web Security in the Digital Age

International Journal on Semantic Web and Information Systems

https://doi.org/10.4018/IJSWIS.369823

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.