Research Article
Year 2020, Volume 3, Issue 3, 239 - 249, 30.12.2020
https://doi.org/10.35377/saucis.03.03.735463

On Term Weighting for Spam SMS Filtering

Abstract

Due to the rapid development of technology, the use of mobile telephones and short message services (SMS) has become widespread. Consequently, the number of spam SMS messages has increased dramatically, and identifying and filtering such messages has become more important. Moreover, because these messages can also be used to steal users' personal information, spam SMS filtering remains an active problem in information and data security. In this study, the classification performance of five different term weighting methods is compared on three different datasets containing SMS messages labeled as spam or legitimate, using two classifiers. The results show that appropriate weighting of SMS content plays an important role in identifying spam SMS messages. Furthermore, the real classification potential of the term weighting schemes is better reflected by feature vectors built from fifty or more terms, especially on the Turkish and English SMS message datasets. In addition, the range of classification results obtained with the term weighting methods is wider on the Turkish SMS message dataset than on the English SMS message datasets.
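
As a rough illustration of what weighting the terms of an SMS message means, the sketch below computes classic TF-IDF weights for a handful of tokenized messages. The abstract does not name the five weighting schemes compared in the study, so TF-IDF is used here only as a familiar example. The function name `tf_idf` and the toy messages are assumptions made for illustration and are not taken from the paper or its datasets.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term frequency * log(N / document frequency).
    This is one common TF-IDF variant, chosen here as an assumption.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weighted = []
    for doc in documents:
        term_freq = Counter(doc)
        weighted.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weighted


# Toy messages invented for illustration; not from the study's datasets.
messages = [
    "win a free prize now".split(),
    "free entry win cash".split(),
    "are we meeting for lunch".split(),
    "see you at lunch".split(),
]
weights = tf_idf(messages)
```

Terms that occur in fewer messages receive larger weights: in the first toy message, "prize" (found in one message) is weighted higher than "free" (found in two). Per-message dictionaries of this kind can then be turned into fixed-length feature vectors over a selected set of terms before a classifier is trained.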

References

  • H. Faris, I. Aljarah, and B. Al-Shboul, "A hybrid approach based on particle swarm optimization and random forests for e-mail spam filtering," in International Conference on Computational Collective Intelligence, 2016: Springer, pp. 498-508.
  • R. Varghese and K. Dhanya, "Efficient feature set for spam Email filtering," in 2017 IEEE 7th International Advance Computing Conference (IACC), 2017: IEEE, pp. 732-737.
  • M. Diale, T. Celik, and C. Van Der Walt, "Unsupervised feature learning for spam email filtering," Computers & Electrical Engineering, vol. 74, pp. 89-104, 2019.
  • M. A. Shafi’I et al., "A review on mobile SMS spam filtering techniques," IEEE Access, vol. 5, pp. 15650-15666, 2017.
  • K. O. Kawade and K. S. Oza, "Content-based SMS spam filtering using machine learning technique," International Journal of Computer Engineering and Applications, vol. 7, p. 4, 2018.
  • T. H. Apandi and C. A. Sugianto, "Analisis Komparasi Machine Learning Pada Data Spam Sms," Jurnal TEDC, vol. 12, no. 1, pp. 58-62, 2019.
  • S. J. Delany, M. Buckley, and D. Greene, "SMS spam filtering: Methods and data," Expert Systems with Applications, vol. 39, no. 10, pp. 9899-9908, 2012.
  • J. M. Gómez Hidalgo, G. C. Bringas, E. P. Sánz, and F. C. García, "Content based SMS spam filtering," in Proceedings of the 2006 ACM Symposium on Document Engineering, 2006, pp. 107-114.
  • G. V. Cormack, J. M. G. Hidalgo, and E. P. Sánz, "Feature engineering for mobile (SMS) spam filtering," in Proceedings of the 30th annual international ACM SIGIR Conference on Research and Development in Information Retrieval, 2007, pp. 871-872.
  • T. Almeida, J. M. G. Hidalgo, and T. P. Silva, "Towards sms spam filtering: Results under a new dataset," International Journal of Information Security Science, vol. 2, no. 1, pp. 1-18, 2013.
  • M. T. Nuruzzaman, C. Lee, and D. Choi, "Independent and personal SMS spam filtering," in 2011 IEEE 11th International Conference on Computer and Information Technology, 2011: IEEE, pp. 429-435.
  • M. B. Junaid and M. Farooq, "Using evolutionary learning classifiers to do MobileSpam (SMS) filtering," in Proceedings of the 13th annual conference on Genetic and evolutionary computation, 2011, pp. 1795-1802.
  • A. K. Uysal, S. Gunal, S. Ergin, and E. S. Gunal, "A novel framework for SMS spam filtering," in 2012 International Symposium on Innovations in Intelligent Systems and Applications, 2012: IEEE, pp. 1-4.
  • J. W. Yoon, H. Kim, and J. H. Huh, "Hybrid spam filtering for mobile communication," Computers & Security, vol. 29, no. 4, pp. 446-459, 2010.
  • H. Najadat, N. Abdulla, R. Abooraig, and S. Nawasrah, "Mobile sms spam filtering based on mixing classifiers," International Journal of Advanced Computing Research, vol. 1, pp. 1-7, 2014.
  • H.-Y. Lee and S.-S. Kang, "Word Embedding Method of SMS Messages for Spam Message Filtering," in 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), 2019: IEEE, pp. 1-4.
  • A. K. Jain, S. K. Yadav, and N. Choudhary, "A Novel Approach to Detect Spam and Smishing SMS using Machine Learning Techniques," International Journal of E-Services and Mobile Applications (IJESMA), vol. 12, no. 1, pp. 21-38, 2020.
  • A. K. Uysal, S. Gunal, S. Ergin, and E. S. Gunal, "The impact of feature extraction and selection on SMS spam filtering," Elektronika ir Elektrotechnika, vol. 19, no. 5, pp. 67-73, 2013.
  • Y.-T. Chen and M. C. Chen, "Using chi-square statistics to measure similarities for text categorization," Expert Systems with Applications, vol. 38, no. 4, pp. 3085-3090, 2011.
  • K. Sparck Jones, "A Statistical Interpretation of Term Specificity and Its Application in Retrieval," Journal of Documentation, vol. 28, no. 1, pp. 11-21, 2004.
  • Y. Liu, H. T. Loh, and A. Sun, "Imbalanced text classification: A term weighting approach," Expert Systems with Applications, vol. 36, no. 1, pp. 690-701, 2009.
  • A. K. Uysal and S. Gunal, "A novel probabilistic feature selection method for text classification," Knowledge-Based Systems, vol. 36, pp. 226-235, 2012.
  • M. Lan, C. L. Tan, J. Su, and Y. Lu, "Supervised and traditional term weighting methods for automatic text categorization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 721-735, 2009.
  • K. Chen, Z. Zhang, J. Long, and H. Zhang, "Turning from TF-IDF to TF-IGM for term weighting in text classification," Expert Systems with Applications, vol. 66, pp. 245-260, 2016.
  • T. Dogan and A. K. Uysal, "Improved inverse gravity moment term weighting for text classification," Expert Systems with Applications, vol. 130, pp. 45-59, 2019.
  • C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
There are 26 citations in total.

Details

Primary Language English
Subjects Computer Software
Journal Section Articles
Authors

Turgut Dogan 0000-0003-2690-4019

Publication Date December 30, 2020
Submission Date May 11, 2020
Acceptance Date November 14, 2020
Published in Issue Year 2020

Cite

IEEE T. Dogan, “On Term Weighting for Spam SMS Filtering”, SAUCIS, vol. 3, no. 3, pp. 239–249, 2020, doi: 10.35377/saucis.03.03.735463.

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.