Sentiment Analysis for Software Engineering Domain in Turkish

Mansur Alp Toçoğlu

doi:10.35377/saucis.03.03.769969

Abstract

This study provides a model for identifying the sentiment of comments about software engineering education and professional life posted on social media and microblogging sites. Such a pre-trained model can be used to evaluate feedback from students and software engineers about software engineering. We treat the task as a supervised text classification problem, which requires a dataset for training. To build this dataset, we conducted a survey among students of a software engineering department. In the classification phase, we represent the corpus using conventional and word-embedding text representation schemes and report accuracy, recall, and precision for conventional supervised machine learning classifiers and well-known deep learning architectures. In the experimental analysis, we first obtain classification results using three conventional text representation schemes and three N-gram models in conjunction with five classifiers (i.e., naïve Bayes, k-nearest neighbor, support vector machines, random forest, and logistic regression). In addition, we evaluate the performance of three ensemble learners and three deep learning architectures (i.e., convolutional neural network, recurrent neural network, and long short-term memory). The empirical results indicate that the deep learning architectures outperform the conventional supervised machine learning classifiers and the ensemble learners.
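
As an illustration of the conventional pipeline the abstract describes (an N-gram representation fed to a naïve Bayes classifier), the following is a minimal sketch rather than the study's implementation. It assumes word-level unigrams and bigrams, raw term counts, and add-one smoothing; the abstract does not specify any of these choices. The names `ngrams` and `NaiveBayes` and the four training comments are hypothetical and do not come from the survey dataset.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayes:
    """Multinomial naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = ngrams(text, n_max)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in ngrams(text, self.n_max):
                if gram in self.vocab:  # ignore n-grams unseen in training
                    score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy comments (hypothetical, not taken from the paper's survey data).
train_texts = [
    "dersler çok faydalı ve eğlenceli",
    "yazılım mühendisliği harika bir meslek",
    "proje ödevleri çok yorucu ve sıkıcı",
    "iş bulmak zor ve stresli",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("dersler harika ve faydalı"))
print(model.predict("ödevleri çok sıkıcı ve stresli"))
```

On this toy data the first comment is classified as positive and the second as negative. Changing `n_max` switches between unigram, bigram, and trigram feature sets, which is one way to compare N-gram models under the same classifier.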

Details

Primary Language

English

Subjects

Artificial Intelligence, Software Engineering

Journal Section

Research Article

Authors

Mansur Alp Toçoğlu*
0000-0003-1784-9003
Türkiye

Publication Date

December 30, 2020

Submission Date

July 15, 2020

Acceptance Date

December 7, 2020

Published in Issue

Year 2020 Volume: 3 Number: 3

DOI

https://doi.org/10.35377/saucis.03.03.769969

IZ

https://izlik.org/JA63WY79WD

Cite

APA

Toçoğlu, M. A. (2020). Sentiment Analysis for Software Engineering Domain in Turkish. Sakarya University Journal of Computer and Information Sciences, 3(3), 296-308. https://doi.org/10.35377/saucis.03.03.769969

AMA

1. Toçoğlu MA. Sentiment Analysis for Software Engineering Domain in Turkish. SAUCIS. 2020;3(3):296-308. doi:10.35377/saucis.03.03.769969

Chicago

Toçoğlu, Mansur Alp. 2020. “Sentiment Analysis for Software Engineering Domain in Turkish”. Sakarya University Journal of Computer and Information Sciences 3 (3): 296-308. https://doi.org/10.35377/saucis.03.03.769969.

EndNote

Toçoğlu MA (December 1, 2020) Sentiment Analysis for Software Engineering Domain in Turkish. Sakarya University Journal of Computer and Information Sciences 3 3 296–308.

IEEE

[1] M. A. Toçoğlu, “Sentiment Analysis for Software Engineering Domain in Turkish”, SAUCIS, vol. 3, no. 3, pp. 296–308, Dec. 2020, doi: 10.35377/saucis.03.03.769969.

ISNAD

Toçoğlu, Mansur Alp. “Sentiment Analysis for Software Engineering Domain in Turkish”. Sakarya University Journal of Computer and Information Sciences 3/3 (December 1, 2020): 296-308. https://doi.org/10.35377/saucis.03.03.769969.

JAMA

1. Toçoğlu MA. Sentiment Analysis for Software Engineering Domain in Turkish. SAUCIS. 2020;3:296–308.

MLA

Toçoğlu, Mansur Alp. “Sentiment Analysis for Software Engineering Domain in Turkish”. Sakarya University Journal of Computer and Information Sciences, vol. 3, no. 3, Dec. 2020, pp. 296-308, doi:10.35377/saucis.03.03.769969.

Vancouver

1. Mansur Alp Toçoğlu. Sentiment Analysis for Software Engineering Domain in Turkish. SAUCIS. 2020 Dec. 1;3(3):296-308. doi:10.35377/saucis.03.03.769969