This study compares the classification accuracy of text mining algorithms on foreign language proficiency exam items. The dataset comprised 2,868 items from ÜDS English tests (2006–2012), drawn equally from Natural and Applied Sciences (n=956), Health Sciences (n=956), and Social Sciences (n=956). The algorithms tested were k-Nearest Neighbors (kNN), Naïve Bayes (NB), Naïve Bayes-Kernel (NB-K), Random Forest (RF), and Support Vector Machines (SVM). Binary classification accuracies ranged from 83.08% (NB) to 92.48% (SVM), while multiclass accuracies ranged from 71.93% (NB) to 84.96% (kNN). Expert analysis and cross-validation identified class-inconsistent items that negatively affected accuracy. Removing these items improved binary classification accuracy by 7.39%–9.83% and multiclass classification accuracy by 10.58%–17.89%. Among the algorithms, kNN was least affected by class-inconsistent data. These findings highlight the importance of addressing class inconsistencies to improve algorithmic performance, with kNN showing robust results across scenarios.
Keywords: Text mining, Document classification, Class-inconsistent data, Robustness of classification algorithms
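A minimal sketch of the kind of comparison the abstract describes, not the authors' actual pipeline: it evaluates kNN, NB, RF, and SVM classifiers on a stand-in text corpus with 10-fold cross-validation, then flags items whose cross-validated prediction disagrees with their assigned class as candidate class-inconsistent items. The corpus (20 Newsgroups), TF-IDF features, MultinomialNB in place of NB-Kernel, and all parameter values are illustrative assumptions.

```python
# Illustrative sketch only; dataset, features, and parameters are assumptions,
# not the study's actual ÜDS item corpus or preprocessing.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# Stand-in corpus: three topical classes, loosely analogous to the three ÜDS domains.
corpus = fetch_20newsgroups(subset="train",
                            categories=["sci.med", "sci.space", "talk.politics.misc"],
                            remove=("headers", "footers", "quotes"))
X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(corpus.data)
y = corpus.target

classifiers = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "NB": MultinomialNB(),          # stands in for both NB and NB-Kernel here
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": LinearSVC(),
}

# Multiclass accuracy under 10-fold cross-validation for each algorithm.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.4f} (+/- {scores.std():.4f})")

# Flag items whose cross-validated prediction disagrees with the assigned label;
# in the study, such items were additionally reviewed by experts before removal.
pred = cross_val_predict(LinearSVC(), X, y, cv=10)
suspect_idx = [i for i, (p, t) in enumerate(zip(pred, y)) if p != t]
print(f"Candidate class-inconsistent items: {len(suspect_idx)}")
```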
| Primary Language | English |
|---|---|
| Subjects | Software Engineering (Other) |
| Journal Section | Research Article |
| Authors | |
| Early Pub Date | September 24, 2025 |
| Publication Date | September 30, 2025 |
| Submission Date | January 24, 2025 |
| Acceptance Date | July 16, 2025 |
| Published in Issue | Year 2025 Volume: 8 Issue: 3 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License