Investigating the Robustness of Text Mining Classification Algorithms: A Study of Algorithm and Expert Performance on Class-Inconsistent Data
Abstract
Keywords
Details
Primary Language: English
Subjects: Software Engineering (Other)
Journal Section: Research Article
Authors:
Early Pub Date: September 24, 2025
Publication Date: September 30, 2025
Submission Date: January 24, 2025
Acceptance Date: July 16, 2025
Published in Issue: Year 2025, Volume 8, Number 3
