In a standard text classification (TC) study, preprocessing is one of the key components for improving performance. This study examines how preprocessing affects TC with respect to news text, text language, and feature selection. To this end, all possible combinations of commonly used preprocessing techniques are compared within one domain, namely news data, on two different news datasets. This makes it possible to evaluate the contribution of each preprocessing technique to classification performance at multiple feature sizes, the possible interactions among these techniques, and their dependence on the language of the text. Experimental studies on public datasets reveal that choosing the best combination of preprocessing techniques, rather than applying all of them or none of them, can significantly improve classification accuracy.
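The exhaustive comparison described above can be pictured as switching each preprocessing technique on or off and evaluating every resulting pipeline. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not name the techniques, so the three steps (`lowercase`, `strip_punctuation`, `remove_stopwords`), the tiny stopword list, and the helper names are all hypothetical.

```python
from itertools import product
import string

# Hypothetical, illustrative stopword list (not taken from the study).
STOPWORDS = {"the", "a", "of", "in"}


def lowercase(tokens):
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]


def strip_punctuation(tokens):
    """Remove ASCII punctuation and drop tokens that become empty."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = [t.translate(table) for t in tokens]
    return [t for t in cleaned if t]


def remove_stopwords(tokens):
    """Drop tokens found in the illustrative stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


# Assumed set of techniques; the study's actual set may differ.
STEPS = [
    ("lowercase", lowercase),
    ("strip_punctuation", strip_punctuation),
    ("remove_stopwords", remove_stopwords),
]


def all_combinations(steps):
    """Yield every on/off combination of steps (2**n pipelines in total)."""
    for mask in product([False, True], repeat=len(steps)):
        yield tuple(step for step, enabled in zip(steps, mask) if enabled)


def apply_combination(text, combo):
    """Whitespace-tokenize the text, then apply the enabled steps in order."""
    tokens = text.split()
    for _, fn in combo:
        tokens = fn(tokens)
    return tokens


if __name__ == "__main__":
    sample = "The Price of Oil, in 2023!"
    for combo in all_combinations(STEPS):
        names = [name for name, _ in combo] or ["none"]
        print(names, apply_combination(sample, combo))
```

In a full experiment, each pipeline's output would feed feature selection and a classifier, and accuracy would be recorded per combination and feature size. With n techniques there are 2^n pipelines, so the search remains small for a handful of techniques.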
Keywords: Feature selection, news data, preprocessing, text classification
Primary Language | English |
---|---|
Subjects | Computer Software, Software Engineering (Other) |
Section | Articles |
Authors | |
Early Pub Date | April 28, 2023 |
Publication Date | April 30, 2023 |
Submission Date | November 21, 2022 |
Acceptance Date | March 30, 2023 |
Published in Issue | Year 2023, Volume: 6, Issue: 1 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.