Document classification is a long-studied problem that remains an active area of research. As social media has become part of daily life, and as its misuse has grown, text classification has become increasingly important. This paper investigates the effect of data augmentation through sentence generation on classification performance on an imbalanced dataset. We propose an LSTM-based sentence generation method, represent text with Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec, and apply Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbours (KNN), Multilayer Perceptron (MLP), Extremely Randomized Trees (Extra Trees), Random Forest, eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Bagging classifiers. Our experimental results on the imbalanced Offensive Language Identification Dataset (OLID) show that machine learning combined with sentence generation significantly improves classification performance.
sentence generation, imbalance, classification, offensive language, deep learning, machine learning
Primary Language | English |
---|---|
Subjects | Artificial Intelligence |
Section | Articles |
Authors | |
Publication Date | April 30, 2022 |
Submission Date | February 10, 2022 |
Acceptance Date | April 18, 2022 |
Published in Issue | Year 2022, Volume: 5, Issue: 1 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.