The development of technology increases data traffic and data size day by day. Therefore, it has become very important to collect and interpret data. This study, it is aimed to analyze the car sales data collected using web scraping techniques by using machine learning algorithms and to create a price estimation model. The data needed for analysis was collected using Selenium and BeautifulSoup and prepared for analysis by applying various data preprocessing steps. Lasso regression and PCA analysis were used for feature selection and size reduction, and the GridSearchCV method was used for hyperparameter tuning. The results were evaluated with machine learning algorithms.
Random Forest, K-Nearest Neighbor, Gradient Boost, AdaBoost, Support Vector and XGBoost regression algorithms were used in the analysis. The obtained analysis results were evaluated together with Mean Square Error (MSE), Root Mean Square Error (RMSE) and Coefficient of Determination (R-square). When the results for data set 1 were examined, the model that gave the best results was XGBoost Regression with 0.973 R2, 0.026 MSE and 0.161 RMSE values. When the results for data set 2 were examined, the model that gave the best results was K-Nearest Neighbor Regression with 0.978 R2, 0.021 MSE and 0.145 RMSE values.
Primary Language | English |
---|---|
Subjects | Software Engineering (Other) |
Journal Section | Articles |
Authors | |
Early Pub Date | August 30, 2023 |
Publication Date | August 31, 2023 |
Submission Date | June 2, 2023 |
Acceptance Date | August 28, 2023 |
Published in Issue | Year 2023 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License