Research Article

Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market

Year 2023, Volume: 6 Issue: 2, 140-148, 31.08.2023
https://doi.org/10.35377/saucis...1309103

Abstract

The development of technology increases data traffic and data size day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques using machine learning algorithms and to build a price prediction model. The data were collected using Selenium and BeautifulSoup and prepared for analysis through several preprocessing steps. Lasso regression and principal component analysis (PCA) were used for feature selection and dimensionality reduction, and the GridSearchCV method was used for hyperparameter tuning. The results were then evaluated with machine learning algorithms.
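The feature selection, dimensionality reduction, and hyperparameter tuning steps named above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic (`make_regression`), the Lasso `alpha`, the PCA variance threshold, the choice of Random Forest as the final model, and the parameter grid are all hypothetical, and chaining Lasso selection and PCA in a single pipeline is an assumption, since the abstract does not say how the two were combined.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scraped car listings (assumption: the real
# features and target are not described in the abstract).
X, y = make_regression(n_samples=300, n_features=12, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    # Lasso drives some coefficients to zero; SelectFromModel keeps
    # only the features whose coefficients remain non-zero.
    ("select", SelectFromModel(Lasso(alpha=1.0))),
    # PCA keeps enough components to explain 95% of the variance.
    ("pca", PCA(n_components=0.95)),
    ("model", RandomForestRegressor(random_state=0)),
])

# Hypothetical grid; GridSearchCV tries every combination with 3-fold CV.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 5],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)  # R2 of the best model on held-out data
print(search.best_params_, round(test_r2, 3))
```

Placing the selection and reduction steps inside the pipeline means they are refit on each cross-validation fold, which avoids leaking information from the validation split into feature selection.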
Random Forest, K-Nearest Neighbor, Gradient Boosting, AdaBoost, Support Vector, and XGBoost regression algorithms were used in the analysis. The results were evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). For data set 1, the best-performing model was XGBoost Regression, with an R² of 0.973, an MSE of 0.026, and an RMSE of 0.161. For data set 2, the best-performing model was K-Nearest Neighbor Regression, with an R² of 0.978, an MSE of 0.021, and an RMSE of 0.145.
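The three evaluation metrics are closely related: RMSE is the square root of MSE, and R² compares the residual sum of squares with the total variance of the target. The sketch below computes them from their standard definitions in plain Python; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the study. It also checks that the MSE and RMSE pairs reported above agree with each other to three decimal places.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and R2 computed from their standard definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual SS
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total SS
    mse = ss_res / n
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}

# Hypothetical targets and predictions, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported values: RMSE should equal sqrt(MSE).
reported = [(0.026, 0.161), (0.021, 0.145)]
consistent = all(abs(math.sqrt(mse) - rmse) < 1e-3 for mse, rmse in reported)
print(consistent)
```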

References

  • [1] Milev, P., Conceptual Approach for Development of Web Scraping Application for Tracking Information. Economic Alternatives, 475-485, 2017.
  • [2] Khder, M., Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application. International Journal of Advances in Soft Computing and its Applications, 2021.
  • [3] Banerjee, R., Website Scraping, Happiest Minds Technologies, 2014.
  • [4] Haddaway, N., The use of web-scraping software in searching for grey literature. Grey Journal, 11(3):186-190, 2015.
  • [5] Gegic, E.; Isakovic, B.; Keco, D.; Masetic, Z.; Kevric, J. Car price prediction using machine learning techniques. TEM J. 2019, 8, 113.
  • [6] Asghar, M., Mehmood, K., Yasin, S., & Khan, Z. M., Used Cars Price Prediction using Machine Learning with Optimal Features. Pakistan Journal of Engineering and Technology, 4(2), 113-119, 2021.
  • [7] Pandey, A., Rastogi, V., & Singh, S., Car’s selling price prediction using random forest machine learning algorithm. In 5th International Conference on Next Generation Computing Technologies, 2020.
  • [8] Chen, K.-P., Liang, T.-P., Yin, S.-Y., Chang, T., Liu, Y.-C., & Yu, Y.-T., How serious is shill bidding in online auctions? evidence from eBay motors. work, 1–51, 2020.
  • [9] Yolcu360, Available: https://yolcu360.com/blog/oto-ekspertiz-raporunda-ne-yazar. [Accessed: 03-May-2023].
  • [10] Scikit-learn, Available: https://scikit-learn.org/stable/modules/cross_validation.html. [Accessed: 04-May-2023].
  • [11] Breiman, L., Random Forests. Machine Learning, 45(1), 5-32, 2001.
  • [12] Breiman, L., Bagging Predictors. Machine Learning, 24(2), 123-140, 1996.
  • [13] Altman, N. S., An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, vol. 46, pp. 175–185, 1992.
  • [14] Friedman, J. H., Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232, 2001.
  • [15] Freund, Y. and Schapire, R. E. “Experiments with a new boosting algorithm”, Icml, 96, 148-156, 1996.
  • [16] Schapire, R. E., Explaining adaboost. In Empirical Inference, pp. 37–52, Berlin Heidelberg., 2013.
  • [17] Vapnik V., The Nature of Statistical Learning Theory, 1995.
  • [18] Awad M. and Khanna R., Efficient Learning Machines, Apress, 2015.
  • [19] Chen, T., Guestrin, C., XGBoost: A scalable tree boosting system. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 13-17-Augu, 785–794, 2016.
There are 19 references in total.

Details

Primary Language: English
Subjects: Software Engineering (Other)
Section: Articles
Authors

Seda Yılmaz 0000-0002-9190-4835

İhsan Hakan Selvi 0000-0002-8837-2137

Early Pub Date: August 30, 2023
Publication Date: August 31, 2023
Submission Date: June 2, 2023
Acceptance Date: August 28, 2023
Published in Issue: Year 2023, Volume: 6 Issue: 2

Cite

IEEE S. Yılmaz and İ. H. Selvi, “Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market”, SAUCIS, vol. 6, no. 2, pp. 140–148, 2023, doi: 10.35377/saucis...1309103.

    Sakarya University Journal of Computer and Information Sciences in Applied Sciences and Engineering: An interdisciplinary journal of information science