Research Article

Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market

Year 2023, Volume 6, Issue 2, 140 - 148, 31.08.2023
https://doi.org/10.35377/saucis...1309103

Abstract

Technological development increases data traffic and data volume day by day, so collecting and interpreting data has become very important. This study aims to analyze car sales data collected with web scraping techniques by means of machine learning algorithms, and to build a price estimation model. The data needed for the analysis were collected using Selenium and BeautifulSoup and then prepared through several data preprocessing steps. Lasso regression and principal component analysis (PCA) were used for feature selection and dimensionality reduction, and the GridSearchCV method was used for hyperparameter tuning. The processed data were then modeled and evaluated with machine learning algorithms.
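Lasso is useful for feature selection because its L1 penalty drives the coefficients of uninformative features to exactly zero. The sketch below illustrates that idea with plain-Python cyclic coordinate descent on the objective (1/(2n))·||y − Xw||² + α·||w||₁, which is the form scikit-learn's `Lasso` uses. It is a hypothetical stand-in, not the authors' code: the paper's regularization strength, feature list and library settings are not stated here, and the feature names `age`, `mileage` and `noise`, the toy prices and all helper names are assumptions made for illustration. PCA is not shown.

```python
"""Lasso-based feature selection via coordinate descent (illustrative sketch only)."""
from typing import List, Sequence, Tuple


def standardize(columns: List[List[float]]) -> List[List[float]]:
    """Center each feature column to mean 0 and scale it to unit variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        out.append([(v - mean) / std if std > 0 else 0.0 for v in col])
    return out


def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(columns: List[List[float]], y: List[float], alpha: float,
             n_sweeps: int = 200) -> List[float]:
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent.

    Expects standardized feature columns and a centered target, so no
    intercept is fitted.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant column: its coefficient stays at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                # Keep the residual consistent with the updated coefficient.
                residual = [r - delta * c for r, c in zip(residual, col)]
            w[j] = new_wj
    return w


def select_features(names: Sequence[str], columns: List[List[float]],
                    y: List[float], alpha: float) -> Tuple[List[str], List[float]]:
    """Fit Lasso on standardized data and keep features with non-zero weights."""
    X = standardize(columns)
    y_mean = sum(y) / len(y)
    coefs = lasso_cd(X, [v - y_mean for v in y], alpha)
    kept = [name for name, c in zip(names, coefs) if c != 0.0]
    return kept, coefs


if __name__ == "__main__":
    # Hypothetical toy listings: price depends on age and mileage only.
    age = [1, 2, 3, 4, 5, 6, 7, 8]
    mileage = [10, 30, 20, 50, 40, 70, 60, 80]
    noise = [1, -1, 1, -1, -1, 1, -1, 1]
    price = [100 - 5 * a - 0.5 * m for a, m in zip(age, mileage)]
    print(select_features(["age", "mileage", "noise"], [age, mileage, noise], price, 0.5))
```

On this toy data the `noise` column is uncorrelated with the price, so its coefficient is held at exactly zero and only `age` and `mileage` survive; larger values of α would prune more aggressively.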
Random Forest, K-Nearest Neighbor, Gradient Boosting, AdaBoost, Support Vector and XGBoost regression algorithms were used in the analysis. The results were evaluated using Mean Squared Error (MSE), Root Mean Squared Error (RMSE) and the coefficient of determination (R²). On data set 1, the best-performing model was XGBoost Regression, with R² = 0.973, MSE = 0.026 and RMSE = 0.161. On data set 2, the best-performing model was K-Nearest Neighbor Regression, with R² = 0.978, MSE = 0.021 and RMSE = 0.145.
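To show how K-Nearest Neighbor regression, a GridSearchCV-style search over the number of neighbors k, and the three reported metrics fit together, here is a dependency-free sketch. It is not the authors' pipeline: the interleaved folds, the k grid, the tie-breaking rule and the toy data are hypothetical choices, and in practice scikit-learn's `KNeighborsRegressor`, `GridSearchCV`, `mean_squared_error` and `r2_score` would normally be used instead. RMSE is simply the square root of MSE, which is consistent with the reported pairs (√0.026 ≈ 0.161, √0.021 ≈ 0.145).

```python
"""KNN regression, a GridSearchCV-style search over k, and MSE/RMSE/R^2.

Hypothetical, dependency-free sketch; not the authors' implementation.
"""
import math
from typing import List, Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the average squared residual."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(MSE), in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train: List[List[float]], y_train: List[float],
                x: List[float], k: int) -> float:
    """Average the targets of the k training rows nearest to x (Euclidean).

    Distance ties are broken by the smaller target value, a simplification.
    """
    ranked = sorted((math.dist(row, x), t) for row, t in zip(X_train, y_train))
    return sum(t for _, t in ranked[:k]) / k


def grid_search_k(X: List[List[float]], y: List[float], k_grid: Sequence[int],
                  n_folds: int = 4) -> Tuple[int, float]:
    """Return the k with the lowest mean validation MSE over interleaved folds.

    Row i is validated in fold i % n_folds; this mimics the idea of
    GridSearchCV with k-fold cross-validation, not its exact behaviour.
    """
    best_k, best_score = k_grid[0], float("inf")
    for k in k_grid:
        fold_scores = []
        for f in range(n_folds):
            train = [i for i in range(len(y)) if i % n_folds != f]
            val = [i for i in range(len(y)) if i % n_folds == f]
            X_tr = [X[i] for i in train]
            y_tr = [y[i] for i in train]
            preds = [knn_predict(X_tr, y_tr, X[i], k) for i in val]
            fold_scores.append(mse([y[i] for i in val], preds))
        score = sum(fold_scores) / n_folds
        if score < best_score:
            best_k, best_score = k, score
    return best_k, best_score
```

Because KNN is distance-based, features measured on very different scales (for example mileage versus engine size) would normally be standardized before calling `knn_predict`; otherwise the largest-scale feature dominates the neighbor ranking.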

References

  • [1] Milev, P., Conceptual Approach for Development of Web Scraping Application for Tracking Information. Economic Alternatives, 475-485, 2017.
  • [2] Khder, M., Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application. International Journal of Advances in Soft Computing and its Applications, 2021.
  • [3] Banerjee, R., Website Scraping, Happiest Minds Technologies, 2014.
  • [4] Haddaway, N., The use of web-scraping software in searching for grey literature. Grey Journal, 11(3), 186-190, 2015.
  • [5] Gegic, E., Isakovic, B., Keco, D., Masetic, Z., & Kevric, J., Car price prediction using machine learning techniques. TEM Journal, 8, 113, 2019.
  • [6] Asghar, M., Mehmood, K., Yasin, S., & Khan, Z. M., Used Cars Price Prediction using Machine Learning with Optimal Features. Pakistan Journal of Engineering and Technology, 4(2), 113-119, 2021.
  • [7] Pandey, A., Rastogi, V., & Singh, S., Car’s selling price prediction using random forest machine learning algorithm. In 5th International Conference on Next Generation Computing Technologies, 2020.
  • [8] Chen, K.-P., Liang, T.-P., Yin, S.-Y., Chang, T., Liu, Y.-C., & Yu, Y.-T., How serious is shill bidding in online auctions? Evidence from eBay Motors. work, 1–51, 2020.
  • [9] Yolcu360, Available: https://yolcu360.com/blog/oto-ekspertiz-raporunda-ne-yazar. [Accessed: 03-May-2023].
  • [10] Scikit-learn, Available: https://scikit-learn.org/stable/modules/cross_validation.html. [Accessed: 04-May-2023].
  • [11] Breiman, L., Random Forests. Machine Learning, 45(1), 5-32, 2001.
  • [12] Breiman, L., Bagging Predictors. Machine Learning, 24(2), 123-140, 1996.
  • [13] Altman, N. S., An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46, 175-185, 1992.
  • [14] Friedman, J. H., Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232, 2001.
  • [15] Freund, Y., & Schapire, R. E., Experiments with a new boosting algorithm. In International Conference on Machine Learning (ICML '96), 148-156, 1996.
  • [16] Schapire, R. E., Explaining AdaBoost. In Empirical Inference, 37-52, Berlin, Heidelberg, 2013.
  • [17] Vapnik, V., The Nature of Statistical Learning Theory, 1995.
  • [18] Awad, M., & Khanna, R., Efficient Learning Machines. Apress, 2015.
  • [19] Chen, T., & Guestrin, C., XGBoost: A scalable tree boosting system. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794, 2016.
There are 19 citations in total.

Details

Primary Language English
Subjects Software Engineering (Other)
Journal Section Articles
Authors

Seda Yılmaz 0000-0002-9190-4835

İhsan Hakan Selvi 0000-0002-8837-2137

Early Pub Date August 30, 2023
Publication Date August 31, 2023
Submission Date June 2, 2023
Acceptance Date August 28, 2023
Published in Issue Year 2023

Cite

IEEE S. Yılmaz and İ. H. Selvi, “Price Prediction Using Web Scraping and Machine Learning Algorithms in the Used Car Market”, SAUCIS, vol. 6, no. 2, pp. 140–148, 2023, doi: 10.35377/saucis...1309103.

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.