Research Article

Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study

Volume: 9, Number: 1, March 27, 2026

Abstract

In this study, individual and environmental risk factors related to lung cancer were analyzed using a Decision Tree–based machine learning approach. Two open-access datasets (“survey lung cancer.csv” and “lung-cancer.data”) were used. Because the survey dataset does not contain a direct clinical diagnosis label, the “SMOKING” variable was selected as the target class, and the study was therefore designed as a risk-based classification task rather than a disease diagnosis model. Data preprocessing included handling missing values, encoding categorical variables, and normalizing features. The Decision Tree model was implemented in MATLAB and evaluated using accuracy, precision, recall, specificity, F1-score, and area under the ROC curve (AUC). The model achieved an accuracy of 87.1%, a recall (sensitivity) of 0.918, and an AUC of 0.93, indicating that it effectively identifies individuals who carry the dominant lung cancer–related risk factors. These findings demonstrate that Decision Trees provide an interpretable and practical framework for risk-oriented screening and decision-support applications in healthcare.
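To show how the reported evaluation metrics relate to one another, the sketch below computes accuracy, precision, recall (sensitivity), specificity, and F1-score from a binary confusion matrix, and ROC AUC through its rank (Mann–Whitney) interpretation. It is not the authors' MATLAB code; MATLAB provides built-ins such as `confusionmat` and `perfcurve` for the same quantities. The function names, the positive-class label `1` (for example, "smoker"), and the use of predicted positive-class probabilities as AUC scores are assumptions made for illustration, and the toy data are invented.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float       # a.k.a. sensitivity, true positive rate
    specificity: float  # true negative rate
    f1: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class is never present or never predicted.
    return num / den if den else 0.0


def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN); `positive=1` is an assumed label encoding."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1) -> BinaryMetrics:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    return BinaryMetrics(accuracy, precision, recall, specificity, f1)


def roc_auc(y_true, scores, positive=1) -> float:
    """AUC = P(random positive scores higher than random negative); ties count 1/2.

    `scores` are assumed to be predicted positive-class probabilities
    (e.g. class proportions in the tree leaf a sample falls into).
    Runs in O(n_pos * n_neg), which is fine for survey-sized datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy data for illustration only (not the study's data or results).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.55, 0.3, 0.2, 0.6, 0.1]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Counting ties as one half matters for decision trees in particular: every sample that lands in the same leaf receives the same score, so tied scores are common, and the half-credit rule gives the same value as the trapezoidal area under the ROC curve.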

Ethical Statement

All datasets used in this study were obtained from publicly available sources and do not contain any personal, identifiable, or confidential patient information. Therefore, ethical committee approval was not required for this study.

Thanks

We would like to express our sincere gratitude to Prof. Dr. Mehmet Recep Bozkurt for his valuable contributions.


Details

Primary Language

English

Subjects

Computing Applications in Health, Artificial Intelligence (Other)

Journal Section

Research Article

Early Pub Date

March 27, 2026

Publication Date

March 27, 2026

Submission Date

December 5, 2025

Acceptance Date

December 28, 2025

Published in Issue

Year 2026 Volume: 9 Number: 1

APA
Yildiz, A., & Bozkurt, F. (2026). Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study. Sakarya University Journal of Computer and Information Sciences, 9(1), 226-232. https://doi.org/10.35377/saucis...1834536
AMA
1. Yildiz A, Bozkurt F. Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study. SAUCIS. 2026;9(1):226-232. doi:10.35377/saucis.1834536
Chicago
Yildiz, Ahmet, and Ferda Bozkurt. 2026. “Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study”. Sakarya University Journal of Computer and Information Sciences 9 (1): 226-32. https://doi.org/10.35377/saucis. 1834536.
EndNote
Yildiz A, Bozkurt F (March 1, 2026) Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study. Sakarya University Journal of Computer and Information Sciences 9 1 226–232.
IEEE
[1] A. Yildiz and F. Bozkurt, “Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study”, SAUCIS, vol. 9, no. 1, pp. 226–232, Mar. 2026, doi: 10.35377/saucis...1834536.
ISNAD
Yildiz, Ahmet - Bozkurt, Ferda. “Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study”. Sakarya University Journal of Computer and Information Sciences 9/1 (March 1, 2026): 226-232. https://doi.org/10.35377/saucis. 1834536.
JAMA
1. Yildiz A, Bozkurt F. Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study. SAUCIS. 2026;9:226–232.
MLA
Yildiz, Ahmet, and Ferda Bozkurt. “Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 1, Mar. 2026, pp. 226-32, doi:10.35377/saucis. 1834536.
Vancouver
1. Ahmet Yildiz, Ferda Bozkurt. Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study. SAUCIS. 2026 Mar. 1;9(1):226-32. doi:10.35377/saucis. 1834536

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.