Classification of Lung Cancer–Related Data Using Decision Trees: A Risk Factor Interpretability Study
Abstract
In this study, lung cancer–related individual and environmental risk factors were analyzed using a Decision Tree–based machine learning approach. Two open-access datasets (“survey lung cancer.csv” and “lung-cancer.data”) were used. Because the survey dataset does not contain a direct clinical diagnosis label, the “SMOKING” variable was selected as the target class, and the study was designed as a risk-based classification task rather than a disease diagnosis model. Data preprocessing included handling missing values, encoding categorical variables, and normalizing features. The Decision Tree model was implemented in MATLAB and evaluated using accuracy, precision, recall, specificity, F1-score, and area under the ROC curve (AUC). The model achieved an accuracy of 87.1%, a recall (sensitivity) of 0.918, and an AUC of 0.93, indicating effective identification of individuals carrying dominant lung cancer–related risk factors. These findings suggest that Decision Trees provide an interpretable and practical framework for risk-oriented screening and decision-support applications in healthcare.
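The evaluation metrics named in the abstract can all be derived from a binary confusion matrix, with AUC additionally requiring the model's scores. The following is a minimal Python sketch of those definitions, not the authors' MATLAB implementation. The names `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical, and the counts and scores in the usage example are made up for illustration; they are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard threshold-based metrics.

    Assumes every denominator is non-zero, i.e. both classes occur
    and at least one positive prediction was made.
    """
    total = c.tp + c.fp + c.tn + c.fn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)        # also called sensitivity
    specificity = c.tn / (c.tn + c.fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half). Assumes both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative counts only; NOT taken from the study.
    example = ConfusionCounts(tp=40, fp=10, tn=40, fn=10)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
    # Illustrative labels and scores only.
    print("auc:", roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Recall and specificity are reported separately because they describe the two classes independently: a screening-oriented model such as the one described here is typically tuned to favor recall, accepting some loss of specificity.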
Details
Primary Language
English
Subjects
Computing Applications in Health, Artificial Intelligence (Other)
Journal Section
Research Article
Early Pub Date
March 27, 2026
Publication Date
March 27, 2026
Submission Date
December 5, 2025
Acceptance Date
December 28, 2025
Published in Issue
Year 2026 Volume: 9 Number: 1
