Academic Journal of Computing & Information Science, 2024, 7(11); doi: 10.25236/AJCIS.2024.071105.
Haiyuan Nong, Hailing Lin, Xiaomin Chen, Guorui Zhao
School of Computer Science and Engineering, Guangdong Ocean University, Yangjiang, 529500, China
Diabetes, a chronic metabolic disease, has long been a major challenge for global public health and seriously threatens the health of people worldwide. Using epidemiological survey data on chronic noncommunicable diseases from Shenzhen, this study explored the relationship between diabetes and lifestyle habits, dietary practices, and individual characteristics. Baseline analysis identified 26 key variables, including age, gender, education, and hypertension. After several statistical models and machine learning algorithms were compared, the XGBoost model was selected as the diabetes prediction model because of its superior predictive performance. The SHAP method was then used to interpret the XGBoost model's predictions, revealing that age and hypertension were significant positive factors, while social attributes and physical activity were negative factors. The analysis also uncovered interactions between age and hypertension, as well as individual differences in the effects of dietary habits. These results provide valuable insights for the prevention and control of diabetes.
Diabetes mellitus, Influencing effects, Interaction, Machine learning, SHAP
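The attribution idea behind SHAP, as described in the abstract, can be illustrated with a minimal sketch that computes exact Shapley values for a toy risk score. Everything here is an assumption made for illustration: the function `toy_risk`, its coefficients, the three feature names, and the baseline individual are hypothetical and are not the paper's XGBoost model or the Shenzhen survey data. The sketch only shows how a positive factor (age, hypertension), a negative factor (physical activity), and an age-hypertension interaction would appear in such an attribution.

```python
from itertools import combinations
from math import factorial


def toy_risk(x):
    """Hypothetical risk score (illustration only, not the paper's model)."""
    age_term = 0.02 * (x["age"] - 40)
    return (
        age_term
        + 0.3 * x["hypertension"]
        + 0.5 * age_term * x["hypertension"]  # age x hypertension interaction
        - 0.1 * x["activity"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values: features outside a coalition take baseline values."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        mixed = {
            f: (instance[f] if f in coalition else baseline[f]) for f in features
        }
        return model(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


# Hypothetical individual and reference individual
instance = {"age": 65, "hypertension": 1, "activity": 1}
baseline = {"age": 40, "hypertension": 0, "activity": 0}

phi = shapley_values(toy_risk, instance, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setting the contributions sum to the difference between the individual's score and the baseline score, and the interaction term is split equally between age and hypertension. Tree-based SHAP implementations compute the same kind of quantity efficiently for boosted trees rather than by enumerating coalitions as done here.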
Haiyuan Nong, Hailing Lin, Xiaomin Chen, Guorui Zhao. Study on the factors influencing diabetes under interpretable machine learning model: A case study of Shenzhen residents' health survey data. Academic Journal of Computing & Information Science (2024), Vol. 7, Issue 11: 34-41. https://doi.org/10.25236/AJCIS.2024.071105.