Chinese General Practice ›› 2024, Vol. 27 ›› Issue (08): 961-970. DOI: 10.12114/j.issn.1007-9572.2023.0360

• Original Article •

Prediction of Type 2 Diabetic Nephropathy Using a BP Neural Network Model Optimized by the Sparrow Search Algorithm

ZOU Qiong1,2, WU Xi1, ZHANG Yang1, WAN Yi3, CHEN Changsheng1,*

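  1. Department of Military Health Statistics, School of Military Preventive Medicine, Air Force Medical University / Ministry of Education Key Laboratory of Hazard Assessment and Control in Special Operational Environment, Xi'an 710032, Shaanxi, China
    2. School of Public Health, Shaanxi University of Chinese Medicine, Xianyang 712046, Shaanxi, China
    3. Department of Health Services, Air Force Medical University, Xi'an 710032, Shaanxi, China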
  • Received: 2023-06-20 Revised: 2023-09-05 Published: 2024-03-15 Online: 2023-12-19
  • Corresponding author: CHEN Changsheng

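  • Author contributions: ZOU Qiong and ZHANG Yang downloaded and organized the data; ZOU Qiong, WU Xi, and CHEN Changsheng conceived and designed the article and revised the manuscript; ZOU Qiong, WU Xi, ZHANG Yang, WAN Yi, and CHEN Changsheng carried out the study and the feasibility analysis; ZOU Qiong, WU Xi, ZHANG Yang, and WAN Yi analyzed and interpreted the results and wrote the manuscript.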
  • Funding:
    National Natural Science Foundation of China (82073663)

Prediction of Type 2 Diabetic Nephropathy Based on BP Neural Network Optimized by Sparrow Search Algorithm

ZOU Qiong1,2, WU Xi1, ZHANG Yang1, WAN Yi3, CHEN Changsheng1,*

  1. Department of Military Health Statistics, School of Military Preventive Medicine, Air Force Medical University/Ministry of Education Key Lab of Hazard Assessment and Control in Special Operational Environment, Xi'an 710032, China
    2. School of Public Health, Shaanxi University of Chinese Medicine, Xianyang 712046, China
    3. Department of Health Services, Air Force Medical University, Xi'an 710032, China
  • Received:2023-06-20 Revised:2023-09-05 Published:2024-03-15 Online:2023-12-19
  • Contact: CHEN Changsheng

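Abstract: Background Diabetic nephropathy (DN) is one of the common microvascular complications of diabetes, with high incidence and serious harm. Early detection of DN is very important for preventing related diseases. Most current studies rely on traditional statistical prediction methods, which require the data to satisfy their underlying assumptions. In recent years these methods have no longer adequately met the needs of DN prediction, so it is worth exploring newer approaches such as machine learning in this field. Objective To construct a DN prediction model using LASSO regression and a BP neural network optimized by the sparrow search algorithm (SSA), referred to as the SSA-BP neural network. Methods This study was conducted from April to August 2023, using publicly available data on complications in 133 patients with diabetes in Iran. Univariate analysis was performed in SPSS 26.0, and variables were screened with LASSO regression. With the presence of DN as the dependent variable, the data were split into training and test sets at ratios of 8∶2 and 7∶3. The SSA-BP neural network was used for modeling and analysis, and its predictive performance was compared with that of classical machine learning models to identify the better DN model. Models were evaluated on accuracy, precision, sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve (AUC). Results After excluding 9 patients with type 1 diabetes, the effective sample comprised 124 patients with type 2 diabetes mellitus (T2DM), of whom 73 (58.9%) were diagnosed with DN. Univariate analysis showed that age, BMI, duration of diabetes, fasting blood glucose (FBG), glycated hemoglobin (HbA1c), low-density lipoprotein (LDL), high-density lipoprotein (HDL), triacylglycerol (TG), systolic blood pressure (SBP), and diastolic blood pressure (DBP) were risk factors for DN in patients with T2DM (P<0.05). With a training-to-test ratio of 8∶2, the training set (n=100) contained 59 DN patients and the test set (n=24) contained 14 DN patients. LASSO regression selected 5 influencing factors: age, duration of diabetes, HbA1c, LDL, and SBP. The test-set accuracies of the Logistic regression (LR), K-nearest neighbor (KNN), support vector machine (SVM), BP neural network, and SSA-BP neural network models were 83.33%, 79.17%, 79.17%, 87.50%, and 95.83%, respectively, and the F1-scores were 0.8462, 0.8000, 0.8000, 0.8889, and 0.9600, respectively. With a training-to-test ratio of 7∶3, the training set (n=88) contained 52 DN patients and the test set (n=36) contained 21 DN patients. LASSO regression selected 7 influencing factors: age, BMI, duration of diabetes, LDL, HDL, SBP, and DBP. The test-set accuracies of the LR, KNN, SVM, BP neural network, and SSA-BP neural network models were 86.11%, 86.11%, 86.11%, 72.22%, and 91.67%, respectively, and the F1-scores were 0.8718, 0.8718, 0.8649, 0.7059, and 0.9091, respectively. Conclusion The LR, KNN, and SVM models performed better with a 7∶3 training-to-test split, whereas the BP and SSA-BP neural network models performed better with an 8∶2 split. Compared with the BP neural network model and the traditional machine learning models, the SSA-BP neural network model showed better predictive performance. It can identify T2DM patients with DN promptly and accurately, supporting early detection and treatment of DN and thereby preventing or mitigating the harm it causes.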

Keywords: Diabetes mellitus, type 2; Diabetic nephropathies; Neural networks, computer; Prediction model

Abstract:

Background

Diabetic nephropathy (DN) is one of the most common microvascular complications of diabetes, with high incidence and serious harm. Early detection of DN is important for preventing related diseases. Most current studies rely on traditional statistical prediction methods, which require the data to satisfy their underlying assumptions. In recent years these methods have no longer adequately met the needs of DN prediction, so it is worth exploring newer approaches such as machine learning in this field.

Objective

To construct a DN prediction model using LASSO regression and a BP neural network optimized by the sparrow search algorithm (SSA-BP).

Methods

This study was conducted from April to August 2023. The data came from a publicly available dataset on complications in 133 patients with diabetes mellitus in Iran. Univariate analysis was performed in SPSS 26.0, and variables were screened with LASSO regression. With the presence of DN as the dependent variable, the data were split into training and test sets at ratios of 8∶2 and 7∶3. The SSA-BP neural network was used for modeling and analysis, and its predictive performance was compared with that of classical machine learning models to identify the better DN model. Models were evaluated on accuracy, precision, sensitivity, specificity, F1-score, and area under the receiver operating characteristic curve (AUC).
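As a minimal sketch of how the threshold-based evaluation metrics named above relate to a binary confusion matrix, the following Python snippet computes accuracy, precision, sensitivity, specificity, and F1-score. The class name `ConfusionMatrix`, the function `evaluate`, and the example counts are hypothetical and chosen only for illustration; they are not taken from the study. AUC is omitted because it requires predicted scores rather than counts, and the sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # DN patients predicted as DN
    fp: int  # non-DN patients predicted as DN
    tn: int  # non-DN patients predicted as non-DN
    fn: int  # DN patients predicted as non-DN


def evaluate(cm: ConfusionMatrix) -> dict:
    """Compute the threshold-based metrics from confusion-matrix counts."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    precision = cm.tp / (cm.tp + cm.fp)
    sensitivity = cm.tp / (cm.tp + cm.fn)  # also called recall
    specificity = cm.tn / (cm.tn + cm.fp)
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # F1 is the harmonic mean of precision and sensitivity.
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts for illustration only; not results from the paper.
example = ConfusionMatrix(tp=8, fp=2, tn=7, fn=3)
metrics = evaluate(example)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 combines precision and sensitivity, it can differ noticeably from accuracy when the two classes are unevenly represented in the test set.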

Results

After excluding 9 patients with type 1 diabetes, the effective sample comprised 124 patients with type 2 diabetes mellitus (T2DM), of whom 73 (58.9%) were diagnosed with DN. Univariate analysis showed that age, BMI, duration of diabetes, fasting blood glucose (FBG), glycated hemoglobin (HbA1c), low-density lipoprotein (LDL), high-density lipoprotein (HDL), triacylglycerol (TG), systolic blood pressure (SBP), and diastolic blood pressure (DBP) were risk factors for DN in patients with T2DM (P<0.05). With a training-to-test ratio of 8∶2, the training set (n=100) contained 59 DN patients and the test set (n=24) contained 14 DN patients. LASSO regression selected five influencing factors: age, duration of diabetes, HbA1c, LDL, and SBP. The test-set accuracies of the Logistic regression (LR), K-nearest neighbor (KNN), support vector machine (SVM), BP, and SSA-BP models were 83.33%, 79.17%, 79.17%, 87.50%, and 95.83%, with F1-scores of 0.8462, 0.8000, 0.8000, 0.8889, and 0.9600, respectively. With a training-to-test ratio of 7∶3, the training set (n=88) contained 52 DN patients and the test set (n=36) contained 21 DN patients. LASSO regression selected seven influencing factors: age, BMI, duration of diabetes, LDL, HDL, SBP, and DBP. The test-set accuracies of the LR, KNN, SVM, BP, and SSA-BP models were 86.11%, 86.11%, 86.11%, 72.22%, and 91.67%, with F1-scores of 0.8718, 0.8718, 0.8649, 0.7059, and 0.9091, respectively.
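To put the reported percentages in terms of patients, the short sketch below converts each test-set accuracy into the whole number of correct predictions it implies for n=24 and n=36, and rechecks the DN proportion of 73 out of 124. The counts are inferred here by arithmetic and are not reported in the abstract; the helper name `implied_correct` is hypothetical.

```python
# Reported test-set accuracies (percent), in the order LR, KNN, SVM, BP, SSA-BP.
reported = {
    24: [83.33, 79.17, 79.17, 87.50, 95.83],  # 8:2 split, test set n=24
    36: [86.11, 86.11, 86.11, 72.22, 91.67],  # 7:3 split, test set n=36
}


def implied_correct(accuracy_pct: float, n: int) -> int:
    """Return the whole number of correct predictions implied by a percentage."""
    count = round(accuracy_pct / 100 * n)
    # Confirm that the count reproduces the reported value to two decimals.
    assert round(count / n * 100, 2) == round(accuracy_pct, 2)
    return count


counts = {n: [implied_correct(a, n) for a in accs] for n, accs in reported.items()}
print(counts)

# Share of the 124 T2DM patients diagnosed with DN, in percent.
dn_share = round(73 / 124 * 100, 1)
print(dn_share)
```

With test sets this small, each patient moves accuracy by about 4.2 percentage points (n=24) or 2.8 percentage points (n=36), so the gaps between models correspond to only a few patients.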

Conclusion

The LR, KNN, and SVM models performed better with a 7∶3 training-to-test split, whereas the BP and SSA-BP models performed better with an 8∶2 split. Compared with the BP neural network and the traditional machine learning models, the SSA-BP model showed the best predictive performance. It can identify T2DM patients with DN promptly and accurately, supporting early detection and treatment of DN and thereby preventing or mitigating the harm it causes.

Key words: Diabetes mellitus, type 2; Diabetic nephropathies; Neural networks, computer; Prediction model