Chinese General Practice, 2022, Vol. 25, Issue (02): 217-226. DOI: 10.12114/j.issn.1007-9572.2021.01.313
Special Issue: Collection of Articles on Respiratory Diseases
Article · Chronic Obstructive Pulmonary Disease
Using Machine Learning to Build an Early Warning Model for the Risk of Severe Airflow Limitation in Patients with Chronic Obstructive Pulmonary Disease
1. Department of Respiratory and Critical Care Medicine, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
2. Department of Nursing, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
3. University of Electronic Science and Technology of China, Chengdu 610072, China
4. Department of Pharmacy, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
5. Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, China
*Corresponding author: WEN Xianxiu, Professor of Nursing; E-mail: 392083173@qq.com
Received: 2021-06-09
Revised: 2021-11-04
Published: 2022-01-15
Online: 2021-12-29
ZHOU Lijuan, WEN Xianxiu, LYU Qin, JIANG Rong, WU Xingwei, ZHOU Huangyuan, XIANG Chao.
Using Machine Learning to Build an Early Warning Model for the Risk of Severe Airflow Limitation in Patients with Chronic Obstructive Pulmonary Disease [J]. Chinese General Practice, 2022, 25(02): 217-226.
URL: https://www.chinagp.net/EN/10.12114/j.issn.1007-9572.2021.01.313
Variable | Value | Variable | Value |
---|---|---|---|
Age (x̄±s, years) | 63.7±10.9 | Days since last outpatient visit for acute exacerbationᵃ (x̄±s, d) | 0.6±1.8 |
Sex, n (%) | | Systemic corticosteroid use, n (%) | |
Female | 46 (11.0) | No | 403 (96.4) |
Male | 372 (89.0) | Yes | 15 (3.6) |
Disease phase, n (%) | | Comorbid cor pulmonale, n (%) | |
Stable phase | 304 (72.7) | No | 407 (97.4) |
Acute exacerbation phase | 114 (27.3) | Yes | 11 (2.6) |
BMI (x̄±s, kg/m²) | 23.1±3.6 | Nutritional or metabolic disorder, n (%) | |
Education levelᵃ, n (%) | | No | 416 (99.5) |
Illiterate | 25 (6.0) | Yes | 2 (0.5) |
Primary school | 150 (36.0) | Cardiovascular disease, n (%) | |
Junior high school | 145 (34.8) | No | 408 (97.6) |
Senior high school / technical secondary school | 55 (13.2) | Yes | 10 (2.4) |
Junior college or above | 42 (10.0) | Other medical history, n (%) | |
Asthma symptoms, n (%) | | No | 300 (71.8) |
No | 79 (18.9) | Yes | 118 (28.2) |
Yes | 339 (81.1) | Family history of COPD, n (%) | |
Wheezing, n (%) | | No | 260 (62.2) |
No | 82 (19.6) | Yes | 158 (37.8) |
Yes | 336 (80.4) | Smoking history, n (%) | |
Dyspnea, n (%) | | No | 91 (21.8) |
No | 62 (14.8) | Yes | 327 (78.2) |
Yes | 356 (85.2) | Oxygen therapy, n (%) | |
mMRC gradeᵃ, n (%) | | No | 389 (93.1) |
Grade 0 | 25 (6.0) | Yes | 29 (6.9) |
Grade 1 | 145 (34.8) | Use of a transcutaneous pulse oximeter, n (%) | |
Grade 2 | 178 (42.7) | No | 413 (98.8) |
Grade 3 | 68 (16.3) | Yes | 5 (1.2) |
Grade 4 | 1 (0.2) | Exercise, n (%) | |
Loss of appetite, n (%) | | No | 109 (26.1) |
No | 358 (85.6) | Yes | 309 (73.9) |
Yes | 60 (14.4) | Pursed-lip and abdominal breathing, n (%) | |
Cough, n (%) | | No | 256 (61.2) |
No | 71 (17.0) | Yes | 162 (38.8) |
Yes | 347 (83.0) | CAT score (x̄±s, points) | 12.8±5.6 |
Number of acute exacerbations (x̄±s) | 1.4±1.5 | Inhaler use, n (%) | |
Days since last acute exacerbation (x̄±s, d) | 1.4±31.7 | No | 47 (11.2) |
Triggering factorsᵇ, n (%) | | Yes | 371 (88.8) |
Unknown | 139 (33.4) | Long-term use of inhaled medication, n (%) | |
Common cold | 244 (58.7) | No | 56 (13.4) |
Cold air | 8 (1.9) | Yes | 361 (86.6) |
Other | 8 (1.9) | FEV1% on pulmonary function testing, n (%) | |
Exercise | 6 (1.4) | ≥50% | 206 (49.3) |
Irritant gases | 11 (2.7) | <50% | 212 (50.7) |
Number of hospitalizations for acute exacerbation (x̄±s) | 0.6±1.1 | | |
Table 1 General characteristics of the included COPD patients (n=418)
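Table 1 dichotomizes FEV1% at 50%, the conventional GOLD boundary below which airflow limitation is graded severe or very severe (GOLD 3-4). The outcome definition is not restated in this section, so the following is a minimal sketch that assumes the binary target is FEV1% < 50%; `label_severe`, `class_balance` and the threshold constant are hypothetical names.

```python
from collections import Counter

# Hypothetical threshold: FEV1 % predicted below 50% counts as severe
# airflow limitation (GOLD 3-4). This is an assumption, not the paper's code.
SEVERE_FEV1_PCT = 50.0


def label_severe(fev1_pct: float) -> int:
    """Return 1 for severe airflow limitation (FEV1% < 50), else 0."""
    return 1 if fev1_pct < SEVERE_FEV1_PCT else 0


def class_balance(fev1_values):
    """Count patients per label and return (counts, share of positives)."""
    labels = [label_severe(v) for v in fev1_values]
    counts = Counter(labels)
    positive_share = counts[1] / len(labels) if labels else 0.0
    return counts, positive_share


if __name__ == "__main__":
    # Toy FEV1% values, not study data.
    demo = [35.0, 48.9, 50.0, 62.5, 80.1]
    counts, share = class_balance(demo)
    print(dict(counts), round(share, 2))  # {1: 2, 0: 3} 0.4
```

If that assumption holds, the study cohort is close to balanced (212 vs 206 patients in Table 1), so accuracy in the later tables is not dominated by a single class.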
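Variable | Reason for exclusion at initial screening | Variable | Reason for exclusion at initial screening |
---|---|---|---|
Oxygen therapy (yes/no) | ② | Nutritional status | ③ |
Daily duration of oxygen therapy | ② | Blood oxygen saturation value | ③ |
Oxygen flow rate | ② | Non-invasive ventilation use (yes/no) | ② |
Oxygen delivery method | ② | Daily duration of non-invasive ventilation | ③ |
Non-invasive ventilation mode | ② | Mask use (yes/no) | ② |
Knows how to disinfect the humidifier chamber and tubing of the non-invasive ventilator | ② | Use of a transcutaneous pulse oximeter (yes/no) | ② |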
Table 2 Variables excluded from the total data set at initial screening
Algorithm | AUC (x̄±s) | AUC 95% CI | Accuracy (x̄±s) | Accuracy 95% CI | Precision (x̄±s) | Precision 95% CI | Recall (x̄±s) | Recall 95% CI | F1 score (x̄±s) | F1 score 95% CI |
---|---|---|---|---|---|---|---|---|---|---|
AdaBoost | 0.706±0.098 | (0.689,0.724) | 0.683±0.076 | (0.670,0.697) | 0.672±0.073 | (0.659,0.685) | 0.740±0.106 | (0.721,0.759) | 0.701±0.079 | (0.687,0.715) |
Bagging | 0.665±0.100 | (0.647,0.683) | 0.626±0.087 | (0.611,0.642) | 0.642±0.096 | (0.624,0.659) | 0.614±0.121 | (0.592,0.636) | 0.622±0.091 | (0.605,0.638) |
Bernoulli Naive Bayes | 0.715±0.074 | (0.702,0.729) | 0.664±0.061 | (0.653,0.675) | 0.659±0.067 | (0.647,0.672) | 0.709±0.093 | (0.693,0.726) | 0.680±0.063 | (0.668,0.691) |
Decision Tree | 0.694±0.074 | (0.680,0.707) | 0.685±0.071 | (0.672,0.697) | 0.667±0.065 | (0.656,0.679) | 0.758±0.115 | (0.737,0.779) | 0.706±0.076 | (0.692,0.720) |
Extra Tree | 0.678±0.080 | (0.663,0.692) | 0.664±0.071 | (0.651,0.676) | 0.664±0.074 | (0.651,0.678) | 0.694±0.110 | (0.674,0.714) | 0.674±0.074 | (0.661,0.687) |
Gaussian Naive Bayes | 0.702±0.084 | (0.687,0.717) | 0.639±0.067 | (0.627,0.651) | 0.616±0.058 | (0.605,0.626) | 0.777±0.079 | (0.763,0.791) | 0.685±0.058 | (0.675,0.696) |
Gradient Boosting | 0.700±0.097 | (0.682,0.717) | 0.664±0.082 | (0.649,0.678) | 0.662±0.079 | (0.647,0.676) | 0.695±0.131 | (0.671,0.719) | 0.673±0.091 | (0.656,0.689) |
KNN | 0.697±0.088 | (0.681,0.713) | 0.637±0.082 | (0.622,0.652) | 0.618±0.076 | (0.605,0.632) | 0.781±0.120 | (0.760,0.803) | 0.684±0.072 | (0.671,0.697) |
LDA | 0.729±0.091 | (0.712,0.745) | 0.677±0.072 | (0.664,0.690) | 0.676±0.075 | (0.662,0.689) | 0.704±0.111 | (0.684,0.724) | 0.685±0.077 | (0.671,0.699) |
Logistic Regression | 0.728±0.094 | (0.711,0.745) | 0.682±0.074 | (0.669,0.696) | 0.683±0.077 | (0.669,0.697) | 0.701±0.117 | (0.680,0.722) | 0.687±0.084 | (0.672,0.703) |
Multinomial Naive Bayes | 0.640±0.100 | (0.622,0.659) | 0.596±0.089 | (0.580,0.612) | 0.590±0.087 | (0.574,0.606) | 0.697±0.141 | (0.672,0.723) | 0.632±0.089 | (0.616,0.648) |
Passive Aggressive | 0.649±0.113 | (0.628,0.669) | 0.601±0.090 | (0.584,0.617) | 0.603±0.102 | (0.585,0.622) | 0.639±0.184 | (0.606,0.672) | 0.607±0.124 | (0.584,0.629) |
QDA | 0.719±0.089 | (0.703,0.735) | 0.661±0.076 | (0.647,0.674) | 0.650±0.074 | (0.637,0.664) | 0.723±0.114 | (0.703,0.744) | 0.681±0.078 | (0.667,0.695) |
Random Forest | 0.664±0.110 | (0.644,0.684) | 0.625±0.099 | (0.607,0.643) | 0.636±0.107 | (0.616,0.655) | 0.620±0.131 | (0.597,0.644) | 0.623±0.108 | (0.603,0.642) |
SGD | 0.738±0.089 | (0.722,0.755) | 0.685±0.075 | (0.672,0.699) | 0.684±0.077 | (0.670,0.698) | 0.716±0.110 | (0.696,0.736) | 0.695±0.077 | (0.681,0.709) |
SVM | 0.720±0.101 | (0.701,0.738) | 0.666±0.087 | (0.651,0.682) | 0.678±0.098 | (0.660,0.695) | 0.666±0.112 | (0.645,0.686) | 0.667±0.090 | (0.651,0.683) |
XGBoost | 0.677±0.099 | (0.659,0.695) | 0.637±0.079 | (0.622,0.651) | 0.642±0.078 | (0.628,0.656) | 0.642±0.124 | (0.620,0.665) | 0.637±0.090 | (0.621,0.654) |
P value | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | |
Table 3 Ten-fold cross-validation results of 17 machine learning algorithms
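Table 3 reports every metric as x̄±s with a 95% CI across cross-validation folds. The number of repetitions behind those intervals is not given in this section, so the following is only a sketch: a shuffled k-fold splitter plus a helper that summarizes per-fold scores as mean, sample SD and a normal-approximation 95% CI (mean ± 1.96·s/√n). The function names and the seed are illustrative assumptions.

```python
import math
import random
import statistics


def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at most 1.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first (n_samples % k) folds receive one extra sample.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        start += size
        yield train, test


def summarize(scores, z=1.96):
    """Return (mean, sample SD, (ci_low, ci_high)) using a normal approximation."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    half = z * sd / math.sqrt(len(scores))
    return mean, sd, (mean - half, mean + half)


if __name__ == "__main__":
    folds = list(kfold_indices(418, k=10))
    print([len(test) for _, test in folds])  # [42, 42, 42, 42, 42, 42, 42, 42, 41, 41]
    # Toy per-fold AUC values, not study results.
    mean, sd, ci = summarize([0.70, 0.72, 0.68, 0.75, 0.71])
    print(f"{mean:.3f}±{sd:.3f} ({ci[0]:.3f},{ci[1]:.3f})")
```

Plain k-fold is used for brevity; stratifying the folds by outcome would keep the roughly even class split of Table 1 in every fold.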
Algorithm | AUC (x̄±s) | AUC 95% CI | Accuracy (x̄±s) | Accuracy 95% CI | Precision (x̄±s) | Precision 95% CI | Recall (x̄±s) | Recall 95% CI | F1 score (x̄±s) | F1 score 95% CI |
---|---|---|---|---|---|---|---|---|---|---|
AdaBoost | 0.716±0.068 | (0.713,0.718) | 0.678±0.061 | (0.676,0.680) | 0.659±0.073 | (0.656,0.662) | 0.765±0.070 | (0.763,0.768) | 0.706±0.060 | (0.704,0.708) |
Bagging | 0.657±0.074 | (0.654,0.660) | 0.627±0.059 | (0.624,0.629) | 0.623±0.076 | (0.620,0.626) | 0.670±0.086 | (0.667,0.673) | 0.643±0.067 | (0.640,0.645) |
Bernoulli Naive Bayes | 0.697±0.062 | (0.694,0.699) | 0.648±0.057 | (0.646,0.650) | 0.639±0.073 | (0.636,0.642) | 0.721±0.069 | (0.718,0.724) | 0.675±0.058 | (0.673,0.677) |
Decision Tree | 0.681±0.065 | (0.678,0.684) | 0.683±0.061 | (0.681,0.686) | 0.658±0.074 | (0.655,0.661) | 0.799±0.069 | (0.797,0.802) | 0.719±0.058 | (0.717,0.721) |
Ensemble Learning | 0.757±0.057 | (0.755,0.760) | 0.708±0.056 | (0.706,0.711) | 0.695±0.074 | (0.692,0.698) | 0.771±0.074 | (0.768,0.774) | 0.728±0.057 | (0.725,0.730) |
Extra Tree | 0.666±0.065 | (0.664,0.669) | 0.658±0.062 | (0.655,0.660) | 0.646±0.077 | (0.643,0.649) | 0.733±0.089 | (0.729,0.737) | 0.683±0.064 | (0.680,0.685) |
Gaussian Naive Bayes | 0.654±0.066 | (0.651,0.656) | 0.610±0.057 | (0.608,0.612) | 0.597±0.070 | (0.595,0.600) | 0.728±0.074 | (0.725,0.731) | 0.654±0.060 | (0.651,0.656) |
Gradient Boosting | 0.707±0.064 | (0.705,0.710) | 0.655±0.065 | (0.653,0.658) | 0.645±0.079 | (0.642,0.648) | 0.726±0.074 | (0.723,0.729) | 0.680±0.065 | (0.678,0.683) |
KNN | 0.663±0.071 | (0.660,0.666) | 0.633±0.066 | (0.630,0.636) | 0.609±0.080 | (0.606,0.612) | 0.809±0.087 | (0.806,0.813) | 0.690±0.060 | (0.688,0.693) |
LDA | 0.714±0.060 | (0.712,0.716) | 0.678±0.053 | (0.676,0.680) | 0.665±0.070 | (0.662,0.667) | 0.743±0.070 | (0.740,0.746) | 0.699±0.056 | (0.697,0.701) |
Logistic Regression | 0.721±0.062 | (0.718,0.723) | 0.689±0.056 | (0.687,0.692) | 0.678±0.072 | (0.675,0.681) | 0.748±0.069 | (0.746,0.751) | 0.709±0.058 | (0.707,0.711) |
Multinomial Naive Bayes | 0.651±0.064 | (0.648,0.654) | 0.602±0.068 | (0.600,0.605) | 0.602±0.081 | (0.598,0.605) | 0.668±0.122 | (0.663,0.673) | 0.627±0.080 | (0.624,0.630) |
Passive Aggressive | 0.686±0.075 | (0.683,0.689) | 0.624±0.082 | (0.621,0.628) | 0.636±0.095 | (0.632,0.639) | 0.626±0.200 | (0.618,0.634) | 0.613±0.126 | (0.608,0.619) |
QDA | 0.686±0.067 | (0.683,0.688) | 0.646±0.061 | (0.643,0.648) | 0.630±0.074 | (0.627,0.633) | 0.753±0.075 | (0.750,0.756) | 0.683±0.061 | (0.681,0.686) |
Random Forest | 0.687±0.066 | (0.685,0.690) | 0.659±0.063 | (0.657,0.662) | 0.657±0.076 | (0.654,0.660) | 0.692±0.088 | (0.689,0.696) | 0.671±0.069 | (0.668,0.674) |
SGD | 0.718±0.064 | (0.715,0.720) | 0.672±0.054 | (0.670,0.674) | 0.657±0.071 | (0.655,0.660) | 0.747±0.075 | (0.744,0.750) | 0.697±0.058 | (0.694,0.699) |
SVM | 0.708±0.061 | (0.705,0.710) | 0.648±0.072 | (0.645,0.650) | 0.641±0.083 | (0.637,0.644) | 0.709±0.082 | (0.706,0.712) | 0.671±0.072 | (0.668,0.674) |
XGBoost | 0.680±0.069 | (0.677,0.683) | 0.639±0.066 | (0.637,0.642) | 0.636±0.082 | (0.632,0.639) | 0.697±0.081 | (0.694,0.700) | 0.662±0.067 | (0.659,0.664) |
P value | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | |
Table 4 External validation results of 17 machine learning algorithms and the ensemble model
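Table 4 adds an Ensemble Learning model whose construction is not described in this section. One common option is soft voting, which averages each base model's predicted probability of the positive class; the sketch below assumes that, and scores predictions with the rank-based (Mann–Whitney) AUC, i.e. the probability that a randomly chosen positive case is scored above a randomly chosen negative one, counting ties as one half. `soft_vote`, `roc_auc` and the base-model outputs are hypothetical.

```python
from typing import Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]], weights=None):
    """Average per-model positive-class probabilities (optionally weighted)."""
    n_models = len(prob_lists)
    weights = list(weights) if weights is not None else [1.0] * n_models
    total = sum(weights)
    n = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n)
    ]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count 0.5. O(P*N), adequate for test sets of a few hundred patients.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 1, 0, 0]
    # Toy probabilities from three hypothetical base models.
    lr = [0.9, 0.6, 0.4, 0.5, 0.2]
    ada = [0.8, 0.7, 0.5, 0.6, 0.3]
    svm = [0.7, 0.4, 0.6, 0.3, 0.4]
    ens = soft_vote([lr, ada, svm])
    # In this toy case the average ranks every positive above every negative.
    for name, s in [("LR", lr), ("AdaBoost", ada), ("SVM", svm), ("Ensemble", ens)]:
        print(name, round(roc_auc(y, s), 3))
```

Averaging probabilities lets base models that err on different patients cancel out some of each other's mistakes, which is the usual motivation for soft voting; weighted voting or stacking are alternatives the source does not rule out.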
Imputation method | AUC (x̄±s) | AUC 95% CI | Accuracy (x̄±s) | Accuracy 95% CI | Precision (x̄±s) | Precision 95% CI | Recall (x̄±s) | Recall 95% CI | F1 score (x̄±s) | F1 score 95% CI |
---|---|---|---|---|---|---|---|---|---|---|
None | 0.724±0.070 | (0.723,0.725) | 0.689±0.070 | (0.688,0.691) | 0.676±0.083 | (0.674,0.677) | 0.751±0.098 | (0.749,0.753) | 0.707±0.073 | (0.705,0.708) |
Random Forest | 0.682±0.068 | (0.681,0.684) | 0.640±0.063 | (0.638,0.641) | 0.630±0.074 | (0.628,0.631) | 0.722±0.101 | (0.720,0.724) | 0.668±0.072 | (0.667,0.669) |
Improved Random Forest | 0.681±0.069 | (0.680,0.683) | 0.642±0.063 | (0.640,0.643) | 0.632±0.076 | (0.631,0.634) | 0.720±0.101 | (0.718,0.722) | 0.669±0.073 | (0.667,0.670) |
Simple | 0.679±0.068 | (0.677,0.680) | 0.642±0.064 | (0.641,0.644) | 0.634±0.079 | (0.633,0.636) | 0.720±0.104 | (0.718,0.722) | 0.669±0.073 | (0.668,0.671) |
P value | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | | <0.0001 | |
Table 5 External validation results of different missing-value imputation methods
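Table 5 compares leaving missing values as they are with random-forest-based imputation and a Simple strategy. Assuming Simple means filling each variable with its mean (continuous) or most frequent value (categorical), which is an interpretation rather than something stated here, the idea fits in a few lines. The fill values are learned on the training split only, so that nothing from the external validation set leaks into preprocessing; `fit_simple_imputer` and `transform` are hypothetical names.

```python
import statistics
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]  # None marks a missing value


def fit_simple_imputer(rows: List[Row], numeric: List[str], categorical: List[str]):
    """Learn fill values from training rows: mean for numeric, mode for categorical.

    Raises statistics.StatisticsError if a column has no observed values.
    """
    fill = {}
    for col in numeric:
        observed = [r[col] for r in rows if r.get(col) is not None]
        fill[col] = statistics.mean(observed)
    for col in categorical:
        observed = [r[col] for r in rows if r.get(col) is not None]
        fill[col] = statistics.mode(observed)  # first mode wins on ties (Python 3.8+)
    return fill


def transform(rows: List[Row], fill: Dict[str, object]) -> List[Row]:
    """Return copies of rows with missing entries replaced by the learned values."""
    out = []
    for r in rows:
        new = dict(r)
        for col, value in fill.items():
            if new.get(col) is None:
                new[col] = value
        out.append(new)
    return out


if __name__ == "__main__":
    # Toy records; variable names echo Table 1 but the values are invented.
    train = [
        {"age": 60, "bmi": 22.0, "smoking": "yes"},
        {"age": 70, "bmi": None, "smoking": "yes"},
        {"age": None, "bmi": 24.0, "smoking": "no"},
    ]
    test = [{"age": None, "bmi": None, "smoking": None}]
    fill = fit_simple_imputer(train, numeric=["age", "bmi"], categorical=["smoking"])
    print(transform(test, fill))
```

Random-forest imputation replaces the single fill value with a per-patient prediction from the other variables, at the cost of fitting one model per incomplete column.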
Feature selection method | AUC (x̄±s) | AUC 95% CI | Accuracy (x̄±s) | Accuracy 95% CI | Precision (x̄±s) | Precision 95% CI | Recall (x̄±s) | Recall 95% CI | F1 score (x̄±s) | F1 score 95% CI |
---|---|---|---|---|---|---|---|---|---|---|
Boruta | 0.681±0.072 | (0.680,0.682) | 0.652±0.068 | (0.650,0.653) | 0.643±0.081 | (0.641,0.644) | 0.722±0.100 | (0.721,0.724) | 0.676±0.073 | (0.674,0.677) |
Lasso | 0.703±0.069 | (0.701,0.704) | 0.651±0.069 | (0.649,0.652) | 0.643±0.082 | (0.642,0.644) | 0.717±0.110 | (0.715,0.719) | 0.672±0.079 | (0.671,0.674) |
None | 0.691±0.071 | (0.690,0.692) | 0.658±0.068 | (0.656,0.659) | 0.643±0.078 | (0.642,0.645) | 0.745±0.094 | (0.743,0.746) | 0.687±0.071 | (0.686,0.688) |
P value | <0.0001 | | <0.0001 | | 0.5344 | | <0.0001 | | <0.0001 | |
Table 6 External validation results of different feature selection methods
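Table 6 compares Boruta and Lasso selection against using all variables, and in Table 7 Lasso reduces the model to 4 variables. The sketch below illustrates the Lasso idea with the least-squares objective (1/2n)‖y − Xβ‖² + α‖β‖₁, solved by cyclic coordinate descent on standardized features; variables whose coefficients shrink exactly to zero are dropped. Whether the study used this form, an L1-penalized logistic model, or a cross-validated α is not stated here, and the α values below are arbitrary; `lasso_select` is a hypothetical name.

```python
import math
from typing import List


def _standardize(X: List[List[float]]):
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        if sd == 0:
            raise ValueError(f"column {j} is constant")
        cols.append([(v - mu) / sd for v in col])
    return cols  # column-major: cols[j][i]


def soft_threshold(z: float, alpha: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward 0 by alpha."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def lasso_select(X, y, alpha=0.1, n_iter=200, tol=1e-8):
    """Return (coefficients, indices of features with non-zero coefficients)."""
    cols = _standardize(X)
    n, p = len(y), len(cols)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = cols[j]
            # Correlation of feature j with the partial residual (feature j added back).
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha)  # unit variance, so no denominator
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    return beta, selected


if __name__ == "__main__":
    # Toy data: y depends on the first feature only.
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    y = [2, -2, 2, -2]
    beta, keep = lasso_select(X, y, alpha=0.5)
    print(beta, keep)  # [1.5, 0.0] [0]
```

Larger α zeroes more coefficients, which is how Lasso trades a little fit for a much shorter variable list.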
Model | Model type | Imputation | Feature selection | No. of variables | AUC | Accuracy | Precision | Recall | F1 score |
---|---|---|---|---|---|---|---|---|---|
Model 1 | Ensemble learning | None | None | 23 | 0.7909 | 0.7590 | 0.7500 | 0.7857 | 0.7674 |
Model 2 | Ensemble learning | None | Boruta | 16 | 0.7875 | 0.7590 | 0.7391 | 0.8095 | 0.7727 |
Model 3 | Logistic regression | None | None | 23 | 0.7764 | 0.7470 | 0.7234 | 0.8095 | 0.7640 |
Model 4 | AdaBoost | None | Lasso | 4 | 0.7738 | 0.6988 | 0.6809 | 0.7619 | 0.7191 |
Model 5 | Ensemble learning | None | Lasso | 4 | 0.7738 | 0.6988 | 0.6809 | 0.7619 | 0.7191 |
Table 7 Summary of the 5 best-performing risk prediction models for airflow limitation in patients with COPD
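The accuracy, precision, recall and F1 values in Table 7 all follow from a 2×2 confusion matrix. As a worked check, the hypothetical counts TP = 33, FP = 11, FN = 9, TN = 30 (83 patients) reproduce model 1's accuracy, precision, recall and F1 to four decimals; the actual test-set counts are not given in this section, so treat them as a back-calculated illustration only. `Confusion` and `metrics` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # severe cases predicted severe
    fp: int  # non-severe cases predicted severe
    fn: int  # severe cases predicted non-severe
    tn: int  # non-severe cases predicted non-severe

    @classmethod
    def from_labels(cls, y_true, y_pred):
        """Tally a confusion matrix from 0/1 labels (1 = severe airflow limitation)."""
        pairs = list(zip(y_true, y_pred))
        return cls(
            tp=sum(1 for t, p in pairs if t == 1 and p == 1),
            fp=sum(1 for t, p in pairs if t == 0 and p == 1),
            fn=sum(1 for t, p in pairs if t == 1 and p == 0),
            tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        )


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall (sensitivity) and F1 from a confusion matrix."""
    precision = _safe_div(c.tp, c.tp + c.fp)
    recall = _safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts, back-calculated to match model 1 in Table 7.
    m = metrics(Confusion(tp=33, fp=11, fn=9, tn=30))
    print({k: round(v, 4) for k, v in m.items()})
    # {'accuracy': 0.759, 'precision': 0.75, 'recall': 0.7857, 'f1': 0.7674}
```

Recall here is the same quantity as sensitivity; specificity, TN/(TN+FP), is not reported in Table 7 but can be read off the same matrix.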