Page 93 - Chinese General Practice (中国全科医学) 2022-01


·218· http://www.chinagp.net E-mail:zgqkyx@chinagp.net.cn

People's Hospital, Chengdu 610072, China
3.University of Electronic Science and Technology of China, Chengdu 610072, China
4.Department of Pharmacy, University of Electronic Science and Technology of China Affiliated Hospital & Sichuan Provincial People's Hospital, Chengdu 610072, China
5.Personalized Drug Therapy Key Laboratory of Sichuan Province, School of Medicine, University of Electronic Science and Technology of China, Chengdu 610072, China
*Corresponding author: WEN Xianxiu, Professor of nursing; E-mail: 392083173@qq.com
【Abstract】 Background The degree of airflow limitation is a key indicator of disease progression in COPD
patients. However, problems such as contraindications to testing and poor compliance make it difficult for some patients to undergo
the relevant tests, so the severity of their disease cannot be evaluated. Objective To develop and evaluate a machine learning
algorithm-based early warning model for the risk of severe airflow limitation in COPD patients. Methods A cross-sectional design was
used to study COPD inpatients in a tertiary hospital in Sichuan Province from 2019-01 to 2020-06. General clinical indexes
and pulmonary function test data were collected. The data were randomly divided into training and test sets at a ratio of 8:2,
and 216 risk warning models were constructed on the training set using four missing-value imputation methods, three feature screening
methods, 17 machine learning algorithms and one ensemble learning algorithm. The area under the ROC curve (AUC), accuracy,
precision, recall and F1 score were used to evaluate the predictive performance of the models; ten-fold cross-validation
and Bootstrapping were used for internal and external validation, respectively. The test set data were used for model testing
and selection, and the posterior method was used for sample size verification. Results A total of 418 patients were included,
of which 212 (50.7%) were at risk of severe airflow limitation. After four missing-value treatments and three feature
screening methods, a total of 12 processed datasets and the importance ranking of 12 factors affecting airflow limitation were obtained.
The results showed that modified Medical Research Council dyspnea scale (mMRC) grade, age, body mass index (BMI),
smoking history (yes/no), chronic obstructive pulmonary disease assessment test (CAT) score, and dyspnea (yes/no)
ranked highest in feature importance and were the key indicators for constructing the model, playing an
important role in predicting the outcome. With no imputation and Lasso screening, mMRC grade, smoking history (yes/no), and
dyspnea (yes/no) were the top 3 predictors, with mMRC grade accounting for 54.15% of feature importance. With
no imputation and Boruta screening, CAT score, age, and mMRC grade were the top 3 predictors, with CAT score accounting for 26.64%
of feature importance. A total of 216 prediction models were obtained using 17 machine learning algorithms and 1 ensemble
learning algorithm on each of the 12 datasets. Ten-fold cross-validation of the 17 machine learning algorithms showed
statistically significant differences in predictive performance among the algorithms (P<0.05), and the stochastic gradient descent
algorithm had the highest mean AUC (0.738±0.089). External validation on the test set using
Bootstrapping likewise showed statistically significant differences in predictive performance among the models obtained by
different algorithms (P<0.05), and the ensemble learning algorithm had the highest mean AUC
(0.757±0.057). Evaluation of the four missing-value treatments and three feature screening methods
using Bootstrapping showed that model performance improved when no imputation and Lasso screening
were applied, with a statistically significant difference (P<0.05). Among the 216 machine learning models evaluated on the test set,
the best model had an AUC of 0.7909, accuracy of 75.90%, precision of 75.00%, recall of 78.57%, and F1 score of 0.7674.
The sample size validation results suggested that the study sample size met the modeling needs. Conclusion In this study,
a risk warning model for severe airflow limitation in COPD patients was developed and evaluated. mMRC grade, age, BMI, CAT
score, smoking history and dyspnea were the key indicators affecting airflow limitation. The model shows good predictive
performance and has potential for clinical application.
【Key words】 Pulmonary disease, chronic obstructive; Machine learning; Degree of airflow limitation; Lung function; Respiratory function tests; Prediction model
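The counts and metrics reported in the abstract can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration, not the authors' code: the method labels are placeholders (the abstract does not name every imputation method, screening method, or algorithm), and only the counts, the precision, the recall, and the patient numbers are taken from the abstract.

```python
from itertools import product

# Counts come from the abstract; the labels are hypothetical placeholders.
imputers = [f"imputation_{i}" for i in range(1, 5)]    # 4 missing-value treatments
selectors = [f"selection_{i}" for i in range(1, 4)]    # 3 feature screening methods
learners = [f"learner_{i}" for i in range(1, 18)] + ["ensemble"]  # 17 algorithms + 1 ensemble

# Each (imputation, screening) pair gives one processed dataset;
# each dataset is then paired with every learner.
datasets = list(product(imputers, selectors))
models = list(product(imputers, selectors, learners))


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the best model, as reported in the abstract.
best_f1 = f1_score(0.7500, 0.7857)

# Share of included patients at risk of severe airflow limitation.
high_risk_share = 212 / 418

print(len(datasets), len(models))                            # 12 216
print(round(best_f1, 4), round(100 * high_risk_share, 1))    # 0.7674 50.7
```

The results agree with the reported figures: 12 processed datasets, 216 models, an F1 score of 0.7674, and 50.7% of patients at risk.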

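An estimated 544.9 million people worldwide have chronic respiratory disease, and most deaths among these patients are due to chronic obstructive pulmonary disease (COPD) [1-2]. COPD seriously impairs patients' quality of life and imposes a heavy economic burden on families and society [3], and it has become a major public health problem threatening human health in the 21st century. The degree of airflow limitation is the main indicator for judging disease severity in COPD patients [4-5]. However, some patients cannot complete the test needed to obtain this indicator: patients with contraindications to pulmonary function testing, such as hemoptysis, pneumothorax, an indwelling closed thoracic drainage tube, cardiac insufficiency, or aortic aneurysm; patients who cannot assume the test position or perform the breathing maneuvers and who cooperate poorly; and patients who, for various reasons, are not followed up regularly. Without this indicator, COPD patients may overlook the risk of severe airflow limitation, leading to a poor prognosis.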
