Background: As a leading cause of ischemic cerebrovascular disease, carotid atherosclerosis (CAS) lowers the productivity of steelworkers. A growing number of researchers have used machine learning to identify readily available factors for predicting disease risk. However, research on risk prediction models for CAS is still lacking.
Objective: To compare the performance of models based on support vector machine (SVM), back-propagation neural network (BPNN) and random forest (RF) in predicting the risk of CAS in steelworkers.
Methods: A total of 4 568 steelworkers who underwent physical examination and health monitoring in Tangshan Hongci Hospital from March to June 2017 were surveyed using a self-developed Health Assessment Checklist to collect information on demographic characteristics (sex, age, BMI, education level, marital status), personal behavior and lifestyle (smoking and drinking), medical history (hypertension, diabetes, family history of CAS), and occupational history (current shift work, work under high temperature or in noisy environments). Serum levels of cholesterol, triglyceride, homocysteine and uric acid were also collected. The variables used to build the SVM-, BPNN- and RF-based models for predicting the risk of CAS were determined by unconditional multivariate logistic regression analysis and literature review.
Results: In the training set, the accuracy, sensitivity and specificity in predicting the risk of CAS were 83.81%, 80.10% and 87.32%, respectively, for the SVM-based model; 79.27%, 66.19% and 91.62%, respectively, for the BPNN-based model; and 86.60%, 73.62% and 98.90%, respectively, for the RF-based model. The AUCs for the SVM-, BPNN- and RF-based models were 0.84, 0.79 and 0.86, respectively. The SVM-based model had the highest sensitivity, while the RF-based model had the highest accuracy and specificity (P<0.05). In the test set, the accuracy, sensitivity and specificity were 85.70%, 81.63% and 90.29%, respectively, for the SVM-based model; 75.46%, 64.65% and 87.66%, respectively, for the BPNN-based model; and 73.37%, 60.00% and 88.45%, respectively, for the RF-based model. The AUCs for the SVM-, BPNN- and RF-based models were 0.86, 0.76 and 0.74, respectively. The SVM-based model had the highest accuracy, sensitivity and AUC, and each of these three measures differed significantly from that of the BPNN- or RF-based model (P<0.05).
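For readers less familiar with the measures reported above, the following minimal Python sketch shows how accuracy, sensitivity and specificity are conventionally derived from the four cells of a binary confusion matrix. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not state which software the authors used.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts.

    tp: CAS cases predicted as CAS        fn: CAS cases predicted as non-CAS
    tn: non-CAS predicted as non-CAS      fp: non-CAS predicted as CAS
    """
    positives = tp + fn   # all true CAS cases
    negatives = tn + fp   # all true non-CAS cases
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to compute the metrics")
    accuracy = (tp + tn) / (positives + negatives)
    sensitivity = tp / positives   # true positive rate
    specificity = tn / negatives   # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts for illustration only (not taken from the study).
acc, sens, spec = classification_metrics(tp=80, fn=20, tn=90, fp=10)
print(f"accuracy={acc:.2%}, sensitivity={sens:.2%}, specificity={spec:.2%}")
```

Because sensitivity counts only true CAS cases and specificity counts only non-cases, a model can rank first on accuracy and specificity while ranking lower on sensitivity, as the RF-based model does in the training set.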
Conclusion: The SVM-based model may outperform the other two models in predicting the risk of CAS in steelworkers.