Chinese General Practice ›› 2023, Vol. 26 ›› Issue (25): 3104-3111. DOI: 10.12114/j.issn.1007-9572.2022.0756

Special topics: Latest Articles on Neurodegenerative Diseases; Latest Articles on Alzheimer's Disease

• Original Research · Population Health Research •

Application of metaPRS and APOEε4 to Optimize Genetic Risk Prediction Modeling Strategy for Mild Cognitive Impairment

LI Zimeng1, WANG Rong1, CHEN Shuai1, ZHAO Caili1, WANG Xiaocong2, WEN Yalu1,3,*, LIU Long1,3,*

  1. Department of Biostatistics, School of Public Health, Shanxi Medical University, Taiyuan 030000, China
  2. School of Public Health and Preventive Medicine, Monash University, Melbourne 3800, Australia
  3. Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan 030000, China
  • Received: 2022-11-16 Revised: 2023-04-10 Published: 2023-09-05 Online: 2023-05-30
  • Contact: WEN Yalu, LIU Long
  • Author contributions: LI Zimeng proposed the research topic, carried out the feasibility analysis, interpreted the results, and wrote and revised the manuscript; WANG Rong, CHEN Shuai, and ZHAO Caili collected, translated, and organized the literature and materials; WANG Xiaocong collected the data; WEN Yalu and LIU Long provided core supervision and take overall responsibility for the article; all authors approved the final manuscript.
  • Funding: National Natural Science Foundation of China (81903418, 82173632)

Abstract:

Background

Mild cognitive impairment (MCI) is an important stage at which to intervene and delay progression to dementia. Previous studies have shown that MCI is closely associated with genetic factors, and apolipoprotein E (APOE) ε4 is widely recognized as an important risk allele for MCI. Because genome-wide association study (GWAS) summary data for MCI are lacking, the GWAS summary data for Alzheimer's disease (AD) are commonly used as the base dataset to calculate the polygenic risk score (PRS) for MCI, which results in suboptimal PRS-based genetic risk prediction for MCI.

Objective

To explore and optimize the statistical modeling strategy for the genetic risk of MCI from the perspectives of generalized linear models and machine learning, using the meta-polygenic risk score (metaPRS) and APOEε4 as important predictors.

Methods

PRS for 12 MCI-related sub-phenotypes were calculated and integrated into a metaPRS for MCI using an elastic-net logistic regression model. SCOREAPOE, a weighted sum for APOEε4, was calculated using age-adjusted APOEε4 effect sizes. Different predictor inclusion strategies were constructed from metaPRS, SCOREAPOE, and basic demographic information (age, gender, education level), and XGBoost, GBM, logistic regression, and Lasso regression were used as the statistical modeling methods. AUC and F-measure were used to evaluate the predictive performance of the genetic risk models for MCI.
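The two genetic predictors described above can be illustrated with a minimal sketch. It assumes that metaPRS is a weighted sum of standardized sub-phenotype PRS, with weights taken from an already fitted elastic-net logistic regression, and that SCOREAPOE is the ε4 allele count multiplied by an age-adjusted effect size. The function names, the toy data, the weights, and the effect size are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores (mean 0, population SD 1) for one PRS across individuals."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def meta_prs(prs_matrix, weights):
    """Combine sub-phenotype PRS into one score per individual.

    prs_matrix: one row per individual, one column per sub-phenotype PRS.
    weights: one coefficient per column, assumed to come from a previously
             fitted elastic-net logistic regression (not fitted here).
    """
    columns = list(zip(*prs_matrix))
    if len(columns) != len(weights):
        raise ValueError("need exactly one weight per PRS column")
    z_columns = [standardize(col) for col in columns]
    return [sum(w * z for w, z in zip(weights, row)) for row in zip(*z_columns)]


def score_apoe(e4_allele_counts, age_adjusted_effect):
    """APOE e4 score: allele count (0, 1, or 2) times an age-adjusted effect size."""
    return [age_adjusted_effect * count for count in e4_allele_counts]


# Toy data: 4 individuals x 3 PRS (the study used 12). All numbers are made up.
toy_prs = [
    [0.10, 1.2, -0.3],
    [0.40, 0.8, 0.1],
    [-0.20, 1.0, 0.5],
    [0.30, 1.4, -0.1],
]
toy_weights = [0.5, 0.0, 0.25]  # a zero weight mimics a PRS dropped by the penalty
meta = meta_prs(toy_prs, toy_weights)
apoe = score_apoe([0, 1, 2, 0], age_adjusted_effect=1.1)
```

Standardizing each PRS before weighting keeps sub-phenotypes on different scales from dominating the combined score; whether the study standardized in this way is not stated in the abstract.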

Results

metaPRS and SCOREAPOE had high predictive value for the genetic risk of MCI. After including metaPRS, SCOREAPOE, and basic demographic information (age, gender, education level), the predictive performance of each statistical modeling method was as follows: XGBoost (AUC=0.69, F-measure=0.88), GBM (AUC=0.76, F-measure=0.87), logistic regression (AUC=0.77, F-measure=0.89), and Lasso regression (AUC=0.76, F-measure=0.92).
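For readers unfamiliar with the two metrics reported above, the following sketch shows how they are commonly computed. It assumes that the F-measure is the F1 score and that predicted probabilities are dichotomized at 0.5; the abstract states neither. The labels and probabilities are toy values, not study data.

```python
def auc(labels, scores):
    """AUC as the probability that a random case outranks a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f_measure(labels, predictions):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 1 = MCI case, 0 = control. Values are made up.
y_true = [1, 1, 1, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.2, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # 0.5 cutoff is an assumption

print(auc(y_true, y_prob))        # 0.875
print(f_measure(y_true, y_pred))  # 0.75
```

AUC depends only on how cases rank relative to controls, whereas the F-measure depends on the chosen cutoff and the class balance, so the two metrics can rank models differently, as they do in the results above.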

Conclusion

With a sample size of 325 (fewer than 500), the Lasso regression model that included metaPRS, SCOREAPOE, and basic demographic information (age, gender, education level) as predictors gave the best genetic risk prediction for MCI. This study provides a new approach and perspective for statistical modeling of the genetic risk of complex diseases such as MCI.

Key words: Mild cognitive impairment, Polygenic risk score, MetaPRS, APOEε4, Genetic risk prediction, Statistical modeling optimization