The Apolipoprotein E (APOE) gene polymorphism (rs429358 and rs7412) has a well-established association with lipid profiles, but the evidence for its effect on cardiovascular disease remains conflicting. We therefore examined the association of different APOE alleles with common carotid artery intima-media thickness (CCA-IMT), carotid plaques, incident myocardial infarction (MI) and incident stroke. We analyzed data from 3327 participants aged 20–79 years of the population-based Study of Health in Pomerania (SHIP) from Northeast Germany, with a median follow-up time of 14.5 years. Linear, logistic, and Cox regression models were used to assess the associations of the APOE polymorphism with CCA-IMT, carotid plaques, and incident MI and stroke, respectively. The APOE E2 allele was associated with lower CCA-IMT at baseline compared to E3 homozygotes (β: −0.02 [95% CI −0.04, −0.004]). Over the follow-up, 244 MI events and 218 stroke events were observed. Neither the APOE E2 nor the E4 allele was associated with incident MI (E2 HR: 1.06 [95% CI 0.68, 1.66]; E4 HR: 1.03 [95% CI 0.73, 1.45]) or incident stroke (E2 HR: 0.79 [95% CI 0.48, 1.30]; E4 HR: 0.96 [95% CI 0.66, 1.38]) in any of the models adjusting for potential confounders. However, the positive association between CCA-IMT and incident MI was more pronounced in E2 carriers than in E3 homozygotes. Thus, our study suggests that while the APOE E2 allele may predispose individuals to lower CCA-IMT, E2 carriers may be more prone to MI than E3 homozygotes as CCA-IMT increases. The APOE E4 allele had no effect on CCA-IMT, plaques, MI or stroke.
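The interval estimates above can be read mechanically: by convention, an estimate is called statistically significant at the 5% level when its 95% confidence interval excludes the null value, which is 0 for a regression coefficient β and 1 for a hazard ratio (HR). The following minimal Python sketch applies that check to the intervals reported in the abstract. The helper name `excludes_null` and the labels are hypothetical, chosen for illustration; this is not the study's analysis code.

```python
def excludes_null(lower, upper, null_value):
    """Return True if the closed interval [lower, upper] does not contain null_value."""
    return not (lower <= null_value <= upper)


# (label, lower bound, upper bound, null value), copied from the abstract above.
# The null value is 0 for the linear-regression beta and 1 for hazard ratios.
intervals = [
    ("E2 vs E3 homozygotes, CCA-IMT beta", -0.04, -0.004, 0.0),
    ("E2, incident MI HR", 0.68, 1.66, 1.0),
    ("E4, incident MI HR", 0.73, 1.45, 1.0),
    ("E2, incident stroke HR", 0.48, 1.30, 1.0),
    ("E4, incident stroke HR", 0.66, 1.38, 1.0),
]

results = {
    label: excludes_null(lower, upper, null_value)
    for label, lower, upper, null_value in intervals
}

for label, significant in results.items():
    print(f"{label}: interval excludes null = {significant}")
```

Only the CCA-IMT interval excludes its null value, which matches the abstract's statement that E2 was associated with lower CCA-IMT but that neither allele was associated with incident MI or stroke.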
Background
Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.
Method
We propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid function.
Results
Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.
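The three accuracy measures quoted above have simple definitions. The following is a minimal sketch in plain Python, using made-up numbers rather than SHIP data; the function name `regression_metrics` and the toy values are assumptions for illustration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, reused for both RMSE and R^2.
    ss_res = sum(r * r for r in residuals)

    # Root mean square error: square root of the mean squared residual.
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: mean of the absolute residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: 1 - residual SS / total SS around the mean.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Hypothetical toy values, not taken from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}")
```

Read this way, an R² of 0.15 means that the model accounts for roughly 15% of the variance in the outcome it predicts.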
Conclusion
We demonstrate that combining random forest and Bayesian network analysis helps to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with the current literature. They may be useful for future thyroid research and improved dosing of therapeutics.