Background: Retrospective research on real-world data can provide evidence on specific topics, especially when it runs across different sites in research networks. Such research networks have become increasingly relevant in recent years, not least because of the special situation caused by the COVID-19 pandemic. An important requirement for these networks is data harmonization that ensures semantic interoperability. Aims: In this paper we (1) demonstrate how to use digital infrastructures to run a retrospective study in a research network spread across university and non-university hospital sites; and (2) answer a medical question on the COVID-19-related change in diagnostic counts for diabetes-related eye diseases. Materials and methods: The study is retrospective and non-interventional and runs on medical case data documented in routine care at the participating sites. The technical infrastructure consists of the OMOP CDM and other OHDSI tools, which are provided in a transferable format. An ETL process was used to transfer and harmonize the data to the OMOP CDM. Cohort definitions for each year under observation were created centrally, applied locally to the medical case data of all participating sites, and analyzed with descriptive statistics. Results: The analyses showed an expected drop in the total number of diagnoses and in diabetes diagnoses in general, whereas the number of diagnoses for diabetes-related eye diseases surprisingly decreased more strongly than that for non-eye diseases. Differences between sites in the relative changes of diagnosis counts show an urgent need to conduct multi-centric rather than single-site studies to reduce bias in the data. Conclusions: This study has demonstrated the ability to take an existing portable and standardized infrastructure and ETL process from a university hospital setting and transfer it to non-university sites.
From a medical perspective, further work is needed to evaluate the quality of the real-world data documented in routine care and to investigate the eligibility of this data for research.
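The relative change in diagnosis counts that the abstract compares across sites can be sketched as follows. This is only an illustration, not the study's analysis code: the site names and counts are invented placeholders, and `relative_change` is a hypothetical helper.

```python
def relative_change(before: int, after: int) -> float:
    """Relative change from a baseline count to a later count, in percent."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


# Invented placeholder counts per site: (baseline year, pandemic year).
# These are NOT results from the study.
counts = {
    "site_a": (200, 150),
    "site_b": (80, 76),
}

# Per-site relative change, then the change of the pooled counts.
changes = {site: relative_change(b, a) for site, (b, a) in counts.items()}
pooled = relative_change(
    sum(b for b, _ in counts.values()),
    sum(a for _, a in counts.values()),
)

for site, pct in changes.items():
    print(f"{site}: {pct:+.1f}%")
print(f"pooled: {pooled:+.1f}%")
```

With these placeholder numbers the two sites differ markedly, which illustrates why a pooled multi-site figure can be less sensitive to the peculiarities of any single site.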
(1) Background: The global incidence of type 1 diabetes (T1D) is rising, and nearly half of cases occur in adults. However, it is unclear whether certain early-life risk factors for childhood T1D are also associated with adult-onset T1D. This study aimed to assess associations between birth order, delivery mode or daycare attendance and T1D risk in a population-based cohort, and whether these were similar for childhood- and adult-onset T1D (cut-off age 15); (2) Methods: Data were obtained from the German National Cohort (NAKO Gesundheitsstudie) baseline assessment. Self-reported diabetes was classified as T1D if the age at diagnosis was ≤ 40 years and insulin treatment had started less than one year after diagnosis. Cox regression was applied for T1D risk analysis; (3) Results: Analyses included 101,411 participants (100 childhood- and 271 adult-onset T1D cases). Compared to “only-children”, HRs for second-born and later-born individuals were 0.70 (95% CI = 0.50–0.96) and 0.65 (95% CI = 0.45–0.94), respectively, regardless of parental diabetes, migration background, birth year and perinatal factors. In further analyses, higher birth order reduced T1D risk in children and adults born in recent decades. Caesarean section and daycare attendance showed no clear associations with T1D risk; (4) Conclusions: Birth order should be considered in T1D risk assessment for both children and adults to support early detection.
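The case definition given in the Methods can be written down as a small rule. The sketch below is an illustration under stated assumptions, not the authors' code: the function and parameter names are hypothetical, and treating ages strictly below the cut-off of 15 as childhood onset is an assumption, since the abstract does not say on which side the boundary falls.

```python
from typing import Optional


def classify_t1d(diagnosis_age: float, years_to_insulin: Optional[float]) -> bool:
    """Rule from the abstract: diagnosis at age <= 40 years and insulin
    treatment started less than one year after diagnosis."""
    if years_to_insulin is None:  # no insulin treatment reported
        return False
    return diagnosis_age <= 40 and years_to_insulin < 1


def onset_group(diagnosis_age: float, cutoff: float = 15) -> str:
    """Split cases at the cut-off age.

    Assumption: ages strictly below the cut-off count as childhood onset.
    """
    return "childhood" if diagnosis_age < cutoff else "adult"


print(classify_t1d(12, 0.0), onset_group(12))
print(classify_t1d(45, 0.0))
```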
Background
Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.
Method
We here propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid functionality.
Results
Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy that is similar to state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.
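For readers unfamiliar with the three accuracy measures reported above, the following sketch shows how they are commonly computed. It uses the standard textbook definitions and made-up toy values. It is not the authors' evaluation code, and `regression_metrics` is a hypothetical helper name.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy values for illustration only; not data from the study.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(f"RMSE={rmse:.3f}, MAE={mae:.3f}, R2={r2:.3f}")
```

Note that a low R² can coexist with moderate RMSE and MAE when the target itself varies little, so the three numbers are best read together.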
Conclusion
We demonstrate that the combination of random forest and Bayesian network analysis is helpful to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.
Discovering Latent Structure in High-Dimensional Healthcare Data: Toward Improved Interpretability
(2022)
This cumulative thesis describes contributions to the field of interpretable machine learning in the healthcare domain. Three research articles are presented that lie at the intersection of biomedical and machine learning research. They illustrate how incorporating latent structure can provide a valuable compression of the information hidden in complex healthcare data.
Methodologically, this thesis gives an overview of interpretable machine learning and the discovery of latent structure, including clusters, latent factors, graph structure, and hierarchical structure. Different workflows are developed and applied to two main types of complex healthcare data (cohort study data and time-resolved molecular data). The core result builds on Bayesian networks, a type of probabilistic graphical model. On the application side, we provide accurate predictive or discriminative models focusing on relevant medical conditions, related biomarkers, and their interactions.
Introduction
The co-occurrence of health risk behaviours (HRBs, ie, tobacco smoking, at-risk alcohol use, insufficient physical activity and unhealthy diet) increases the risks of cancer, other chronic diseases and mortality more than additively, and it applies to more than half of adult general populations. However, preventive measures that target all four HRBs and that reach the majority of the target populations, particularly those persons most in need and hard to reach, are scarce. Electronic interventions may help to efficiently address multiple HRBs in healthcare patients. The aim is to investigate the acceptance of a proactive and brief electronic multiple behaviour change intervention among general hospital patients with regard to reach, retention, equity in reach and retention, satisfaction and changes in behaviour change motivation, HRBs and health.
Methods and analysis
A pre–post intervention study with four time points is conducted at a general hospital in Germany. All patients aged 18–64 years who are admitted to participating wards of five medical departments (internal medicine A and B, general surgery, trauma surgery, and ear, nose and throat medicine) are systematically approached and invited to participate. Based on behaviour change theory and individual HRB profile, 175 participants receive individualised and motivation-enhancing computer-generated feedback at months 0, 1 and 3. Intervention reach and retention are determined by the proportion of participants among eligible patients and of participants who continue participation, respectively. Equity in reach and retention is measured with regard to school education and other sociodemographics. To investigate satisfaction with the intervention and subsequent changes, a 6-month follow-up is conducted. Descriptive statistics, multivariate regressions and latent growth modelling are applied.
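The two proportions defined above, reach and retention, can be illustrated with a short calculation. All counts below are invented placeholders and not study results, and `proportion` is a hypothetical helper.

```python
def proportion(part: int, whole: int) -> float:
    """Share of `part` in `whole`, with basic input validation."""
    if whole <= 0 or part < 0 or part > whole:
        raise ValueError("need 0 <= part <= whole and whole > 0")
    return part / whole


# Invented placeholder counts; NOT results from the study.
eligible = 400       # eligible patients approached on the wards
participants = 200   # eligible patients who agreed to participate
retained = 150       # participants who continued participation

reach = proportion(participants, eligible)      # participants among eligible
retention = proportion(retained, participants)  # continuing among participants

print(f"reach: {reach:.1%}, retention: {retention:.1%}")
```

Equity in reach and retention could then be examined by computing the same proportions within subgroups, for example by level of school education.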
Ethics and dissemination
The local ethics commission and data safety appointee approved the study procedures. Results will be disseminated via publication in international scientific journals and presentations at scientific conferences.
Trial registration number NCT05365269.
Since autumn 2020, rapid antigen tests (RATs) have been implemented in several countries as an important pillar of the national testing strategy to rapidly screen for infections on site during the SARS-CoV-2 pandemic. The current surge in infection rates around the globe is driven by the variant of concern (VoC) omicron (B.1.1.529). Here, we evaluated the performance of nine SARS-CoV-2 RATs in a single-centre laboratory study. We examined a total of 115 SARS-CoV-2 PCR-negative and 166 SARS-CoV-2 PCR-positive respiratory swab samples (101 omicron, 65 delta (B.1.617.2)) collected from October 2021 until January 2022 as well as cell culture-expanded clinical isolates of both VoCs. In an assessment of the analytical sensitivity in clinical specimens, the 50% limit of detection (LoD50) ranged from 1.77 × 10⁶ to 7.03 × 10⁷ RNA copies subjected to the RAT for omicron compared to 1.32 × 10⁵ to 2.05 × 10⁶ for delta. To score positive in these point-of-care tests, up to 10-fold (LoD50) or 101-fold (LoD95) higher virus loads were required for omicron- compared to delta-containing samples. The rates of true positive test results for omicron samples in the highest virus load category (Ct values < 25) ranged between 31.4 and 77.8%, while they dropped to 0–8.3% for samples with intermediate Ct values (25–30). Of note, testing of expanded virus stocks suggested a comparable RAT sensitivity for both VoCs, questioning the predictive value of this type of in vitro study for clinical performance. Given their importance for national test strategies in the current omicron wave, awareness must be increased for the reduced detection rate of omicron infections by RATs, and a short list of suitable RATs that fulfill the minimal requirements of performance should be rapidly disclosed.
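The true positive rates per virus load category reported above can be tabulated as in the following sketch. It is an illustration only: the sample values are invented, the function names are hypothetical, and the "low" category for Ct values above 30 is an assumption, since the abstract names only the Ct < 25 and Ct 25-30 categories.

```python
from collections import defaultdict


def ct_category(ct: float) -> str:
    """Assign a PCR Ct value to a virus load category (lower Ct = higher load)."""
    if ct < 25:
        return "high (Ct < 25)"
    if ct <= 30:
        return "intermediate (Ct 25-30)"
    return "low (Ct > 30)"  # assumed category, not named in the abstract


def sensitivity_by_category(samples):
    """samples: (ct_value, rat_positive) pairs for PCR-positive specimens.

    Returns the share of RAT-positive results within each Ct category.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ct, rat_positive in samples:
        category = ct_category(ct)
        totals[category] += 1
        hits[category] += int(rat_positive)
    return {category: hits[category] / totals[category] for category in totals}


# Invented placeholder samples; NOT data from the study.
samples = [
    (20.0, True), (22.5, True), (24.0, False), (24.9, True),
    (26.0, False), (28.0, False), (29.0, False), (30.0, True),
    (33.0, False),
]
rates = sensitivity_by_category(samples)
for category, rate in rates.items():
    print(f"{category}: {rate:.1%}")
```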