Institute: Institut für Biometrie und Medizinische Informatik
Plus-strand RNA [(+)RNA] viruses are the largest group of viruses. They include medically highly relevant human pathogens and impose a socio-economic burden. The current global pandemic of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) shows how rapidly a virus can spread around the globe and that, without an antiviral treatment, virus transmission depends solely on human behavior. Moreover, other (+)RNA viruses such as rhino-, noro-, dengue (DENV), Zika, and hepatitis C virus (HCV) are constantly spreading and expanding geographically. In the case of hepatitis C, it took more than 30 years from its first identification in the 1970s to understand the HCV structure, genome organization, life cycle, and virus-host interplay that led to the cure of a chronic and life-threatening disease. However, no vaccine or antiviral treatment exists for most (+)RNA viruses. Consequently, a precise and comprehensive analysis of these viruses, their life cycles, and their parasitic interactions with their hosts remains an important field of research. In the presented thesis, we use mathematical modeling to study the life cycles of (+)RNA viruses. We analyze the replication strategies of closely related (+)RNA viruses, namely HCV, DENV, and coxsackievirus B3 (CVB3). We compare their life cycles in the presence and absence of the host's immune response and antiviral drug treatment, and we consider different viral spreading mechanisms. Host dependency factors shape the viral life cycle, contributing to permissiveness and replication efficiency. Our mathematical models predicted that host dependency factors such as ribosomes, and thus the virus' ability to hijack the host cell's translation machinery, play an essential role in the efficiency of viral genome replication. Furthermore, our mathematical model suggested that the availability of ribosomes in the viral life cycle is a crucial determinant of disease outcome, that is, whether an acute or a chronic disease develops.
Hosts have developed strategies to attack the virus, for example by degrading the viral genome, blocking viral protein production, and preventing viral spread. Even so, viruses have found strategies to counteract these so-called host restriction factors derived from the immune system. Our mathematical models predicted that DENV might be highly effective in blocking the cell's attempts to recognize the invader. Moreover, we found ongoing HCV RNA replication even under highly effective antiviral drugs that block processes in the viral life cycle. Furthermore, we found alternative pathways of infection spread, such as exosomes carrying HCV RNA. These pathways may explain the plasma HCV RNA reported at the end of treatment in a subset of patients. Hence, the mathematical models presented in this thesis provide valuable tools to study viral replication mechanisms in detail. Although the models simplify reality, their predictions confirm and explain known biological mechanisms and suggest novel ones. In the presented thesis, I summarize and discuss key findings and place the model predictions in the context of the broader scientific literature to improve our understanding of viral dynamics and the virus-host interplay.
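The idea of drug efficacy combined with persistent replication can be made concrete with a small simulation. This is a minimal sketch of a generic target-cell-limited model of the kind widely used for HCV viral kinetics. It is not the thesis's own, more detailed model. The parameter values, the names `Params`, `steady_state` and `simulate`, and the forward-Euler integration are illustrative assumptions. A drug with efficacy `eps` blocks that fraction of virion production, starting from the untreated chronic steady state.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical, illustrative parameter values (not fitted to any data)."""
    s: float = 1e4       # supply rate of uninfected target cells (cells/mL/day)
    d: float = 0.01      # death rate of uninfected target cells (1/day)
    beta: float = 1e-7   # infection rate constant (mL/virion/day)
    delta: float = 0.14  # death rate of infected cells (1/day)
    p: float = 100.0     # virion production per infected cell (virions/cell/day)
    c: float = 6.0       # clearance rate of free virions (1/day)


def steady_state(k: Params) -> tuple[float, float, float]:
    """Chronic steady state (T*, I*, V*) of the untreated model."""
    t = k.c * k.delta / (k.beta * k.p)
    i = (k.s - k.d * t) / k.delta
    v = k.p * i / k.c
    return t, i, v


def simulate(k: Params, eps: float, days: float = 14.0, dt: float = 1e-3) -> float:
    """Forward-Euler integration of
        dT/dt = s - d*T - beta*V*T
        dI/dt = beta*V*T - delta*I
        dV/dt = (1 - eps)*p*I - c*V
    starting from the untreated steady state. eps is the drug efficacy in
    blocking virion production (0 = no effect, 1 = complete block).
    Returns the viral load V after `days` of treatment."""
    t, i, v = steady_state(k)
    for _ in range(int(round(days / dt))):
        # All derivatives use the values from the previous step.
        infection = k.beta * v * t
        d_target = k.s - k.d * t - infection
        d_infected = infection - k.delta * i
        d_virus = (1.0 - eps) * k.p * i - k.c * v
        t += dt * d_target
        i += dt * d_infected
        v += dt * d_virus
    return v


if __name__ == "__main__":
    k = Params()
    v0 = steady_state(k)[2]
    for eps in (0.0, 0.9, 0.99):
        print(f"eps={eps}: V(14 d)/V(0) = {simulate(k, eps) / v0:.2e}")
```

In this sketch, viral load falls by orders of magnitude at eps = 0.99 but does not reach zero. The remaining infected cells keep producing virus at a reduced rate and keep infecting new cells. Whether replication persists in a real patient depends on the specific model and data.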
Age is the single biggest risk factor for most major human diseases. As such, understanding the intricate molecular changes that drive biological aging holds great promise for slowing the onset of systemic diseases and thereby increasing the effective healthspan in modern societies.
This thesis explores several computational approaches to capture and analyze the molecular biological alterations triggered by intrinsic and extrinsic aging. It uses skin as a model tissue in order to identify genes and pathways as potential targets for intervention strategies.
Publication 1 demonstrates the utility of multi-omics data integration strategies for aging research. An integrated cluster analysis of gene expression and DNA methylation data identified four latent aging phases in skin tissue. The four phases improved the detection of molecular aging signals and were associated with the sunbathing habits of the test subjects. Deeper analysis revealed extensive non-linear alterations in various biological pathways, particularly at the transition into the fourth aging phase, which coincides with menopause. These alterations have potentially wide-reaching functional implications. Publication 2 describes the development of a novel type of age clock that provides a new level of interpretability by embedding biological pathway information in the architecture of an artificial neural network. The clock not only generates meaningful biological age estimates from gene expression data but also allows simultaneous monitoring of the aging states of various biological processes through the activations of intermediate neurons. Analyses of the inner workings of the clock revealed a widespread impact of aging on the global pathway landscape. Simulation experiments using the transcriptomic clock recapitulated known functional associations of aging genes. They also helped decipher the pathways through which accelerated aging conditions, such as chronic sun exposure and Hutchinson-Gilford progeria syndrome, exert their effects. Publication 3 further explores the molecular alterations caused by the pro-aging effector UV irradiation in the skin. The multi-omics data analysis of repeatedly irradiated skin revealed signs of the immediate acquisition of aging- and cancer-related epigenetic signatures. It also revealed concurrent widespread transcriptional changes across various biological processes.
Investigations into differences in UV resilience between subjects revealed prognostic biomarker signatures capable of predicting individual UV tolerance, with accuracies far surpassing the traditional Fitzpatrick classification scheme. Further analysis of the transcripts and pathways associated with UV tolerance identified a form of melanin-independent DNA damage protection in individuals with higher innate UV resilience.
Together, the approaches and findings described in this thesis open several new angles for advancing our understanding of aging processes and of external drivers of aging, such as UV irradiation, in human skin. They also provide new insight into the target genes and pathways involved.
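Publication 2 embeds pathway information in the network architecture. One common way to do this is to mask the weights between the gene input layer and a pathway layer, so that each pathway neuron only receives its member genes. Below is a minimal, untrained sketch of that idea. The gene and pathway names, the single tanh pathway layer, and the linear read-out are hypothetical choices, not the published clock's architecture. Because the mask is applied in every forward pass, an autodiff framework would also give zero gradients to the masked-out connections during training.

```python
import numpy as np

# Hypothetical toy annotation: gene symbols and pathway memberships are placeholders.
GENES = ["G1", "G2", "G3", "G4", "G5"]
PATHWAYS = {"P_a": ["G1", "G2"], "P_b": ["G2", "G3", "G4"], "P_c": ["G5"]}


def build_mask(genes, pathways):
    """Binary gene x pathway matrix: 1 where a gene belongs to a pathway."""
    index = {g: j for j, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(pathways)))
    for col, members in enumerate(pathways.values()):
        for gene in members:
            mask[index[gene], col] = 1.0
    return mask


class PathwayMaskedClock:
    """Untrained forward pass of a pathway-structured regression network."""

    def __init__(self, mask, seed=0):
        rng = np.random.default_rng(seed)
        self.mask = mask
        self.w1 = rng.normal(scale=0.1, size=mask.shape)  # gene -> pathway weights
        self.b1 = np.zeros(mask.shape[1])
        self.w2 = rng.normal(scale=0.1, size=mask.shape[1])  # pathway -> age weights
        self.b2 = 0.0

    def pathway_activations(self, x):
        # Masking the weights means each pathway neuron only sees its member genes.
        return np.tanh(x @ (self.w1 * self.mask) + self.b1)

    def predict_age(self, x):
        # Linear read-out from the interpretable pathway layer to one age estimate.
        return self.pathway_activations(x) @ self.w2 + self.b2


if __name__ == "__main__":
    clock = PathwayMaskedClock(build_mask(GENES, PATHWAYS))
    x = np.random.default_rng(1).normal(size=(2, len(GENES)))  # 2 samples x 5 genes
    print(clock.pathway_activations(x).shape)  # (2, 3): one activation per pathway
    print(clock.predict_age(x).shape)          # (2,): one age estimate per sample
```

Inspecting `pathway_activations` per sample is what makes such a clock interpretable. Each column corresponds to one named pathway rather than to an anonymous hidden unit.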
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in length, sequence conservation and secondary structure. High-throughput sequencing continues to reveal novel expressed ncRNAs. Classifying them is important for understanding cell regulation and for identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different ways of using primary sequences and secondary structures, as well as the late integration of both. For this, we used machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%. This accuracy could not be increased by more fine-grained subclassification. Compared with the current best-performing tool, ncRDense, we achieved a small increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is more accurate than current ncRNA prediction tools. It also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
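"Late integration" means the sequence and structure inputs are processed separately and only combined near the output. One simple form of this is shown below: a weighted average of the class probabilities produced by a sequence branch and a structure branch. It is a hypothetical sketch. MncR may fuse learned representations differently, and the branch models and the names `late_fusion`, `predict` and `accuracy` are assumptions. Only the six class labels come from the text.

```python
import numpy as np

# The six classes named in the abstract; the branch models themselves are not shown.
CLASSES = ["lncRNA", "rRNA", "tRNA", "miRNA", "snRNA", "snoRNA"]


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def late_fusion(seq_logits: np.ndarray, struct_logits: np.ndarray,
                w: float = 0.5) -> np.ndarray:
    """Weighted average of the class probabilities of two separately computed branches."""
    return w * softmax(seq_logits) + (1.0 - w) * softmax(struct_logits)


def predict(seq_logits, struct_logits, w=0.5):
    """Label with the highest fused probability, per sequence."""
    fused = late_fusion(seq_logits, struct_logits, w)
    return [CLASSES[k] for k in fused.argmax(axis=1)]


def accuracy(predicted, truth):
    """Fraction of sequences whose predicted class matches the reference class."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical logits for two sequences (rows) over the six classes (columns).
    seq = np.array([[0, 0, 4, 1, 0, 0], [0, 0, 1, 0.5, 0, 0]], dtype=float)
    struct = np.array([[0, 0, 0, 1.5, 0, 0], [0, 0, 0, 4, 0, 0]], dtype=float)
    print(predict(seq, struct))
```

When the two branches disagree, the more confident branch tends to dominate the fused prediction. The weight `w` controls how much the sequence branch is trusted relative to the structure branch.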
Discovering Latent Structure in High-Dimensional Healthcare Data: Toward Improved Interpretability
(2022)
This cumulative thesis describes contributions to the field of interpretable machine learning in the healthcare domain. Three research articles are presented that lie at the intersection of biomedical and machine learning research. They illustrate how incorporating latent structure can provide a valuable compression of the information hidden in complex healthcare data.
Methodologically, this thesis gives an overview of interpretable machine learning and the discovery of latent structure, including clusters, latent factors, graph structure, and hierarchical structure. Different workflows are developed and applied to two main types of complex healthcare data (cohort study data and time-resolved molecular data). The core result builds on Bayesian networks, a type of probabilistic graphical model. On the application side, we provide accurate predictive or discriminative models focusing on relevant medical conditions, related biomarkers, and their interactions.
Background: Retrospective research on real-world data makes it possible to gather evidence on specific topics, especially when the research is conducted across different sites in research networks. Such research networks have become increasingly relevant in recent years, not least because of the special situation caused by the COVID-19 pandemic. An important requirement for these networks is data harmonization that ensures semantic interoperability. Aims: In this paper we aim to (1) demonstrate how to use digital infrastructures to run a retrospective study in a research network spread across university and non-university hospital sites; and (2) answer a medical question on COVID-19-related changes in diagnostic counts for diabetes-related eye diseases. Materials and methods: The study is retrospective and non-interventional and runs on medical case data documented in routine care at the participating sites. The technical infrastructure consists of the OMOP CDM and other OHDSI tools, provided in a transferable format. An ETL process was used to transfer and harmonize the data to the OMOP CDM. Cohort definitions for each year of observation were created centrally, applied locally to the medical case data of all participating sites, and analyzed with descriptive statistics. Results: The analyses showed an expected drop in the total number of diagnoses and in diagnoses of diabetes in general. Surprisingly, however, the number of diagnoses of diabetes-related eye diseases decreased more strongly than that of non-eye diseases. Differences between sites in the relative changes of diagnosis counts show an urgent need to conduct multi-centric rather than single-site studies to reduce bias in the data. Conclusions: This study has demonstrated that an existing portable and standardized infrastructure and ETL process can be taken from a university hospital setting and transferred to non-university sites.
From a medical perspective, further work is needed to evaluate the quality of the real-world data documented in routine care and to investigate whether these data are suitable for research.
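The Results compare relative changes in diagnosis counts between years and between sites. A minimal sketch of that arithmetic follows. The site names and counts are hypothetical, and the helper names are illustrative, not part of the OHDSI tooling.

```python
# Hypothetical diagnosis counts per site: (count in reference year, count in comparison year).
COUNTS = {
    "site_A": {"eye": (200, 150), "non_eye": (1000, 900)},
    "site_B": {"eye": (80, 76), "non_eye": (500, 480)},
}


def relative_change(before: int, after: int) -> float:
    """(after - before) / before; for example, 200 -> 150 gives -0.25."""
    if before <= 0:
        raise ValueError("reference count must be positive")
    return (after - before) / before


def changes_by_site(counts: dict) -> dict:
    """Relative change per site and diagnosis group."""
    return {
        site: {group: relative_change(*pair) for group, pair in groups.items()}
        for site, groups in counts.items()
    }


def pooled_change(counts: dict, group: str) -> float:
    """Relative change after summing counts over all sites (hides site differences)."""
    before = sum(groups[group][0] for groups in counts.values())
    after = sum(groups[group][1] for groups in counts.values())
    return relative_change(before, after)


if __name__ == "__main__":
    for site, changes in changes_by_site(COUNTS).items():
        print(site, {g: f"{c:+.1%}" for g, c in changes.items()})
    print("pooled eye:", f"{pooled_change(COUNTS, 'eye'):+.1%}")
```

With these toy numbers, the two sites differ markedly, and a single-site study would have captured only one of the two patterns.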
(1) Background: The global incidence of type 1 diabetes (T1D) is rising, and nearly half of cases occur in adults. However, it is unclear whether certain early-life risk factors for childhood T1D are also associated with adult-onset T1D. This study aimed to assess associations of birth order, delivery mode and daycare attendance with T1D risk in a population-based cohort. It also assessed whether these associations were similar for childhood- and adult-onset T1D (cut-off age 15); (2) Methods: Data were obtained from the baseline assessment of the German National Cohort (NAKO Gesundheitsstudie). Self-reported diabetes was classified as T1D if the age at diagnosis was ≤ 40 years and insulin treatment had begun less than one year after diagnosis and had continued since then. Cox regression was applied for the T1D risk analysis; (3) Results: Analyses included 101,411 participants (100 childhood- and 271 adult-onset T1D cases). Compared with “only children”, the HRs for second-born and later-born individuals were 0.70 (95% CI = 0.50–0.96) and 0.65 (95% CI = 0.45–0.94), respectively, independent of parental diabetes, migration background, birth year and perinatal factors. In further analyses, higher birth order reduced T1D risk in children and adults born in recent decades. Caesarean section and daycare attendance showed no clear associations with T1D risk; (4) Conclusions: Birth order should be considered in T1D risk assessment for both children and adults to support early detection.
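The case definition in the Methods is a simple rule. It can be written down as a function, as in the sketch below. The function and argument names are hypothetical. The split into childhood and adult onset assumes that "cut-off age 15" means a diagnosis before age 15 counts as childhood onset; the abstract does not state whether age 15 itself is included.

```python
from typing import Optional


def is_t1d(diagnosis_age: float, on_insulin: bool,
           years_diagnosis_to_insulin: Optional[float]) -> bool:
    """Apply the T1D rule to a self-reported diabetes case: age at diagnosis
    <= 40 years and ongoing insulin treatment that began less than one year
    after diagnosis."""
    if not on_insulin or years_diagnosis_to_insulin is None:
        return False
    return diagnosis_age <= 40 and 0 <= years_diagnosis_to_insulin < 1


def onset_group(diagnosis_age: float, cutoff: float = 15.0) -> str:
    """Assumption: 'childhood onset' means diagnosis before the cut-off age."""
    return "childhood" if diagnosis_age < cutoff else "adult"


if __name__ == "__main__":
    # Hypothetical respondents: (age at diagnosis, on insulin, years until insulin start)
    for case in [(12, True, 0.2), (30, True, 2.0), (45, True, 0.1)]:
        label = onset_group(case[0]) if is_t1d(*case) else "not classified as T1D"
        print(case, label)
```

Encoding the rule explicitly makes the boundary decisions visible, for example whether age 40 or a one-year gap is included.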
Introduction
The co-occurrence of health risk behaviours (HRBs, ie, tobacco smoking, at-risk alcohol use, insufficient physical activity and unhealthy diet) increases the risks of cancer, other chronic diseases and mortality more than additively. It affects more than half of the adult general population. However, preventive measures that target all four HRBs and reach the majority of the target population, particularly those persons most in need and hard to reach, are scarce. Electronic interventions may help to address multiple HRBs in healthcare patients efficiently. The aim is to investigate the acceptance of a proactive and brief electronic multiple behaviour change intervention among general hospital patients. Acceptance is assessed with regard to reach, retention, equity in reach and retention, satisfaction, and changes in behaviour change motivation, HRBs and health.
Methods and analysis
A pre–post intervention study with four time points is being conducted at a general hospital in Germany. All patients aged 18–64 years admitted to participating wards of five medical departments (internal medicine A and B, general surgery, trauma surgery, and ear, nose and throat medicine) are systematically approached and invited to participate. At months 0, 1 and 3, 175 participants receive individualised, motivation-enhancing, computer-generated feedback based on behaviour change theory and their individual HRB profiles. Intervention reach is determined as the proportion of participants among eligible patients, and retention as the proportion of participants who continue participation. Equity in reach and retention is measured with regard to school education and other sociodemographic characteristics. To investigate satisfaction with the intervention and subsequent changes, a 6-month follow-up is conducted. Descriptive statistics, multivariate regressions and latent growth modelling are applied.
Ethics and dissemination
The local ethics commission and data safety appointee approved the study procedures. Results will be disseminated via publication in international scientific journals and presentations at scientific conferences.
Trial registration number
NCT05365269.
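Reach and retention are defined as simple proportions. Equity in reach can be examined by computing reach separately for each sociodemographic subgroup. The sketch below shows this under stated assumptions. Only the figure of 175 participants comes from the protocol. The eligible and retained counts, the subgroup labels, and the helper names are hypothetical.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Share of the denominator that is in the numerator, with basic sanity checks."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= numerator <= denominator:
        raise ValueError("numerator must lie between 0 and the denominator")
    return numerator / denominator


def reach(participants: int, eligible: int) -> float:
    """Proportion of participants among eligible patients."""
    return proportion(participants, eligible)


def retention(retained: int, participants: int) -> float:
    """Proportion of participants who continue participation."""
    return proportion(retained, participants)


def reach_by_group(groups: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Reach per subgroup; groups maps a label (e.g. school education level)
    to (participants, eligible). Unequal values point to inequity in reach."""
    return {label: reach(p, e) for label, (p, e) in groups.items()}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(reach(175, 250), retention(140, 175))
    print(reach_by_group({"<10 years school": (40, 80), ">=10 years school": (135, 170)}))
```

Reporting reach per subgroup, rather than only overall, shows whether the intervention reaches those most in need as well as it reaches everyone else.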
Background
Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.
Method
Here we propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data and then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Using this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid function.
Results
Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves a predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.
Conclusion
We demonstrate that the combination of random forest and Bayesian network analysis is helpful to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.
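The Results report three standard regression metrics. For reference, a minimal sketch of how RMSE, MAE and R² are computed from observed and predicted values is shown below. The toy numbers are hypothetical and unrelated to the SHIP data, and the function name is illustrative.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> tuple[float, float, float]:
    """Return (RMSE, MAE, R^2) for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```

R² is the share of outcome variance explained by the model. A value of 0.15 therefore means most of the variation in individual TSH concentrations remains unexplained, even when RMSE and MAE look moderate.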
Since autumn 2020, rapid antigen tests (RATs) have been implemented in several countries as an important pillar of national testing strategies to rapidly screen for infections on site during the SARS-CoV-2 pandemic. The current surge in infection rates around the globe is driven by the variant of concern (VoC) omicron (B.1.1.529). Here, we evaluated the performance of nine SARS-CoV-2 RATs in a single-centre laboratory study. We examined a total of 115 SARS-CoV-2 PCR-negative and 166 SARS-CoV-2 PCR-positive respiratory swab samples (101 omicron, 65 delta (B.1.617.2)) collected from October 2021 until January 2022. We also examined cell culture-expanded clinical isolates of both VoCs. In an assessment of the analytical sensitivity in clinical specimens, the 50% limit of detection (LoD50) for omicron ranged from 1.77 × 10⁶ to 7.03 × 10⁷ RNA copies subjected to the RAT, compared with 1.32 × 10⁵ to 2.05 × 10⁶ for delta. To score positive in these point-of-care tests, omicron-containing samples required up to 10-fold (LoD50) or 101-fold (LoD95) higher virus loads than delta-containing samples. The rates of true positive test results for omicron samples in the highest virus load category (Ct values < 25) ranged between 31.4 and 77.8%. They dropped to 0–8.3% for samples with intermediate Ct values (25–30). Of note, testing of expanded virus stocks suggested a comparable RAT sensitivity for both VoCs, which calls into question the predictive value of this type of in vitro study for clinical performance. RATs are important for national test strategies in the current omicron wave. Therefore, awareness must be raised of the reduced detection rate of omicron infections by RATs, and a short list of suitable RATs that fulfil the minimal performance requirements should be rapidly disclosed.
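The true-positive rates above are stratified by Ct value, where a lower Ct means a higher virus load. The sketch below shows how such a stratified rate can be computed from PCR-positive samples. The sample values are hypothetical. Treating Ct 30 as part of the intermediate bin and grouping everything above 30 into a third bin are assumptions, since the abstract only names the < 25 and 25–30 categories.

```python
from collections import defaultdict


def ct_category(ct: float) -> str:
    """Virus-load bin by Ct value; the inclusive upper bound of 30 is an assumption."""
    if ct < 25:
        return "<25"
    if ct <= 30:
        return "25-30"
    return ">30"


def positive_rate_by_category(samples):
    """samples: iterable of (ct, rat_positive) for PCR-positive specimens.
    Returns the share of RAT-positive results (true positives) per Ct bin."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ct, rat_positive in samples:
        category = ct_category(ct)
        totals[category] += 1
        hits[category] += int(rat_positive)
    return {category: hits[category] / totals[category] for category in totals}


if __name__ == "__main__":
    # Hypothetical PCR-positive samples: (Ct value, RAT result)
    demo = [(20, True), (22, False), (24, True), (27, False), (29, True), (33, False)]
    print(positive_rate_by_category(demo))
```

Stratifying by Ct keeps RATs comparable across sample sets with different virus-load distributions. A single pooled sensitivity would mostly reflect how many high-load samples happened to be included.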
Data stewardship is an essential driver of research and clinical practice. Data collection, storage, access, sharing, and analytics depend on the proper and consistent use of data management principles among investigators. Since 2016, the FAIR (findable, accessible, interoperable, and reusable) guiding principles for research data management have been resonating in scientific communities. Making data findable, accessible, interoperable, and reusable is currently believed to strengthen data sharing, reduce duplicated effort, and move toward the harmonization of data from heterogeneous, unconnected data silos. FAIR initiatives and implementation efforts are growing across scientific domains. The FAIR principles as applied to human health data should therefore be understood in both concept and implementation practice. This can be done by studying the flourishing initiatives and the implementation lessons relevant to improved health research, particularly for data sharing during the coronavirus pandemic.