Institut für Biometrie und Medizinische Informatik
Dengue virus (DV) is a positive-strand RNA virus of the Flavivirus genus. It is one of the most prevalent mosquito-borne viruses, infecting 390 million individuals worldwide per year. The clinical spectrum of DV infection ranges from an asymptomatic course to severe complications such as dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), the latter caused by severe plasma leakage. Given that the outcome of infection is likely determined by the kinetics of viral replication and the antiviral host cell immune response (HIR), it is important to understand the interaction between these two parameters. In this study, we use mathematical modeling to characterize and understand the complex interplay between intracellular DV replication and the host cells' defense mechanisms. We first measured viral RNA, viral protein, and virus particle production in Huh7 cells, which exhibit a notoriously weak intrinsic antiviral response. Based on these measurements, we developed a detailed intracellular DV replication model. We then measured replication in IFN-competent A549 cells and used these data to couple the replication model with a model describing IFN activation and production of IFN-stimulated genes (ISGs), as well as their interplay with DV replication. By comparing the cell line-specific DV replication, we found that host factors involved in replication complex formation and virus particle production are crucial for replication efficiency. Regarding possible modes of action of the HIR, our model fits suggest that the HIR mainly affects DV RNA translation initiation, cytosolic DV RNA degradation, and naïve cell infection. We further analyzed in silico the potential of direct-acting antiviral drugs targeting different processes of the DV lifecycle and found that RNA synthesis and virus assembly and release are the most promising anti-DV drug targets.
Background
Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability.
Method
We here propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid functionality.
Results
Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy similar to that of state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables through dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable.
Conclusion
We demonstrate that the combination of random forest and Bayesian network analysis is helpful to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.
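The three error measures quoted in the Results (root mean square error, mean absolute error, coefficient of determination) follow standard definitions. The sketch below shows how they can be computed with the Python standard library only. The function name `regression_metrics` and the toy values are assumptions made for illustration. They are not taken from the study or from SHIP data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Root mean square error: square root of the mean squared residual.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error: mean of the absolute residuals.
    mae = sum(abs(r) for r in residuals) / n

    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Hypothetical toy values for illustration only (not SHIP data).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
rmse, mae, r2 = regression_metrics(observed, predicted)
```

An R² of 0.15, as reported above, means the residual sum of squares is 85% of the total sum of squares around the mean of the observed values.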
Bioinformatics Algorithms and Predictive Models: The Grand Challenge in Computational Virology
(2021)
Never has the relevance of bioinformatic and predictive tools been more central in the field of virology than today. SARS-CoV-2 has brought along a huge health burden, but also a deeper awareness that scientific progress can no longer be effective without extensive systems for data storage, sharing and analysis, as well as computational tools dedicated to molecular epidemiology, NGS data analysis, prediction of drug targets, multi-OMIC data integration, and many other applications.
(1) Background: The global incidence of type 1 diabetes (T1D) is rising, and nearly half of cases occur in adults. However, it is unclear whether certain early-life risk factors for childhood T1D are also associated with adult-onset T1D. This study aimed to assess associations between birth order, delivery mode or daycare attendance and T1D risk in a population-based cohort, and whether these were similar for childhood- and adult-onset T1D (cut-off age 15); (2) Methods: Data were obtained from the German National Cohort (NAKO Gesundheitsstudie) baseline assessment. Self-reported diabetes was classified as T1D if age at diagnosis was ≤ 40 years and insulin treatment had started less than one year after diagnosis. Cox regression was applied for T1D risk analysis; (3) Results: Analyses included 101,411 participants (100 childhood- and 271 adult-onset T1D cases). Compared to “only-children”, HRs for second- and later-born individuals were 0.70 (95% CI = 0.50–0.96) and 0.65 (95% CI = 0.45–0.94), respectively, regardless of parental diabetes, migration background, birth year and perinatal factors. In further analyses, higher birth order was associated with reduced T1D risk in children and adults born in recent decades. Caesarean section and daycare attendance showed no clear associations with T1D risk; (4) Conclusions: Birth order should be considered in T1D risk assessment for both children and adults for early detection.
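The case definition in the Methods is a simple rule, and a small sketch may make it concrete. The code below is an illustration and not the study's actual procedure. The function and parameter names are hypothetical, and treating an age at diagnosis below 15 as childhood onset is an assumption, because the abstract gives the cut-off age without stating whether it is inclusive.

```python
MAX_DIAGNOSIS_AGE = 40      # "diagnosis age <= 40 years" (from the abstract)
MAX_YEARS_TO_INSULIN = 1.0  # insulin started less than one year after diagnosis
CHILDHOOD_CUTOFF_AGE = 15   # cut-off age from the abstract; inclusivity is assumed


def classify_self_reported_diabetes(diagnosis_age, years_to_insulin):
    """Classify a self-reported diabetes case.

    diagnosis_age: age at diagnosis in years.
    years_to_insulin: years between diagnosis and the start of insulin
        treatment, or None if the participant never received insulin.

    Returns "childhood-onset T1D", "adult-onset T1D", or None when the
    case does not meet the T1D definition.
    """
    if years_to_insulin is None:
        return None
    if diagnosis_age > MAX_DIAGNOSIS_AGE:
        return None
    if years_to_insulin >= MAX_YEARS_TO_INSULIN:
        return None
    # Assumption: ages strictly below the cut-off count as childhood onset.
    if diagnosis_age < CHILDHOOD_CUTOFF_AGE:
        return "childhood-onset T1D"
    return "adult-onset T1D"


# Hypothetical participants for illustration only.
example_child = classify_self_reported_diabetes(10, 0.0)
example_adult = classify_self_reported_diabetes(30, 0.5)
example_late = classify_self_reported_diabetes(45, 0.0)
```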
Discovering Latent Structure in High-Dimensional Healthcare Data: Toward Improved Interpretability
(2022)
This cumulative thesis describes contributions to the field of interpretable machine learning in the healthcare domain. Three research articles are presented that lie at the intersection of biomedical and machine learning research. They illustrate how incorporating latent structure can provide a valuable compression of the information hidden in complex healthcare data.
Methodologically, this thesis gives an overview of interpretable machine learning and the discovery of latent structure, including clusters, latent factors, graph structure, and hierarchical structure. Different workflows are developed and applied to two main types of complex healthcare data (cohort study data and time-resolved molecular data). The core result builds on Bayesian networks, a type of probabilistic graphical model. On the application side, we provide accurate predictive or discriminative models focusing on relevant medical conditions, related biomarkers, and their interactions.
Objective: In the rat, the pancreatic islet transplantation model is an established method to induce hepatocellular carcinomas (HCC), due to insulin-mediated metabolic and molecular alterations such as increased glycolysis, de novo lipogenesis and oncogenic AKT/mTOR signalling, including upregulation of the transcription factor carbohydrate-response element-binding protein (ChREBP). ChREBP could therefore represent an essential oncogenic co-factor during hormonally induced hepatocarcinogenesis. Methods: Pancreatic islet transplantation was implemented in diabetic C57Bl/6J (wild type, WT) and ChREBP-knockout (KO) mice for 6 and 12 months. Liver tissue was examined using histology, immunohistochemistry, electron microscopy and Western blot analysis. Finally, we performed NGS-based transcriptome analysis comparing WT and KO liver tumor tissues. Results: Three hepatocellular carcinomas were detectable after 6 and 12 months in diabetic transplanted WT mice, but only one in a KO mouse after 12 months. Pre-neoplastic clear cell foci (CCF) were also present in liver acini downstream of the islets in WT and KO mice. In KO tumors, glycolysis, de novo lipogenesis and AKT/mTOR signalling were strongly downregulated compared to WT lesions. Extrafocal liver tissue of diabetic, transplanted KO mice revealed less glycogen storage and proliferative activity than that of WT mice. From transcriptome analysis, we identified a set of transcripts pertaining to metabolic, oncogenic and immunogenic pathways that are differentially expressed between tumors of WT and KO mice. Of 315 metabolism-associated genes, 199 were upregulated in WT tumors, whereas 116 were downregulated in KO tumors. Conclusions: The pancreatic islet transplantation model is a suitable method to study hormonally induced hepatocarcinogenesis in mice as well, allowing combination with gene knockout models. Our data indicate that deletion of ChREBP delays insulin-induced hepatocarcinogenesis, suggesting a combined oncogenic and lipogenic function of ChREBP along AKT/mTOR-mediated proliferation of hepatocytes and induction of hepatocellular carcinoma.
Identification and Regulation of Tomato Serine/Arginine-Rich Proteins Under High Temperatures
(2021)
Alternative splicing is an important mechanism for the regulation of gene expression in eukaryotes during development, cell differentiation or stress response. Alterations in the splicing profiles of genes under high temperatures that cause heat stress (HS) can impact the maintenance of cellular homeostasis and thermotolerance. Consequently, information on factors involved in HS-sensitive alternative splicing is required to formulate the principles of the HS response. Serine/arginine-rich (SR) proteins have a central role in alternative splicing. We aimed to identify and characterize SR-coding genes in tomato (Solanum lycopersicum), a plant extensively used in HS studies. We identified 17 canonical SR genes and two SR-like genes. Several SR-coding genes show differential expression and altered splicing profiles in different organs as well as in response to HS. The transcriptional induction of five SR genes and one SR-like gene is partially dependent on the master regulator of the HS response, HS transcription factor HsfA1a. Cis-elements that can putatively be recognized by HS-induced transcription factors were predicted in the promoters of these SR genes. Further, transiently expressed SRs show reduced or steady-state protein levels in response to HS. Thus, the levels of SRs under HS are regulated by changes in transcription, alternative splicing and protein stability. We propose that the accumulation or reduction of SRs under HS can impact temperature-sensitive alternative splicing.
Since autumn 2020, rapid antigen tests (RATs) have been implemented in several countries as an important pillar of the national testing strategy to rapidly screen for infections on site during the SARS-CoV-2 pandemic. The current surge in infection rates around the globe is driven by the variant of concern (VoC) omicron (B.1.1.529). Here, we evaluated the performance of nine SARS-CoV-2 RATs in a single-centre laboratory study. We examined a total of 115 SARS-CoV-2 PCR-negative and 166 SARS-CoV-2 PCR-positive respiratory swab samples (101 omicron, 65 delta (B.1.617.2)) collected from October 2021 until January 2022 as well as cell culture-expanded clinical isolates of both VoCs. In an assessment of the analytical sensitivity in clinical specimens, the 50% limit of detection (LoD50) ranged from 1.77 × 10⁶ to 7.03 × 10⁷ RNA copies subjected to the RAT for omicron compared to 1.32 × 10⁵ to 2.05 × 10⁶ for delta. To score positive in these point-of-care tests, up to 10-fold (LoD50) or 101-fold (LoD95) higher virus loads were required for omicron-containing compared to delta-containing samples. The rates of true positive test results for omicron samples in the highest virus load category (Ct values < 25) ranged between 31.4 and 77.8%, while they dropped to 0–8.3% for samples with intermediate Ct values (25–30). Of note, testing of expanded virus stocks suggested a comparable RAT sensitivity for both VoCs, questioning the predictive value of this type of in vitro study for clinical performance. Given the importance of RATs for national test strategies in the current omicron wave, awareness must be increased of the reduced detection rate of omicron infections by RATs, and a short list of suitable RATs that fulfill the minimal performance requirements should be rapidly disclosed.
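The true positive rates above are reported per Ct category. As a minimal sketch of that kind of stratified calculation, the code below groups PCR-positive samples by Ct value and computes the fraction that a RAT detected. The function names, the handling of the category boundaries, the third category (Ct > 30) and the sample values are all assumptions made for illustration. They are not data from the study.

```python
def ct_category(ct_value):
    """Assign a Ct value to a viral-load category.

    "Ct < 25" and "Ct 25-30" follow the abstract; including 30 in the
    intermediate category and adding "Ct > 30" are assumptions.
    """
    if ct_value < 25:
        return "Ct < 25"
    if ct_value <= 30:
        return "Ct 25-30"
    return "Ct > 30"


def true_positive_rate_by_category(samples):
    """Compute the RAT true positive rate per Ct category.

    samples: iterable of (ct_value, rat_positive) pairs, one per
        PCR-positive sample; rat_positive is True if the RAT scored positive.
    Returns a dict mapping category name to a rate between 0 and 1.
    """
    totals = {}
    positives = {}
    for ct_value, rat_positive in samples:
        category = ct_category(ct_value)
        totals[category] = totals.get(category, 0) + 1
        positives[category] = positives.get(category, 0) + (1 if rat_positive else 0)
    return {category: positives[category] / totals[category] for category in totals}


# Hypothetical PCR-positive samples for illustration only.
toy_samples = [
    (20.0, True), (22.0, False), (24.0, True),
    (27.0, False), (29.0, False),
    (33.0, False),
]
rates = true_positive_rate_by_category(toy_samples)
```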
Data stewardship is an essential driver of research and clinical practice. Data collection, storage, access, sharing, and analytics are dependent on the proper and consistent use of data management principles among the investigators. Since 2016, the FAIR (findable, accessible, interoperable, and reusable) guiding principles for research data management have been resonating in scientific communities. Enabling data to be findable, accessible, interoperable, and reusable is currently believed to strengthen data sharing, reduce duplicated efforts, and move toward harmonization of data from heterogeneous unconnected data silos. FAIR initiatives and implementation trends are rising in different facets of scientific domains. It is important to understand the concepts and implementation practices of the FAIR data principles as applied to human health data by studying the flourishing initiatives and implementation lessons relevant to improved health research, particularly for data sharing during the coronavirus pandemic.
Age is the single biggest risk factor for most major human diseases. As such, understanding the intricate molecular changes that drive biological aging holds great promise in attempting to slow the onset of systemic diseases and thereby increase the effective health-span in modern societies.
This thesis explores several computational approaches to capture and analyze the molecular biological alterations triggered by intrinsic and extrinsic aging, using skin as a model tissue, in order to deliver genes and pathways as potential targets for intervention strategies.
Publication 1 demonstrates the utility of multi-omics data integration strategies for aging research, leading to the identification of four latent aging phases in skin tissue through an integrated cluster analysis of gene expression and DNA methylation data. The four phases improved the detection of molecular aging signals and were shown to be associated with sunbathing habits of the test subjects. Deeper analysis revealed extensive non-linear alterations in various biological pathways, particularly at the transition into the fourth aging phase, coinciding with menopause, with potentially wide-reaching functional implications.
Publication 2 describes the development of a novel type of age clock that provides a new level of interpretability by embedding biological pathway information in the architecture of an artificial neural network. The clock not only generates meaningful biological age estimates from gene expression data, but also allows simultaneous monitoring of the aging states of various biological processes through the activations of intermediate neurons. Analyses of the inner workings of the clock revealed a wide-spread impact of aging on the global pathway landscape. Simulation experiments using the transcriptomic clock recapitulated known functional aging gene associations and allowed deciphering of the pathways by which accelerated aging conditions such as chronic sun exposure and Hutchinson-Gilford progeria syndrome exert their effects.
Publication 3 further explores the molecular alterations caused by the pro-aging effector UV irradiation in the skin. The multi-omics data analysis of repetitively irradiated skin revealed signs of the immediate acquisition of aging- and cancer-related epigenetic signatures and concurrent wide-spread transcriptional changes across various biological processes. Investigations into the varying resilience to irradiation between subjects revealed prognostic biomarker signatures capable of predicting individual UV tolerances, with accuracies far surpassing the traditional Fitzpatrick classification scheme. Further analysis of the transcripts and pathways associated with UV tolerance identified a form of melanin-independent DNA damage protection in individuals with higher innate UV resilience.
Together, the approaches and findings described in this thesis explore several new angles to advance our understanding of aging processes and external drivers of aging such as UV irradiation in the human skin and deliver new insight on target genes and pathways involved.
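One way to embed pathway information in a network architecture, as described for the age clock in Publication 2, is to give each pathway its own intermediate neuron that receives input only from the genes annotated to that pathway. The sketch below illustrates this idea under that assumption. It is not the thesis's implementation, and all gene names, pathway names, weights and values are hypothetical.

```python
import math


def pathway_activations(expression, membership, weights, biases):
    """Compute one intermediate activation per pathway.

    expression: dict mapping gene name to expression value.
    membership: dict mapping pathway name to its list of member genes.
    weights: dict mapping (pathway, gene) to a connection weight; only
        member genes have a connection, which is what encodes the
        pathway structure in the architecture.
    biases: dict mapping pathway name to a bias term.
    """
    activations = {}
    for pathway, genes in membership.items():
        z = biases[pathway] + sum(weights[(pathway, g)] * expression[g] for g in genes)
        activations[pathway] = math.tanh(z)
    return activations


def predict_age(activations, output_weights, output_bias):
    """Combine the pathway activations linearly into an age estimate."""
    return output_bias + sum(output_weights[p] * a for p, a in activations.items())


# Hypothetical toy model for illustration only.
membership = {"pathway_1": ["GENE_A", "GENE_B"], "pathway_2": ["GENE_C"]}
weights = {
    ("pathway_1", "GENE_A"): 0.5,
    ("pathway_1", "GENE_B"): -0.3,
    ("pathway_2", "GENE_C"): 0.25,
}
biases = {"pathway_1": -0.5, "pathway_2": 0.0}
output_weights = {"pathway_1": 10.0, "pathway_2": 5.0}

expression = {"GENE_A": 1.0, "GENE_B": 0.0, "GENE_C": 2.0}
activations = pathway_activations(expression, membership, weights, biases)
estimated_age = predict_age(activations, output_weights, output_bias=40.0)
```

Because each intermediate neuron is tied to a named pathway, its activation can be read out directly. In this sketch, changing the expression of `GENE_C` alters only the activation of `pathway_2`.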