MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding (2023)

Dunkel, Heiko ; Wehrmann, Henning ; Jensen, Lars R. ; Kuss, Andreas W. ; Simm, Stefan

Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals that the expressed novel ncRNAs and their classification are important to understand cell regulation and identify potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes, including lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the current best-performing tool ncRDense, we had a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.

Opportunities of Digital Infrastructures for Disease Management—Exemplified on COVID-19-Related Change in Diagnosis Counts for Diabetes-Related Eye Diseases (2022)

Bathelt, Franziska ; Reinecke, Ines ; Peng, Yuan ; Henke, Elisa ; Weidner, Jens ; Bartos, Martin ; Gött, Robert ; Waltemath, Dagmar ; Engelmann, Katrin ; Schwarz, Peter EH ; Sedlmayr, Martin

Background: Retrospective research on real-world data provides the ability to gain evidence on specific topics, especially when running across different sites in research networks. Those research networks have become increasingly relevant in recent years, not least due to the special situation caused by the COVID-19 pandemic. An important requirement for those networks is data harmonization by ensuring semantic interoperability. Aims: In this paper we demonstrate (1) how to facilitate digital infrastructures to run a retrospective study in a research network spread across university and non-university hospital sites; and (2) to answer a medical question on COVID-19 related change in diagnostic counts for diabetes-related eye diseases. Materials and methods: The study is retrospective and non-interventional and runs on medical case data documented in routine care at the participating sites. The technical infrastructure consists of the OMOP CDM and other OHDSI tools that are provided in a transferable format. An ETL process to transfer and harmonize the data to the OMOP CDM has been utilized. Cohort definitions for each year in observation have been created centrally and applied locally against medical case data of all participating sites and analyzed with descriptive statistics. Results: The analyses showed an expected drop in the total number of diagnoses and the diagnoses for diabetes in general, whereas the number of diagnoses for diabetes-related eye diseases surprisingly decreased more strongly than that for non-eye diseases. Differences in relative changes of diagnosis counts between sites show an urgent need to conduct multi-centric studies rather than single-site studies to reduce bias in the data. Conclusions: This study has demonstrated the ability to utilize an existing portable and standardized infrastructure and ETL process from a university hospital setting and transfer it to non-university sites. From a medical perspective, further activity is needed to evaluate the data quality of the utilized real-world data documented in routine care and to investigate the eligibility of this data for research.

Birth Order, Caesarean Section, or Daycare Attendance in Relation to Child- and Adult-Onset Type 1 Diabetes: Results from the German National Cohort (2022)

(1) Background: Global incidence of type 1 diabetes (T1D) is rising and nearly half of cases occur in adults. However, it is unclear whether certain early-life risk factors for childhood T1D are also associated with adult-onset T1D. This study aimed to assess associations between birth order, delivery mode or daycare attendance and T1D risk in a population-based cohort and whether these were similar for childhood- and adult-onset T1D (cut-off age 15); (2) Methods: Data were obtained from the German National Cohort (NAKO Gesundheitsstudie) baseline assessment. Self-reported diabetes was classified as T1D if diagnosis age was ≤ 40 years and insulin treatment had started less than one year after diagnosis. Cox regression was applied for T1D risk analysis; (3) Results: Analyses included 101,411 participants (100 childhood- and 271 adult-onset T1D cases). Compared to “only-children”, HRs for second- or later-born individuals were 0.70 (95% CI = 0.50–0.96) and 0.65 (95% CI = 0.45–0.94), respectively, regardless of parental diabetes, migration background, birth year and perinatal factors. In further analyses, higher birth order reduced T1D risk in children and adults born in recent decades. Caesarean section and daycare attendance showed no clear associations with T1D risk; (4) Conclusions: Birth order should be considered in both children and adults’ T1D risk assessment for early detection.

Analysis of epidemiological association patterns of serum thyrotropin by combining random forests and Bayesian networks (2022)

Becker, Ann-Kristin ; Ittermann, Till ; Dörr, Markus ; Felix, Stephan B. ; Nauck, Matthias ; Teumer, Alexander ; Völker, Uwe ; Völzke, Henry ; Kaderali, Lars ; Nath, Neetika

Background: Approaching epidemiological data with flexible machine learning algorithms is of great value for understanding disease-specific association patterns. However, it can be difficult to correctly extract and understand those patterns due to the lack of model interpretability. Method: We here propose a machine learning workflow that combines random forests with Bayesian network surrogate models to allow for a deeper level of interpretation of complex association patterns. We first evaluate the proposed workflow on synthetic data. We then apply it to data from the large population-based Study of Health in Pomerania (SHIP). Based on this combination, we discover and interpret broad patterns of individual serum TSH concentrations, an important marker of thyroid functionality. Results: Evaluations using simulated data show that feature associations can be correctly recovered by combining random forests and Bayesian networks. The presented model achieves predictive accuracy that is similar to state-of-the-art models (root mean square error of 0.66, mean absolute error of 0.55, coefficient of determination of R² = 0.15). We identify 62 relevant features from the final random forest model, ranging from general health variables over dietary and genetic factors to physiological, hematological and hemostasis parameters. The Bayesian network model is used to put these features into context and make the black-box random forest model more understandable. Conclusion: We demonstrate that the combination of random forest and Bayesian network analysis is helpful to reveal and interpret broad association patterns of individual TSH concentrations. The discovered patterns are in line with state-of-the-art literature. They may be useful for future thyroid research and improved dosing of therapeutics.

Proactive automatised lifestyle intervention (PAL) in general hospital patients: study protocol of a single-group trial (2022)

Freyer-Adam, Jennis ; Krolo, Filipa ; Tiede, Anika ; Goeze, Christian ; Sadewasser, Kornelia ; Spielmann, Marie ; Krause, Kristian ; John, Ulrich

Introduction: The co-occurrence of health risk behaviours (HRBs, ie, tobacco smoking, at-risk alcohol use, insufficient physical activity and unhealthy diet) increases the risks of cancer, other chronic diseases and mortality more than additively, and applies to more than half of adult general populations. However, preventive measures that target all four HRBs and that reach the majority of the target populations, particularly those persons most in need and hard to reach, are scarce. Electronic interventions may help to efficiently address multiple HRBs in healthcare patients. The aim is to investigate the acceptance of a proactive and brief electronic multiple behaviour change intervention among general hospital patients with regard to reach, retention, equity in reach and retention, satisfaction and changes in behaviour change motivation, HRBs and health. Methods and analysis: A pre–post intervention study with four time points is conducted at a general hospital in Germany. All patients, aged 18–64 years, admitted to participating wards of five medical departments (internal medicine A and B, general surgery, trauma surgery, ear, nose and throat medicine) are systematically approached and invited to participate. Based on behaviour change theory and individual HRB profile, 175 participants receive individualised and motivation-enhancing computer-generated feedback at months 0, 1 and 3. Intervention reach and retention are determined by the proportion of participants among eligible patients and of participants who continue participation, respectively. Equity in reach and retention is measured with regard to school education and other sociodemographics. To investigate satisfaction with the intervention and subsequent changes, a 6-month follow-up is conducted. Descriptive statistics, multivariate regressions and latent growth modelling are applied. Ethics and dissemination: The local ethics commission and data safety appointee approved the study procedures. Results will be disseminated via publication in international scientific journals and presentations at scientific conferences. Trial registration number: NCT05365269.

Impaired detection of omicron by SARS-CoV-2 rapid antigen tests (2022)

Since autumn 2020, rapid antigen tests (RATs) have been implemented in several countries as an important pillar of the national testing strategy to rapidly screen for infections on site during the SARS-CoV-2 pandemic. The current surge in infection rates around the globe is driven by the variant of concern (VoC) omicron (B.1.1.529). Here, we evaluated the performance of nine SARS-CoV-2 RATs in a single-centre laboratory study. We examined a total of 115 SARS-CoV-2 PCR-negative and 166 SARS-CoV-2 PCR-positive respiratory swab samples (101 omicron, 65 delta (B.1.617.2)) collected from October 2021 until January 2022 as well as cell culture-expanded clinical isolates of both VoCs. In an assessment of the analytical sensitivity in clinical specimen, the 50% limit of detection (LoD50) ranged from 1.77 × 10⁶ to 7.03 × 10⁷ RNA copies subjected to the RAT for omicron compared to 1.32 × 10⁵ to 2.05 × 10⁶ for delta. To score positive in these point-of-care tests, up to 10-fold (LoD50) or 101-fold (LoD95) higher virus loads were required for omicron- compared to delta-containing samples. The rates of true positive test results for omicron samples in the highest virus load category (Ct values < 25) ranged between 31.4 and 77.8%, while they dropped to 0–8.3% for samples with intermediate Ct values (25–30). Of note, testing of expanded virus stocks suggested a comparable RAT sensitivity of both VoCs, questioning the predictive value of this type of in vitro-studies for clinical performance. Given their importance for national test strategies in the current omicron wave, awareness must be increased for the reduced detection rate of omicron infections by RATs and a short list of suitable RATs that fulfill the minimal requirements of performance should be rapidly disclosed.

Initiatives, Concepts, and Implementation Practices of FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles in Health Data Stewardship Practice: Protocol for a Scoping Review (2021)

Inau, Esthe Thea ; Sack, Jean ; Waltemath, Dagmar ; Zeleke, Atinkut Alamirrew

Data stewardship is an essential driver of research and clinical practice. Data collection, storage, access, sharing, and analytics are dependent on the proper and consistent use of data management principles among the investigators. Since 2016, the FAIR (findable, accessible, interoperable, and reusable) guiding principles for research data management have been resonating in scientific communities. Enabling data to be findable, accessible, interoperable, and reusable is currently believed to strengthen data sharing, reduce duplicated efforts, and move toward harmonization of data from heterogeneous unconnected data silos. FAIR initiatives and implementation trends are rising in different facets of scientific domains. It is important to understand the concepts and implementation practices of the FAIR data principles as applied to human health data by studying the flourishing initiatives and implementation lessons relevant to improved health research, particularly for data sharing during the coronavirus pandemic.

Hormonally Induced Hepatocellular Carcinoma in Diabetic Wild Type and Carbohydrate Responsive Element Binding Protein Knockout Mice (2021)

Nuernberger, Vincent ; Mortoga, Sharif ; Metzendorf, Christoph ; Burkert, Christian ; Ehricke, Katrina ; Knuth, Elisa ; Zimmer, Jenny ; Singer, Stephan ; Nath, Neetika ; Karim, Majedul ; Yasser, Mohd ; Calvisi, Diego F. ; Dombrowski, Frank ; Ribback, Silvia

Objective: In the rat, the pancreatic islet transplantation model is an established method to induce hepatocellular carcinomas (HCC), due to insulin-mediated metabolic and molecular alterations like increased glycolysis and de novo lipogenesis and the oncogenic AKT/mTOR pathway including upregulation of the transcription factor Carbohydrate-response element-binding protein (ChREBP). ChREBP could therefore represent an essential oncogenic co-factor during hormonally induced hepatocarcinogenesis. Methods: Pancreatic islet transplantation was implemented in diabetic C57Bl/6J (wild type, WT) and ChREBP-knockout (KO) mice for 6 and 12 months. Liver tissue was examined using histology, immunohistochemistry, electron microscopy and Western blot analysis. Finally, we performed NGS-based transcriptome analysis between WT and KO liver tumor tissues. Results: Three hepatocellular carcinomas were detectable after 6 and 12 months in diabetic transplanted WT mice, but only one in a KO mouse after 12 months. Pre-neoplastic clear cell foci (CCF) were also present in liver acini downstream of the islets in WT and KO mice. In KO tumors, glycolysis, de novo lipogenesis and AKT/mTOR signalling were strongly downregulated compared to WT lesions. Extrafocal liver tissue of diabetic, transplanted KO mice revealed less glycogen storage and proliferative activity than that of WT mice. From transcriptome analysis, we identified a set of transcripts pertaining to metabolic, oncogenic and immunogenic pathways that are differentially expressed between tumors of WT and KO mice. Of 315 metabolism-associated genes, 199 were upregulated in the tumors of WT mice, whereas 116 transcripts were downregulated in the tumor of the KO mouse. Conclusions: The pancreatic islet transplantation model is a suitable method to study hormonally induced hepatocarcinogenesis also in mice, allowing combination with gene knockout models. Our data indicate that deletion of ChREBP delays insulin-induced hepatocarcinogenesis, suggesting a combined oncogenic and lipogenic function of ChREBP along AKT/mTOR-mediated proliferation of hepatocytes and induction of hepatocellular carcinoma.

Identification and Regulation of Tomato Serine/Arginine-Rich Proteins Under High Temperatures (2021)

Rosenkranz, Remus R. E. ; Bachiri, Samia ; Vraggalas, Stavros ; Keller, Mario ; Simm, Stefan ; Schleiff, Enrico ; Fragkostefanakis, Sotirios

Alternative splicing is an important mechanism for the regulation of gene expression in eukaryotes during development, cell differentiation or stress response. Alterations in the splicing profiles of genes under high temperatures that cause heat stress (HS) can impact the maintenance of cellular homeostasis and thermotolerance. Consequently, information on factors involved in HS-sensitive alternative splicing is required to formulate the principles of HS response. Serine/arginine-rich (SR) proteins have a central role in alternative splicing. We aimed for the identification and characterization of SR-coding genes in tomato (Solanum lycopersicum), a plant extensively used in HS studies. We identified 17 canonical SR and two SR-like genes. Several SR-coding genes show differential expression and altered splicing profiles in different organs as well as in response to HS. The transcriptional induction of five SR and one SR-like genes is partially dependent on the master regulator of HS response, HS transcription factor HsfA1a. Cis-elements in the promoters of these SR genes were predicted, which can be putatively recognized by HS-induced transcription factors. Further, transiently expressed SRs show reduced or steady-state protein levels in response to HS. Thus, the levels of SRs under HS are regulated by changes in transcription, alternative splicing and protein stability. We propose that the accumulation or reduction of SRs under HS can impact temperature-sensitive alternative splicing.

Bioinformatics Algorithms and Predictive Models: The Grand Challenge in Computational Virology (2021)

Sironi, Manuela ; Kaderali, Lars

Never in the past has the relevance of bioinformatic and predictive tools been more central in the field of virology than today. SARS-CoV-2 has brought along a huge health burden, but also a deeper awareness that scientific progress can no longer be effective without extensive systems for data storage, sharing and analysis, as well as computational tools dedicated to molecular epidemiology, NGS data analysis, prediction of drug targets, multi-OMIC data integration, and many other applications.
