The benefit of regular physical activity and exercise training for the prevention of cardiovascular and metabolic diseases is undisputed. Many molecular mechanisms mediating exercise effects have been deciphered. Personalised exercise prescription can help patients in achieving their individual greatest benefit from an exercise-based cardiovascular rehabilitation programme. Yet, we still struggle to provide truly personalised exercise prescriptions to our patients. In this position paper, we address novel basic and translational research concepts that can help us understand the principles underlying the inter-individual differences in the response to exercise, and identify early on who would most likely benefit from which exercise intervention. This includes hereditary, non-hereditary and sex-specific concepts. Recent insights have helped us to take on a more holistic view, integrating exercise-mediated molecular mechanisms with those influenced by metabolism and immunity. Unfortunately, while the outline is recognisable, many details are still lacking to turn the understanding of a concept into a roadmap ready to be used in clinical routine. This position paper therefore also investigates perspectives on how the advent of ‘big data’ and the use of animal models could help unravel inter-individual responses to exercise parameters and thus influence hypothesis-building for translational research in exercise-based cardiovascular rehabilitation.
Introduction: It has been shown that Alzheimer’s disease (AD) is accompanied by marked structural brain changes that can be detected several years before clinical diagnosis via structural magnetic resonance (MR) imaging. In this study, we developed a structural MR-based biomarker for in vivo detection of AD using a supervised machine learning approach. Based on an individual’s pattern of brain atrophy a continuous AD score is assigned which measures the similarity with brain atrophy patterns seen in clinical cases of AD.
Methods: The underlying statistical model was trained with MR scans of patients and healthy controls from the Alzheimer’s Disease Neuroimaging Initiative (ADNI-1 screening). Validation was performed within ADNI-1 and in an independent patient sample from the Open Access Series of Imaging Studies (OASIS-1). In addition, our analyses included data from a large general population sample of the Study of Health in Pomerania (SHIP-Trend).
Results: Based on the proposed AD score we were able to differentiate patients from healthy controls in ADNI-1 and OASIS-1 with an accuracy of 89% (AUC = 95%) and 87% (AUC = 93%), respectively. Moreover, we found the AD score to be significantly associated with cognitive functioning as assessed by the Mini-Mental State Examination (MMSE) in the OASIS-1 sample after correcting for diagnosis, age, sex, age·sex, and total intracranial volume (Cohen’s f² = 0.13). Additional analyses showed that the prediction accuracy of AD status based on both the AD score and the MMSE score is significantly higher than when using just one of them. In SHIP-Trend we found the AD score to be weakly but significantly associated with a test of verbal memory consisting of an immediate and a delayed word list recall (again after correcting for age, sex, age·sex, and total intracranial volume, Cohen’s f² = 0.009). This association was mainly driven by the immediate recall performance.
Discussion: In summary, our proposed biomarker differentiated well between patients and healthy controls in an independent test sample. It was associated with measures of cognitive functioning both in a patient sample and in a general population sample. Our approach might also be useful for defining robust MR-based biomarkers for other neurodegenerative diseases.
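The effect sizes quoted above are Cohen's f² values. A minimal sketch of how such a value is commonly obtained for a single predictor of interest follows; it assumes the "local" variant of f², which compares the R² of a full model (covariates plus AD score) with that of a reduced model (covariates only). The abstract does not state which variant was used, and the R² values in the usage line are hypothetical, chosen only to illustrate the arithmetic.

```python
def cohens_f2(r2_full: float, r2_reduced: float) -> float:
    """Local Cohen's f^2 for the predictor(s) added in the full model.

    f^2 = (R2_full - R2_reduced) / (1 - R2_full)

    Assumption: this 'local' definition is the one intended; the source
    abstract does not specify it.
    """
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("expected 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)


# Hypothetical R^2 values, not taken from the study.
example = cohens_f2(r2_full=0.50, r2_reduced=0.435)
```

By Cohen's conventional thresholds (0.02 small, 0.15 medium, 0.35 large), a value of 0.13 sits just below a medium effect and 0.009 is below the small threshold, which matches the abstract's wording of a "weak" association in the population sample.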
(1) Background: Predicting chronic low back pain (LBP) is of clinical and economic interest as LBP leads to disabilities and health service utilization. This study aims to build a competitive and interpretable prediction model; (2) Methods: We used clinical and claims data of 3837 participants of a population-based cohort study to predict future LBP consultations (ICD-10: M40.XX-M54.XX). Best subset selection (BSS) was applied in repeated random samples of training data (75% of data); scoring rules were used to identify the best subset of predictors. The prediction accuracy of BSS was compared to random forest and support vector machines (SVM) in the validation data (25% of data); (3) Results: The best subset comprised 16 out of 32 predictors. Previous occurrence of LBP increased the odds for future LBP consultations (odds ratio (OR) 6.91 [5.05; 9.45]), while concomitant diseases reduced the odds (1 vs. 0, OR: 0.74 [0.57; 0.98], >1 vs. 0: 0.37 [0.21; 0.67]). The area-under-curve (AUC) of BSS was acceptable (0.78 [0.74; 0.82]) and comparable with SVM (0.78 [0.74; 0.82]) and random forest (0.79 [0.75; 0.83]); (4) Conclusions: Regarding prediction accuracy, BSS has been considered competitive with established machine-learning approaches. Nonetheless, considerable misclassification is inherent and further refinements are required to improve predictions.
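The models above are compared by their area under the ROC curve. As a hedged illustration of what that number measures, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case (ties count one half). The function name and the toy data are assumptions for illustration; the study's own software and confidence-interval method are not described in the abstract.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 outcomes (1 = event, e.g. a future consultation)
    scores: iterable of predicted risks, same length as labels
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three of the four positive/negative pairs are
# ordered correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The quadratic loop is fine for illustration; for cohorts of several thousand participants a rank-based implementation would be the usual choice.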
Analysis of volatile organic compounds (VOCs) is a novel approach to accelerate bacterial culture diagnostics of Mycobacterium avium subsp. paratuberculosis (MAP). In the present study, cultures of fecal and tissue samples from MAP-infected and non-suspect dairy cattle and goats were explored to elucidate the effects of sample matrix and of animal species on VOC emissions during bacterial cultivation and to identify early markers for bacterial growth. The samples were processed following standard laboratory procedures, and culture tubes were incubated for different time periods. Headspace volume of the tubes was sampled by needle trap-micro-extraction, and analyzed by gas chromatography-mass spectrometry. Analysis of MAP-specific VOC emissions considered potential characteristic VOC patterns. To address variation of the patterns, a flexible and robust machine learning workflow was set up, based on random forest classifiers, and comprising three steps: variable selection, parameter optimization, and classification. Only a few substances originated either from a certain matrix or could be assigned to one animal species. These additional emissions were not considered informative by the variable selection procedure. Classification accuracy for MAP-positive versus MAP-negative cultures was 0.98 for bovine feces and 0.88 for caprine feces. Six compounds indicating MAP presence were selected in all four settings (cattle vs. goat, feces vs. tissue): 2-methyl-1-propanol, 2-methyl-1-butanol, 3-methyl-1-butanol, heptanal, isoprene, and 2-heptanone. Classification accuracies for MAP growth-scores ranged from 0.82 for goat tissue to 0.89 for cattle feces. Misclassification occurred predominantly between related scores. Seventeen compounds indicating MAP growth were selected in all four settings, including the 6 compounds indicating MAP presence.
The concentration levels of 2,3,5-trimethylfuran, 2-pentylfuran, 1-propanol, and 1-hexanol were indicative of MAP cultures before visible growth was apparent. Thus, very accurate classification of the VOC samples was achieved and the potential of VOC analysis to detect bacterial growth before colonies become visible was confirmed. These results indicate that diagnosis of paratuberculosis can be optimized by monitoring VOC emissions of bacterial cultures. Further validation studies are needed to increase the robustness of indicative VOC patterns for early MAP growth as a pre-requisite for the development of VOC-based diagnostic analysis systems.
Background
The alignment of large numbers of protein sequences is a challenging task and its importance grows rapidly along with the size of biological datasets. State-of-the-art algorithms have a tendency to produce less accurate alignments with an increasing number of sequences. This is a fundamental problem since many downstream tasks rely on accurate alignments.
Results
We present learnMSA, a novel statistical learning approach of profile hidden Markov models (pHMMs) based on batch gradient descent. Fundamentally different from popular aligners, we fit a custom recurrent neural network architecture for (p)HMMs to potentially millions of sequences with respect to a maximum a posteriori objective and decode an alignment. We rely on automatic differentiation of the log-likelihood, and thus, our approach is different from existing HMM training algorithms like Baum–Welch. Our method does not involve progressive, regressive, or divide-and-conquer heuristics. We use uniform batch sampling to adapt to large datasets in linear time without the requirement of a tree. When tested on ultra-large protein families with up to 3.5 million sequences, learnMSA is both more accurate and faster than state-of-the-art tools. On the established benchmarks HomFam and BaliFam with smaller sequence sets, it matches state-of-the-art performance. All experiments were done on a standard workstation with a GPU.
Conclusions
Our results show that learnMSA does not share the counterintuitive drawback of many popular heuristic aligners, which can substantially lose accuracy when many additional homologs are input. LearnMSA is a future-proof framework for large alignments with many opportunities for further improvements.
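The abstract states that the model is trained by differentiating the HMM log-likelihood rather than by Baum–Welch. The sketch below shows only the quantity involved: the log-likelihood of one sequence under a generic HMM, computed with the forward recursion in log space. It is not learnMSA's implementation; the function names, the plain-list parameter layout, and the use of a generic HMM instead of a profile HMM with match, insert, and delete states are all assumptions made for illustration. In a gradient-based setup, an automatic-differentiation framework would compute the gradient of this value with respect to the parameters.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, log_init, log_trans, log_emit):
    """Forward algorithm in log space.

    obs:       list of observed symbol indices
    log_init:  log_init[s]      = log P(first state = s)
    log_trans: log_trans[r][s]  = log P(next state = s | state = r)
    log_emit:  log_emit[s][x]   = log P(symbol = x | state = s)
    Returns log P(obs) summed over all state paths.
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(log_init)
    # alpha[s] = log P(symbols so far, current state = s)
    alpha = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            logsumexp([alpha[r] + log_trans[r][s] for r in range(n_states)])
            + log_emit[s][symbol]
            for s in range(n_states)
        ]
    return logsumexp(alpha)
```

Each recursion step touches every pair of states once, so the cost is linear in the sequence length; this per-sequence evaluation is what allows training on sampled batches of sequences.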
Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in terms of length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, whose classification is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches of utilizing primary sequences and secondary structures as well as the late integration of both using machine learning models, including different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. In comparison to the currently best-performing tool, ncRDense, we achieved a small increase of 0.5% across all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
Guidelines and Standard Frameworks for AI in Medicine: Protocol for a Systematic Literature Review
(2023)
Background: Applications of artificial intelligence (AI) are pervasive in modern biomedical science. In fact, research results suggesting algorithms and AI models for different target diseases and conditions are continuously increasing. While this situation undoubtedly improves the outcome of AI models, health care providers are increasingly unsure which AI model to use due to multiple alternatives for a specific target and the “black box” nature of AI. Moreover, the fact that studies rarely use guidelines in developing and reporting AI models poses additional challenges in trusting and adapting models for practical implementation.
Objective: This review protocol describes the planned steps and methods for a review of the synthesized evidence regarding the quality of available guidelines and frameworks to facilitate AI applications in medicine.
Methods: We will commence a systematic literature search using medical subject headings terms for medicine, guidelines, and machine learning (ML). All available guidelines, standard frameworks, best practices, checklists, and recommendations will be included, irrespective of the study design. The search will be conducted on web-based repositories such as PubMed, Web of Science, and the EQUATOR (Enhancing the Quality and Transparency of Health Research) network. After removing duplicate results, a preliminary scan for titles will be done by 2 reviewers. After the first scan, the reviewers will rescan the selected literature for abstract review, and any incongruities about whether to include the article for full-text review or not will be resolved by the third and fourth reviewer based on the predefined criteria. A Google Scholar (Google LLC) search will also be performed to identify gray literature. The quality of identified guidelines will be evaluated using the Appraisal of Guidelines, Research, and Evaluation (AGREE II) tool. A descriptive summary and narrative synthesis will be carried out, and the details of critical appraisal and subgroup synthesis findings will be presented.
Results: The results will be reported using the PRISMA (Preferred Reporting Items for Systematic Review and Meta-Analyses) reporting guidelines. Data analysis is currently underway, and we anticipate finalizing the review by November 2023.
Conclusions: Guidelines and recommended frameworks for developing, reporting, and implementing AI studies have been developed by different experts to facilitate the reliable assessment of validity and consistent interpretation of ML models for medical applications. We postulate that a guideline supports the assessment of an ML model only if the quality and reliability of the guideline are high. Assessing the quality and aspects of available guidelines, recommendations, checklists, and frameworks—as will be done in the proposed review—will provide comprehensive insights into current gaps and help to formulate future research directions.
International Registered Report Identifier (IRRID): DERR1-10.2196/47105
Protein engineering is essential for altering the substrate scope, catalytic activity and selectivity of enzymes for applications in biocatalysis. However, traditional approaches, such as directed evolution and rational design, face the challenge of experimentally screening a large protein mutation space. Machine learning methods allow the approximation of protein fitness landscapes and the identification of catalytic patterns using limited experimental data, thus providing a new avenue to guide protein engineering campaigns. In this concept article, we review machine learning models that have been developed to assess enzyme-substrate-catalysis performance relationships, aiming to improve enzymes through data-driven protein engineering. Furthermore, we outline prospects for the future development of this field to provide additional strategies and tools for achieving desired activities and selectivities.
Manual sleep scoring for research purposes and for the diagnosis of sleep disorders is labor-intensive and often varies significantly between scorers, which has motivated many attempts to design automatic sleep stage classifiers. With the recent introduction of large, publicly available hand-scored polysomnographic data, and concomitant advances in machine learning methods to solve complex classification problems with supervised learning, the problem has received new attention, and a number of new classifiers provide excellent accuracy. Most of these, however, have non-trivial barriers to use. We introduce the Greifswald Sleep Stage Classifier (GSSC), which is free, open source, and can be relatively easily installed and used on any moderately powered computer. In addition, the GSSC has been trained to perform well on a large variety of electrode set-ups, allowing high-performance sleep staging with portable systems. The GSSC can also be readily integrated into brain-computer interfaces for real-time inference. These innovations were achieved while simultaneously reaching a level of accuracy equal to, or exceeding, recent state-of-the-art classifiers and human experts, making the GSSC an excellent choice for researchers in need of reliable, automatic sleep staging.