Please use this link if you want to cite or link to this document: https://nbn-resolving.org/urn:nbn:de:gbv:9-opus-133783

Combining hidden Markov models and deep learning for biological sequences

  • This cumulative dissertation presents software that improves on state-of-the-art accuracy for two fundamental bioinformatics problems. The tools combine established mathematical frameworks for biological sequence analysis with modern deep learning techniques. The cornerstone of this work is a novel hidden Markov model layer, which implements parallelized algorithms for supervised and unsupervised training and inference. It can be combined with other layers and runs efficiently on a GPU. This thesis contains three research articles in which deep learning is combined with hidden Markov models, a synergy that is driven by pattern detection in large amounts of biological sequence data and by the incorporation of strong inductive biases. We pioneered learnMSA, a tool that constructs multiple sequence alignments for families of protein sequences. It is the first aligner for large numbers of sequences to integrate a deep learning model, surpassing the state-of-the-art accuracy of amino acid-based aligners. It offers runtime and accuracy advantages over long-established software when scaling up the number of input proteins and when aligning distantly related sequences. Article I describes the original release of learnMSA and introduces its unique methodology, while Article II features the addition of parameter-rich, pre-trained protein language models, which incorporate rich prior knowledge about proteins. Furthermore, we developed Tiberius, a deep learning model for ab initio gene prediction in eukaryotic species. The tool is fast and, compared to other ab initio predictors, unprecedentedly accurate, as shown in Article III. Tiberius, receiving only a genome as input, matches the accuracy of pipelines that additionally require complex extrinsic evidence.

Metadata
Author: Felix Becker
URN: urn:nbn:de:gbv:9-opus-133783
Title Additional (German): Verschmelzen von versteckten Markov Modellen und Deep Learning für biologische Sequenzen
Referee: Prof. Dr. Mario Stanke, Prof. Dr. Alexander Schliep, Dr. Johannes Söding
Advisor: Prof. Dr. Mario Stanke, Prof. Dr. Joscha Diehl
Document Type: Doctoral Thesis
Language: English
Year of Completion: 2025
Date of first Publication: 2025/06/23
Granting Institution: Universität Greifswald, Mathematisch-Naturwissenschaftliche Fakultät
Date of final exam: 2025/06/05
Release Date: 2025/06/23
Tag: gene prediction; multiple sequence alignment
GND Keyword: Deep Learning; Hidden-Markov-Modell; Bioinformatik
Page Number: 121
Faculties: Mathematisch-Naturwissenschaftliche Fakultät / Institut für Mathematik und Informatik
DDC class: 500 Natural sciences and mathematics / 510 Mathematics