Mathematical phylogenetics provides the theoretical framework for the reconstruction and analysis of phylogenetic trees and networks. The underlying theory is based on various mathematical disciplines, ranging from graph theory to probability theory.
In this thesis, we take a mostly combinatorial and graph-theoretical position and study different problems concerning phylogenetic trees and networks.
We start by considering phylogenetic diversity indices that rank species for conservation. Two such indices for rooted trees are the Fair Proportion index and the Equal Splits index, and we analyze how different they can be from each other and under which circumstances they coincide. Moreover, we define and investigate analogues of these indices for unrooted trees.
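To make the two rooted-tree indices concrete, here is a minimal Python sketch based on their standard definitions. It is not code from the thesis. The tree encoding (a dictionary that maps each internal node to `(child, edge_length)` pairs) and the function names `fair_proportion` and `equal_splits` are our own illustrative choices. Fair Proportion divides each edge length equally among all leaves below that edge. Equal Splits instead divides it equally among the children at each branching vertex on the way down.

```python
def fair_proportion(tree, root):
    """Fair Proportion index of every leaf of a rooted tree.

    tree maps each internal node to a list of (child, edge_length) pairs;
    leaves are nodes without an entry. FP(x) sums, over the edges e on the
    path from the root to x, the length of e divided by the number of leaves below e.
    """
    def n_leaves(v):
        kids = tree.get(v, [])
        return 1 if not kids else sum(n_leaves(c) for c, _ in kids)

    fp = {}

    def walk(v, acc):
        kids = tree.get(v, [])
        if not kids:
            fp[v] = acc
        for child, length in kids:
            walk(child, acc + length / n_leaves(child))

    walk(root, 0.0)
    return fp


def equal_splits(tree, root):
    """Equal Splits index: an edge length is shared equally among the children
    at every vertex below the edge, i.e. divided by the product of the
    out-degrees of the vertices between the edge and the leaf."""
    es = {}

    def walk(v, acc):
        kids = tree.get(v, [])
        if not kids:
            es[v] = acc
            return
        for child, length in kids:
            # everything accumulated so far is split equally among len(kids) children
            walk(child, acc / len(kids) + length)

    walk(root, 0.0)
    return es


# 4-leaf caterpillar (((a,b),c),d) with unit edge lengths
tree = {"r": [("u", 1.0), ("d", 1.0)],
        "u": [("v", 1.0), ("c", 1.0)],
        "v": [("a", 1.0), ("b", 1.0)]}
fp, es = fair_proportion(tree, "r"), equal_splits(tree, "r")
```

On this caterpillar, FP assigns 11/6 to each of a and b and 4/3 to c. ES assigns 7/4 and 3/2 instead. Both indices still sum to the total tree length of 6, because each edge length is distributed completely among the leaves below it.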
Subsequently, we study the Shapley value of unrooted trees, another popular phylogenetic diversity index. We show that it may fail as a prioritization criterion in biodiversity conservation and is outcompeted by an existing greedy approach. Afterwards, we leave the biodiversity setting and consider the Shapley value as a tree reconstruction tool. Here, we show that non-isomorphic trees may have permutation-equivalent Shapley transformation matrices and identical Shapley values, implying that the Shapley value cannot reliably be employed in tree reconstruction.
In addition to phylogenetic diversity indices, another class of indices frequently discussed in mathematical phylogenetics is the class of balance indices. In this thesis, we study one of the oldest and most popular of them, namely the Colless index for rooted binary trees. We focus on its extremal values and analyze both its maximum and minimum values as well as the trees that achieve them.
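For reference, the Colless index adds up, over all internal vertices of a rooted binary tree, the absolute difference between the numbers of leaves in the two subtrees pending from that vertex. The sketch below uses our own dictionary encoding and hypothetical helper names. It computes the index and checks the well-known maximum of (n-1)(n-2)/2, which is attained by the caterpillar tree.

```python
def colless(tree, root):
    """Colless index of a rooted binary tree.

    tree maps each internal node to its two children; leaves have no entry.
    At every internal node, add |#leaves in left subtree - #leaves in right subtree|.
    """
    total = 0

    def n_leaves(v):
        nonlocal total
        children = tree.get(v)
        if not children:
            return 1
        left, right = n_leaves(children[0]), n_leaves(children[1])
        total += abs(left - right)
        return left + right

    n_leaves(root)
    return total


def caterpillar(n):
    """Caterpillar on n >= 2 leaves: internal nodes i0 (root), ..., i{n-2}; leaves l0, ..., l{n-1}."""
    tree = {f"i{k}": [f"i{k + 1}", f"l{k}"] for k in range(n - 2)}
    tree[f"i{n - 2}"] = [f"l{n - 2}", f"l{n - 1}"]
    return tree


balanced = {"r": ["x", "y"], "x": ["a", "b"], "y": ["c", "d"]}  # fully balanced, 4 leaves
```

The fully balanced tree on 4 leaves has Colless index 0, and the caterpillar on 4 leaves has index 3.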
Having analyzed various questions regarding phylogenetic trees, we finally turn to phylogenetic networks. We focus on a certain class of phylogenetic networks, namely tree-based networks, and consider this class both in a rooted and in an unrooted setting.
First, we prove the existence of a rooted non-binary universal tree-based network with $n$ leaves for all positive integers $n$, that is, we show that there exists a rooted non-binary tree-based network with $n$ leaves that has every non-binary phylogenetic tree on the same leaf set as a base tree.
Finally, we study unrooted tree-based networks and introduce a class of networks that are necessarily tree-based, namely edge-based networks. We show that edge-based networks are closely related to a family of graphs in classical graph theory, so-called generalized series-parallel graphs, and explore this relationship in full detail.
In summary, we add new insights into existing concepts in mathematical phylogenetics, answer open questions in the literature, and introduce new concepts and approaches. In doing so, we make a small but relevant contribution to current research in mathematical phylogenetics.
The innate immune system relies on families of pattern recognition receptors (PRRs)
that detect distinct conserved molecular motifs from microbes to initiate antimicrobial responses.
Activation of PRRs triggers a series of signaling cascades, leading to the release of pro-inflammatory
cytokines, chemokines and antimicrobials, thereby contributing to the early host defense against
microbes and regulating adaptive immunity. Additionally, PRRs can detect perturbation of cellular
homeostasis caused by pathogens and fine-tune the immune responses. Among PRRs, nucleotide-binding
oligomerization domain (NOD)-like receptors (NLRs) have attracted particular interest in the
context of cellular stress-induced inflammation during infection. Recently, mechanistic insights into
the monitoring of cellular homeostasis perturbation by NLRs have been provided. We summarize
the current knowledge about the disruption of cellular homeostasis by pathogens and focus on NLRs
as innate immune sensors for its detection. We highlight the mechanisms employed by various
pathogens to elicit cytoskeleton disruption, organelle stress, and blockade of protein translation, point
out exemplary NLRs that guard cellular homeostasis during infection, and introduce the concept of
stress-associated molecular patterns (SAMPs). We postulate that integration of information about
microbial patterns, danger signals, and SAMPs provides the innate immune system with adequate
plasticity and precision in mounting responses to microbes of variable virulence.
Neutrophils in Tuberculosis: Cell Biology, Cellular Networking and Multitasking in Host Defense
(2021)
Neutrophils readily infiltrate infection foci, phagocytose and usually destroy microbes. In
tuberculosis (TB), a chronic pulmonary infection caused by Mycobacterium tuberculosis (Mtb),
neutrophils harbor bacilli, are abundant in tissue lesions, and their abundances in blood correlate
with poor disease outcomes in patients. The biology of these innate immune cells in TB is complex.
Neutrophils have been assigned host-beneficial as well as deleterious roles. The short lifespan of
neutrophils purified from blood poses challenges to cell biology studies, leaving intracellular
biological processes and the precise consequences of Mtb–neutrophil interactions ill-defined. The
phenotypic heterogeneity of neutrophils, and their propensity to engage in cellular cross-talk and
to exert various functions during homeostasis and disease, have recently been reported, and such
observations are newly emerging in TB. Here, we review the interactions of neutrophils with Mtb,
including subcellular events and cell fate upon infection, and summarize the cross-talks between
neutrophils and lung-residing and -recruited cells. We highlight the roles of neutrophils in TB
pathophysiology, discussing recent findings from distinct models of pulmonary TB, and emphasize
technical advances that could facilitate the discovery of novel neutrophil-related disease
mechanisms and enrich our knowledge of TB pathogenesis.
We introduce a multi-step machine learning approach and use it to classify data from EEG-based brain-computer interfaces. This approach works very well for high-dimensional EEG data. First, all features are divided into subgroups, and linear discriminant analysis (LDA) is used to obtain a score for each subgroup. Then LDA is applied to subgroups of the resulting scores. This procedure is iterated until only one score remains, and this score is used for classification. In this way, we avoid estimating the high-dimensional covariance matrix of all features. We investigate the classification performance with special attention to the small-sample-size case. For the normal model, we study the asymptotic error rate when the dimension p and the sample size n tend to infinity. This indicates how to define the sizes of the subgroups at each step. In addition, we present a theoretical error bound for the spatio-temporal normal model with separable covariance matrix, which results in a recommendation on how subgroups should be formed for this kind of data. Finally, some techniques, for example wavelets and independent component analysis, are used to extract features from certain kinds of EEG-based brain-computer interface data.
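The multi-step procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. It uses plain two-class Fisher LDA with a small ridge term, forms subgroups from consecutive features (hypothetical helper `groups_of`), and classifies by the sign of the final score. The data are synthetic Gaussian features, not EEG recordings.

```python
import numpy as np

def fit_lda(X, y, ridge=1e-6):
    """Two-class Fisher LDA on the columns of X; returns (w, b) such that the
    score X @ w - b is positive for class 1 and negative for class 0 on average."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    # pooled within-class covariance; the small ridge keeps it invertible
    S = centered.T @ centered / (len(X) - 2) + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(S, mu1 - mu0)
    return w, w @ (mu0 + mu1) / 2

def groups_of(p, size):
    """Consecutive index blocks of length `size` (the last one may be shorter)."""
    return [np.arange(i, min(i + size, p)) for i in range(0, p, size)]

def transform(Z, step):
    """Replace each feature subgroup by its LDA score."""
    return np.column_stack([Z[:, idx] @ w - b for idx, w, b in step])

def fit_multistep_lda(X, y, group_size):
    """Split the current features into subgroups, score each subgroup by LDA,
    and repeat on the scores until a single score remains."""
    assert group_size >= 2 and X.shape[1] >= 2
    steps, Z = [], X
    while Z.shape[1] > 1:
        step = [(idx, *fit_lda(Z[:, idx], y)) for idx in groups_of(Z.shape[1], group_size)]
        steps.append(step)
        Z = transform(Z, step)
    return steps

def predict_multistep_lda(steps, X):
    Z = X
    for step in steps:
        Z = transform(Z, step)
    return (Z[:, 0] > 0).astype(int)

# Synthetic example: 64 features, class 1 shifted by 0.5 in every feature
rng = np.random.default_rng(0)
p = 64

def sample(n):
    X = np.vstack([rng.normal(0.0, 1.0, (n, p)), rng.normal(0.5, 1.0, (n, p))])
    return X, np.repeat([0, 1], n)

X_tr, y_tr = sample(100)
X_te, y_te = sample(500)
steps = fit_multistep_lda(X_tr, y_tr, group_size=8)
acc = (predict_multistep_lda(steps, X_te) == y_te).mean()
```

Each LDA step only needs the covariance of `group_size` features. With 64 features and subgroups of size 8, the sketch estimates eight 8 × 8 covariance matrices and then one more, rather than a single 64 × 64 matrix.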
Spatial variation in survival has individual fitness consequences and influences population dynamics. It proximately and ultimately impacts space use, including migratory connectivity. Therefore, knowing spatial patterns of survival is crucial to understanding the demography of migrating animals. Extracting information on survival and space use from observation data, in particular dead recovery data, requires explicitly modeling the observation process. The main aim of this work is to establish a modeling framework that allows estimating spatial variation in survival, migratory connectivity, and observation probability using dead recovery data. We provide some biological background on survival and migration and a short methodological overview of how similar situations are modeled in the literature.
Afterwards, we provide REML-like estimators for discrete space and show identifiability of all three parameters using the characteristics of the multinomial distribution. Moreover, we formulate a model in continuous space using mixed binomial point processes. The continuous model assumes a constant recovery probability over space. To drop this strict assumption, we develop an optimization procedure that combines the discrete and continuous space models using penalized M-splines. In simulation studies, we demonstrate the performance of the estimators for all three model approaches. Furthermore, we apply the models to real-world data sets of European robins \textit{Erithacus rubecula} and ospreys \textit{Pandion haliaetus}.
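For the discrete-space case, the following sketch shows one simple way in which the three quantities can enter a multinomial likelihood. The parameterization is an assumption made for illustration and is not necessarily the model used in this work. Under it, a bird marked in breeding area i moves to non-breeding area j with probability psi_ij (migratory connectivity), dies there with probability 1 - s_j, and is then found and reported with probability r_j. All remaining birds fall into a "never recovered" cell. The function names are hypothetical.

```python
import math

def recovery_cell_probs(psi_i, s, r):
    """Multinomial cell probabilities for birds marked in one breeding area i
    (assumed parameterization, for illustration only).

    psi_i[j]: probability of moving to non-breeding area j (migratory connectivity)
    s[j]:     probability of surviving the period in area j
    r[j]:     probability that a bird dying in area j is found and reported
    Returns [P(recovered dead in area 1), ..., P(recovered dead in area J), P(never recovered)].
    """
    found = [p * (1 - sj) * rj for p, sj, rj in zip(psi_i, s, r)]
    return found + [1 - sum(found)]

def log_likelihood(counts, psi_i, s, r):
    """Multinomial log-likelihood of recovery counts, up to the multinomial coefficient."""
    probs = recovery_cell_probs(psi_i, s, r)
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

probs = recovery_cell_probs([0.7, 0.3], s=[0.5, 0.8], r=[0.2, 0.1])
ll = log_likelihood([7, 1, 92], [0.7, 0.3], [0.5, 0.8], [0.2, 0.1])
```

Here the cells are approximately 0.07, 0.006, and 0.924. In this simplified form, connectivity, survival, and recovery appear only through the products psi_ij (1 - s_j) r_j for each area.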
We discuss how this study can be embedded in the framework of animal movement and the capture-mark-recapture/recovery methodology. It can be seen as a contribution to, and an extension of, distance sampling, local stationary everyday movement, and dispersal. We emphasize the importance of having a clearly formulated mathematical modeling framework for applied methods. Moreover, we comment on model assumptions and their limits. In the future, it would be appealing to extend this framework to the full annual cycle and to carry-over effects.
Gram-negative bacteria release lipopolysaccharides (LPS), which trigger a host immune
response involving the secretion of proinflammatory cytokines. Among these cytokines are
TNF-α and IFN-γ, which induce the production of indoleamine 2,3-dioxygenase (IDO). IDO production is increased during severe sepsis and septic shock, and high IDO
levels are associated with increased mortality. This enzyme catalyzes the degradation of tryptophan (TRP) to kynurenine (KYN) along the kynurenine pathway (KP).
KYN is further degraded to kynurenic acid (KYNA). Increased IDO levels are accompanied
by increased levels of KYNA, which is associated with immunoparalysis.
Due to its central role, the KP is a potential target of therapeutic intervention.
The degradation of TRP to KYN by IDO was targeted with 1-methyltryptophan (1-MT),
which is assumed to inhibit IDO. Administration of 1-MT increased the survival of
septic mice compared with untreated mice. The levels of downstream metabolites such as KYN and KYNA were
therefore expected to decrease. Surprisingly, in healthy mice and pigs, an increase in KYNA
after 1-MT administration was reported. These unexpected metabolite alterations after 1-MT administration, and the underlying mode of action, have not been the focus of recent
research. Hence, there is no explanation yet for the increase in KYNA while KYN did not change.
This thesis aims to postulate a possible degradation pathway of 1-MT along the KP
with the help of ordinary differential equation (ODE) systems.
Moreover, the developed ODE models were used to determine whether 1-MT is able to
inhibit IDO in vivo. To this end, several ODE models were developed, including
a model of the KP, an extension by LPS administration, and an extension by 1-MT
administration.
In addition, seven ODE models were developed, each representing a possible degradation pathway of 1-MT. The most likely degradation pathway was combined with the ODE model
of LPS administration, including the inhibitory effects of 1-MT.
These models consist of coupled equations describing the dynamics of the KP.
For each component of the KP, one equation describes its change over time. Equations for TRP, KYN, KYNA, and quinolinic acid (QUIN) were developed.
In addition, the alterations of serotonin (SER) were included. All of these metabolites belong
to TRP metabolism: TRP is degraded to SER and to KYN,
which is further degraded to KYNA and QUIN. Every degradation step is catalyzed by an enzyme; therefore, Michaelis-Menten (MM) equations were used, employing the Michaelis
constant Km and the maximal degradation velocity Vmax. To reduce the complexity of
parameter estimation, the Km values of the different enzymes were fixed to literature values.
The remaining parameters of the equations were estimated so that the trajectories of
the simulated metabolite levels fit the data. To propose a degradation pathway of 1-MT leading to increased
KYNA levels, seven models were developed and compared. The most likely model was
extended to test whether the inhibitory effects of 1-MT on IDO can be identified.
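As a rough illustration of this kind of model, the following Python sketch integrates Michaelis-Menten equations for TRP, SER, KYN, KYNA, and QUIN with a hand-written fourth-order Runge-Kutta scheme. All Vmax and Km values are hypothetical placeholders, not the literature or fitted values used here. The sketch also deliberately omits TRP intake, elimination, LPS, and 1-MT. As a result, the total amount of metabolite stays constant, which serves as a simple sanity check.

```python
def mm(vmax, km, s):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)

# Hypothetical (Vmax, Km) pairs, NOT values from the thesis.
PARAMS = {
    "trp_to_ser": (0.5, 5.0),
    "trp_to_kyn": (2.0, 20.0),   # catalyzed by IDO
    "kyn_to_kyna": (0.3, 10.0),  # catalyzed by KAT
    "kyn_to_quin": (1.0, 15.0),
}

def rhs(y, p=PARAMS):
    """Right-hand side for the state [TRP, SER, KYN, KYNA, QUIN]."""
    trp, _, kyn, _, _ = y
    v_ser = mm(*p["trp_to_ser"], trp)
    v_kyn = mm(*p["trp_to_kyn"], trp)
    v_kyna = mm(*p["kyn_to_kyna"], kyn)
    v_quin = mm(*p["kyn_to_quin"], kyn)
    return [-v_ser - v_kyn, v_ser, v_kyn - v_kyna - v_quin, v_kyna, v_quin]

def rk4(f, y0, t_end, dt):
    """Classical fourth-order Runge-Kutta; returns the state after every step."""
    y, traj = list(y0), [list(y0)]
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f([a + dt / 2 * k for a, k in zip(y, k1)])
        k3 = f([a + dt / 2 * k for a, k in zip(y, k2)])
        k4 = f([a + dt * k for a, k in zip(y, k3)])
        y = [a + dt / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        traj.append(list(y))
    return traj

traj = rk4(rhs, [50.0, 0.0, 0.0, 0.0, 0.0], t_end=24.0, dt=0.01)
```

The fitting step described in the text would then adjust the free Vmax values so that such trajectories match the measured metabolite levels, with Km kept fixed.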
Three approaches were used to determine the ODE model parameters for the different hypotheses of 1-MT degradation. In the first approach, ODE model parameters were fixed
to values fitted to an independent data set. In the second approach, parameters were
fitted to a subset of the data set that was used for simulating the different hypotheses. In the third approach, all ODE model parameters were fitted 100 times without
fixing any of them. The parameter set yielding TRP metabolite trajectories
with the smallest distance to the data was assumed to be the most likely. The
ODE model parameters were fitted to data measured in pigs. Two different
experimental models provided the data used in this thesis. The first experimental model
activates IDO by LPS administration in pigs. The second one combines IDO
activation by LPS with the administration of 1-MT in pigs.
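The selection rule of the third approach can be illustrated with the sketch below: repeat the fit many times and keep the parameter set whose trajectories lie closest to the data. It is a deliberately crude stand-in. Each of the 100 "fits" is only a random log-uniform draw of the two Vmax values of a reduced TRP → KYN → KYNA model, whereas a real analysis would refine every start with a local optimizer. The Km values, the parameter ranges, the squared-error distance, and the synthetic "measurements" are all assumptions.

```python
import random

KM_IDO, KM_KAT = 20.0, 10.0  # hypothetical fixed "literature" Km values

def simulate(vmax_ido, vmax_kat, trp0=50.0, t_end=10, dt=0.1):
    """RK4 integration of TRP -> KYN -> KYNA with Michaelis-Menten rates;
    returns the state [TRP, KYN, KYNA] at t = 0, 1, ..., t_end."""
    def f(y):
        v1 = vmax_ido * y[0] / (KM_IDO + y[0])
        v2 = vmax_kat * y[1] / (KM_KAT + y[1])
        return [-v1, v1 - v2, v2]

    y, out = [trp0, 0.0, 0.0], []
    n, stride = round(t_end / dt), round(1 / dt)
    for step in range(n + 1):
        if step % stride == 0:
            out.append(list(y))
        if step == n:
            break
        k1 = f(y)
        k2 = f([a + dt / 2 * k for a, k in zip(y, k1)])
        k3 = f([a + dt / 2 * k for a, k in zip(y, k2)])
        k4 = f([a + dt * k for a, k in zip(y, k3)])
        y = [a + dt / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
    return out

def distance(traj, data):
    """Sum of squared differences between simulated and measured metabolite levels."""
    return sum((a - b) ** 2 for row_s, row_d in zip(traj, data) for a, b in zip(row_s, row_d))

def multistart_fit(data, n_starts=100, seed=1):
    """Draw n_starts parameter sets log-uniformly (only Km is fixed), score each
    one, and return (best_distance, best_params, all_distances)."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_starts):
        params = (10 ** rng.uniform(-2, 1), 10 ** rng.uniform(-2, 1))
        scored.append((distance(simulate(*params), data), params))
    best_d, best_p = min(scored)
    return best_d, best_p, [d for d, _ in scored]

data = simulate(2.0, 0.5)  # synthetic, noise-free "measurements"
best_d, best_p, all_d = multistart_fit(data)
```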
According to approach 1, the most likely hypothesis was the degradation of 1-MT to
KYNA and TRP; for the second data set, the most likely one was the direct degradation of 1-MT to KYNA. With approach 2, the most likely degradation pathways were
the combination of all degradation pathways and the degradation of 1-MT to TRP and of
TRP to KYNA. With approach 3, the most likely route of KYNA increase was
the direct degradation of 1-MT to KYNA. In summary, across the three approaches,
hypothesis 2, the direct degradation of 1-MT to KYNA, was identified most frequently. A cell-free
assay validated this result. This experiment combined 1-MT or TRP with or without
the enzyme kynurenine aminotransferase (KAT), which had previously been shown to degrade
TRP directly to KYNA. The levels of TRP, KYN, and KYNA were measured. The
highest KYNA levels were obtained in the assay combining KAT with 1-MT, consistent with
hypothesis 2. The models describing the inhibitory effects of 1-MT revealed that
the model without inhibitory effects of 1-MT on IDO was more likely for all three approaches.
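The qualitative pattern behind hypothesis 2 can be reproduced in a toy model. If 1-MT is converted directly to KYNA, KYNA rises while the KYN trajectory is unaffected. The sketch below adds such a direct step to a reduced TRP → KYN → KYNA model. It can optionally include competitive inhibition of IDO by 1-MT, modeled through an effective Km of Km(1 + [1-MT]/Ki). The reaction structure beyond the text, the inhibition form, and every parameter value are hypothetical.

```python
def simulate_kp(mt0, ki=None, t_end=24.0, dt=0.05,
                vmax_ido=2.0, km_ido=20.0, vmax_kat=0.3, km_kat=10.0,
                vmax_mt=0.2, km_mt=50.0):
    """Toy model with state [TRP, KYN, KYNA, MT]; all parameter values are hypothetical.

    MT (1-MT) is converted directly to KYNA (hypothesis-2-style step). If ki is
    given, MT also inhibits IDO competitively, raising the effective Km of IDO.
    Returns the state after every RK4 step.
    """
    def f(y):
        trp, kyn, _, mt = y
        km_eff = km_ido * (1 + mt / ki) if ki is not None else km_ido
        v_ido = vmax_ido * trp / (km_eff + trp)  # TRP -> KYN
        v_kat = vmax_kat * kyn / (km_kat + kyn)  # KYN -> KYNA
        v_mt = vmax_mt * mt / (km_mt + mt)       # 1-MT -> KYNA (hypothetical direct step)
        return [-v_ido, v_ido - v_kat, v_kat + v_mt, -v_mt]

    y, traj = [50.0, 0.0, 0.0, mt0], []
    for _ in range(round(t_end / dt)):
        k1 = f(y)
        k2 = f([a + dt / 2 * k for a, k in zip(y, k1)])
        k3 = f([a + dt / 2 * k for a, k in zip(y, k2)])
        k4 = f([a + dt * k for a, k in zip(y, k3)])
        y = [a + dt / 6 * (b + 2 * c + 2 * d + e) for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        traj.append(list(y))
    return traj

no_mt = simulate_kp(mt0=0.0)
with_mt = simulate_kp(mt0=100.0)                 # direct 1-MT -> KYNA, no inhibition
inhibited = simulate_kp(mt0=100.0, ki=10.0)      # plus competitive inhibition of IDO
```

In the toy model, the direct step raises KYNA without changing KYN, which mirrors the reported observation. Adding inhibition visibly slows early KYN formation. Comparing model variants of this kind against data is how the inhibition question above was framed.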
The correctness of hypothesis 2 has to be confirmed by further in vitro experiments. It
also has to be investigated which reactions promote the degradation of 1-MT to KYNA.
The lack of inhibitory effects of 1-MT on IDO found with the in silico ODE
models is consistent with previous research, which showed that the 1-MT concentrations
reached, e.g. in pigs, were too low to inhibit IDO efficiently.
In this study, the first possible degradation pathway of 1-MT along the KP is proposed.
The reliability of the results depends on the quality of the experimental data and on the
season in which the data were measured. Moreover, the results vary between the different
parameter-fitting approaches. Several such approaches should therefore be
included in the analysis to obtain more evidence for the correctness of the results.
Background
The alignment of large numbers of protein sequences is a challenging task and its importance grows rapidly along with the size of biological datasets. State-of-the-art algorithms have a tendency to produce less accurate alignments with an increasing number of sequences. This is a fundamental problem since many downstream tasks rely on accurate alignments.
Results
We present learnMSA, a novel statistical learning approach for profile hidden Markov models (pHMMs) based on batch gradient descent. Fundamentally different from popular aligners, we fit a custom recurrent neural network architecture for (p)HMMs to potentially millions of sequences with respect to a maximum a posteriori objective and decode an alignment. We rely on automatic differentiation of the log-likelihood; thus, our approach differs from existing HMM training algorithms such as Baum–Welch. Our method does not involve progressive, regressive, or divide-and-conquer heuristics. We use uniform batch sampling to adapt to large datasets in linear time without requiring a guide tree. When tested on ultra-large protein families with up to 3.5 million sequences, learnMSA is both more accurate and faster than state-of-the-art tools. On the established benchmarks HomFam and BaliFam with smaller sequence sets, it matches state-of-the-art performance. All experiments were done on a standard workstation with a GPU.
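The log-likelihood that learnMSA differentiates is an HMM likelihood, and such likelihoods are typically computed with the forward algorithm. The sketch below shows only that forward pass, in log space with NumPy, for a plain discrete HMM with an arbitrary transition matrix. It is not learnMSA's code. It omits the profile-HMM structure (match, insert, and delete states), the priors of the MAP objective, and the gradient step. A brute-force sum over all state paths is included as a correctness check.

```python
import itertools
import numpy as np

def log_forward(log_pi, log_A, log_B, obs):
    """log P(obs) for a discrete HMM via the forward algorithm in log space.

    log_pi: (K,) initial log-probabilities
    log_A:  (K, K) transition log-probabilities, A[i, j] = P(next = j | current = i)
    log_B:  (K, M) emission log-probabilities, B[k, o] = P(symbol o | state k)
    """
    log_alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        m = log_alpha.max()  # shift for numerical stability (log-sum-exp trick)
        log_alpha = m + np.log(np.exp(log_alpha - m) @ np.exp(log_A)) + log_B[:, o]
    m = log_alpha.max()
    return float(m + np.log(np.exp(log_alpha - m).sum()))

def brute_force_likelihood(pi, A, B, obs):
    """P(obs) by summing over all K**T state paths (feasible only for tiny examples)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.9, 0.1], [0.3, 0.7]])
obs = [0, 1, 1, 0]
ll = log_forward(np.log(pi), np.log(A), np.log(B), obs)
ref = float(np.log(brute_force_likelihood(pi, A, B, obs)))
```

In a gradient-based setup, the same recursion would be written in an autodiff framework, so that derivatives of `ll` with respect to the (reparameterized) transition and emission probabilities come for free.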
Conclusions
Our results show that learnMSA does not share the counterintuitive drawback of many popular heuristic aligners, which can substantially lose accuracy when many additional homologs are input. LearnMSA is a future-proof framework for large alignments with many opportunities for further improvements.
The history of Mathematics has been led in part by the desire for generalization: once an object had been given and understood, there was the desire to find a more general version of it and to fit it into a broader framework. Noncommutative Mathematics fits this description, as its interests are objects analogous to vector spaces, probability spaces, etc., but without the commonsense interpretation that those latter objects possess. Indeed, a space can be described by its points, but also, equivalently, by the set of functions on this space. This set is a commutative algebra, sometimes equipped with additional structure: *-algebra, C*-algebra, von Neumann algebra, Hopf algebra, etc. The idea at the basis of noncommutative Mathematics is to replace such algebras by algebras that are no longer necessarily commutative and to interpret them as "algebras of functions on noncommutative spaces". Of course, these spaces do not exist independently of their defining algebras, but many results holding in (classical) probability or (classical) group theory can be extended to their noncommutative counterparts, or find powerful analogues there. The extension of group theory into the realm of noncommutative Mathematics has long been studied and has yielded the various quantum groups. The simplest version of them, the compact quantum groups, consist of C*-algebras equipped with a *-homomorphism Δ with values in the tensor product of the algebra with itself, satisfying a coassociativity condition. It is also required that the compact quantum group satisfies what is known as the quantum cancellation property. It can be shown that (classical) compact groups are indeed a particular case of compact quantum groups. The area of compact quantum groups, and of quantum groups at large, is a fruitful area of research.
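In formulas, and following the standard definition due to Woronowicz (stated here as a reference point, not reproduced from this thesis), a compact quantum group is a unital C*-algebra $A$ with a unital *-homomorphism $\Delta\colon A \to A \otimes A$ such that

$$
(\Delta \otimes \mathrm{id}) \circ \Delta = (\mathrm{id} \otimes \Delta) \circ \Delta,
\qquad
\overline{\operatorname{span}}\,\Delta(A)(1 \otimes A) = A \otimes A = \overline{\operatorname{span}}\,\Delta(A)(A \otimes 1),
$$

where the second condition is the cancellation property. For a classical compact group $G$, take $A = C(G)$ and $\Delta(f)(s,t) = f(st)$, using $C(G) \otimes C(G) \cong C(G \times G)$. Coassociativity then reduces to the associativity $(st)u = s(tu)$ of the group law.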
Nevertheless, another generalization of group theory could be envisioned, namely by taking a comultiplication Δ with values not in the tensor product but in the free product (in the category of unital *-algebras). This leads to the theory of dual groups in the sense of Voiculescu, also called H-algebras by Zhang. These objects have not been studied as thoroughly as their quantum counterparts. It is true that they are less flexible, so that few examples of them are known, and some relations cannot exist in the dual group case because they are not compatible with the coproduct. Nevertheless, these objects have been the focus of a large part of my PhD work, and I have made some progress towards understanding them, especially regarding quantum Lévy processes defined on them and Haar states.
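For comparison, here is a sketch of the dual group axioms in the sense of Voiculescu, in the form usually found in the literature and given only as a reference point. A dual group is a unital *-algebra $B$ together with unital *-homomorphisms $\Delta\colon B \to B \sqcup B$, $\delta\colon B \to \mathbb{C}$, and $\Sigma\colon B \to B$ such that

$$
(\Delta \sqcup \mathrm{id}) \circ \Delta = (\mathrm{id} \sqcup \Delta) \circ \Delta,
\qquad
(\delta \sqcup \mathrm{id}) \circ \Delta = \mathrm{id} = (\mathrm{id} \sqcup \delta) \circ \Delta,
\qquad
[\Sigma, \mathrm{id}] \circ \Delta = \delta(\cdot)\,1 = [\mathrm{id}, \Sigma] \circ \Delta,
$$

where $\sqcup$ denotes the free product of unital *-algebras, $\mathbb{C} \sqcup B$ and $B \sqcup \mathbb{C}$ are identified with $B$, and $[f, g]\colon B \sqcup B \to B$ is the homomorphism acting as $f$ on the first copy and as $g$ on the second. Compared with the compact quantum group case, the only structural change is that $\otimes$ is replaced by $\sqcup$.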
Interactive Visualization for the Exploration of Aligned Biological Networks and Their Evolution
(2011)
Network visualization is a widely used tool in biology. Biological networks, such as protein interaction networks, are important for many aspects of life. Today, biologists compare the networks of different species (network alignment) to understand the networks in more detail and to understand their underlying evolution. The goal of this work is to develop visualization software that is able to visualize network alignments as well as their evolution. The presented software is the first software for such visualization tasks. It uses 3D graphics as well as animations for the dynamic visualization of evolution. This work consists of a review of the Related Work, a chapter on our Graph-based Approach for Interactive Visualization of Evolving Network Alignments, an explanation of the Graph Layout Algorithm, and some notes on the Software System.
High-throughput expression data have become the norm in molecular biology research. However, the analysis of expression data is statistically and computationally challenging and has not kept up with their generation. This has resulted in large amounts of unexplored data in public repositories. After pre-processing and quality control, the typical gene expression analysis workflow follows two main steps. First, the complexity of the data is reduced by using a feature selection method to remove the genes that are redundant or irrelevant for the biological question that motivated the experiment. Second, relevant genes are investigated to extract biological information that could aid in the interpretation of the results. Different methods, such as functional annotation, clustering, network analysis, and/or combinations thereof, are useful for the latter purpose. Here, I investigated and presented solutions to three problems encountered in the expression data analysis workflow. First, I worked on reducing the complexity of high-throughput expression data by selecting relevant genes in the context of the sample classification problem. The sample classification problem aims to assign unknown samples to one of the known classes, such as healthy and diseased. For this purpose, I developed the relative signal-to-noise ratio (rSNR), a novel feature selection method that was shown to perform significantly better than other methods with similar objectives. Second, to better understand complex phenotypes using high-throughput expression data, I developed a pipeline to identify the underlying biological units, as well as their interactions. These biological units were assumed to be represented by groups of genes working in synchronization to perform a given function or participating in common biological processes or pathways.
Thus, to identify biological units, those genes that had been identified as relevant to the phenotype under consideration through feature selection methods were clustered based on both their functional annotations and expression profiles. Relationships between the associated biological functions, processes, and/or pathways were investigated by means of a co-expression network. The developed pipeline provides a new perspective to the analysis of high-throughput expression data by investigating interactions between biological units. Finally, I contributed to a project where a network describing pluripotency in mouse was used to infer the corresponding network in human. Biological networks are context-specific. Combining network information with high-throughput expression data can explain the control mechanisms underlying changes and maintenance of complex phenotypes. The human network was constructed on the basis of orthology between mouse and human genes and proteins. It was validated with available data in the literature. The methods and strategies proposed here were mainly trained and tested on microarray expression data. However, they can be easily adapted to next-generation sequencing and proteomics data.
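The two-step workflow described here (feature selection, then grouping of the selected genes) can be sketched as follows. The rSNR method itself is not defined in this abstract. The sketch therefore uses the classical signal-to-noise ratio (difference of class means divided by the sum of class standard deviations) purely as a stand-in for gene ranking. It then builds a simple co-expression network that links selected genes whose absolute Pearson correlation reaches a threshold. The threshold, the number of selected genes, the function names, and the simulated data are assumptions.

```python
import numpy as np

def snr_scores(X, y):
    """Classical signal-to-noise ratio per gene (rows = samples, columns = genes):
    (mean in class 1 - mean in class 0) / (sd in class 1 + sd in class 0).
    A stand-in for gene ranking; this is NOT the rSNR method from the text."""
    X0, X1 = X[y == 0], X[y == 1]
    return (X1.mean(axis=0) - X0.mean(axis=0)) / (X1.std(axis=0, ddof=1) + X0.std(axis=0, ddof=1))

def coexpression_edges(X, genes, threshold=0.8):
    """Undirected edges between selected genes with |Pearson r| >= threshold."""
    r = np.corrcoef(X[:, genes], rowvar=False)
    return [(int(genes[i]), int(genes[j]))
            for i in range(len(genes)) for j in range(i + 1, len(genes))
            if abs(r[i, j]) >= threshold]

# Synthetic data: 20 samples per class, 50 genes
rng = np.random.default_rng(42)
n, p = 20, 50
X = rng.normal(0.0, 1.0, (2 * n, p))
y = np.repeat([0, 1], n)
X[y == 1, 0] += 3.0                               # gene 0 is up-regulated in class 1
X[:, 1] = X[:, 0] + rng.normal(0.0, 0.1, 2 * n)   # gene 1 is co-expressed with gene 0

scores = snr_scores(X, y)
top = np.argsort(-np.abs(scores))[:5]             # keep the 5 highest-ranked genes
edges = coexpression_edges(X, top)
```

Genes 0 and 1 rank highest and are joined by an edge, so together they form one candidate "biological unit". In the pipeline described above, functional annotations would additionally inform the grouping.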