Convolutional Neural Network-based image classification models are the current state of the art for solving image classification problems. However, obtaining and using such a model to solve a specific image classification problem presents several challenges in practice. To train the model, we need to find good hyperparameter values, such as the initial model weights or the learning rate. Finding these values is usually a non-trivial process. Another problem is that the training data used for model training is often class-imbalanced in practice, which usually has a negative impact on model training. However, the challenges are not limited to obtaining a Convolutional Neural Network-based model; using the model after training is challenging as well. After training, the model might be applied to images drawn from a data distribution different from the one the training data was drawn from. These images are typically referred to as out-of-distribution samples. Unfortunately, Convolutional Neural Network-based image classification models typically fail to predict the correct class for out-of-distribution samples without warning, which is problematic when such a model is used in safety-critical applications. In my work, I examined whether information from the layers of a Convolutional Neural Network-based image classification model (pixels and activations) can be used to address all of these issues. As a result, I suggest a method for initializing the model weights based on image patches, a method for balancing a class-imbalanced dataset based on layer activations, and a method for detecting out-of-distribution samples, which is also based on layer activations. To test the proposed methods, I conducted extensive experiments using different datasets.
My experiments showed that layer information (pixels and activations) can indeed be used to address all of the aforementioned challenges when training and using Convolutional Neural Network-based image classification models.
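As a rough illustration of the setting of the third method (not the thesis' exact procedure), out-of-distribution samples can be scored by how far their layer activations fall from class-conditional Gaussians fitted on in-distribution activations. Everything below, including the synthetic "activations", is a hypothetical sketch:

```python
import numpy as np

def fit_class_gaussians(acts, labels):
    """Fit per-class means and a shared covariance on activation vectors."""
    classes = np.unique(labels)
    means = {c: acts[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([acts[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(acts.shape[1])
    return means, np.linalg.inv(cov)

def ood_score(x, means, cov_inv):
    """Minimum squared Mahalanobis distance to any class centroid."""
    return min(float((x - m) @ cov_inv @ (x - m)) for m in means.values())

# Synthetic stand-in for penultimate-layer activations of two classes.
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(5, 1, (200, 8))])
labels = np.array([0] * 200 + [1] * 200)
means, cov_inv = fit_class_gaussians(train, labels)

in_dist = rng.normal(0, 1, 8)    # resembles class 0
far_out = rng.normal(20, 1, 8)   # resembles neither class
print(ood_score(in_dist, means, cov_inv) < ood_score(far_out, means, cov_inv))
```

A sample would then be flagged as out-of-distribution when its score exceeds a threshold calibrated on held-out in-distribution data.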
Statistical Methods and Applications for Biomarker Discovery Using Large Scale Omics Data Set
(2023)
This thesis focuses on identifying genetic factors associated with human kidney disease progression, with three articles presented. Article I describes the identification of loci associated with UACR through trans-ethnic, European-ancestry-specific, and diabetes-specific meta-analyses. An approximate conditional analysis was performed to identify additional independent UACR-associated variants within identified loci. The genome-wide significance level of α = 5×10⁻⁸ is used for both primary GWAS association and conditional analyses. However, unlike primary association tests, conditional tests are limited to specific genomic regions surrounding primary GWAS index signals rather than being applied on a genome-wide scale.
In article II, we hypothesized that the application of α = 5×10⁻⁸ is overly strict and results in a loss of power. To address this issue, we developed a quasi-adaptive method within a weighted hypothesis testing framework. This method allocates the overall type I error (α = 0.05) across SNPs, providing less conservative SNP-specific α-thresholds for selecting secondary signals in conditional analysis. Through simulation studies and power analyses, we demonstrate that the quasi-adaptive method outperforms the established criterion α = 5×10⁻⁸ as well as the equal weighting scheme (the Šidák correction). Furthermore, our method performs well when applied to real datasets and can potentially reveal previously undetected secondary signals in existing data.
In article III, we extended our quasi-adaptive method to identify plausible multiple independent signals at each locus (a secondary signal, a tertiary signal, a fourth signal, and beyond) and applied it to a publicly available GWAS meta-analysis to detect additional multiple independent eGFR-associated signals. The improved quasi-adaptive method successfully identified additional novel replicated independent SNPs that would have gone undetected under the overly conservative genome-wide significance level of α = 5×10⁻⁸. Colocalization analysis based on the novel independent signals identified potentially functional genes across the kidney and other tissues.
Overall, these articles contribute to the understanding of genetic factors associated with human kidney disease progression and provide novel methods for identifying secondary and multiple independent signals in conditional GWAS analyses.
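The weighted testing idea of article II can be illustrated with a simple weighted Bonferroni scheme (not the exact quasi-adaptive procedure of the thesis): SNP-specific thresholds α_i = 0.05·w_i/Σw keep the family-wise error at 0.05 while being far less conservative than a flat 5×10⁻⁸ cutoff for up-weighted SNPs. The weights and p-values below are invented:

```python
import numpy as np

def snp_specific_thresholds(weights, alpha=0.05):
    """Split the family-wise type I error alpha across SNPs by weight."""
    w = np.asarray(weights, dtype=float)
    return alpha * w / w.sum()   # thresholds sum to alpha (union bound)

pvals   = np.array([1e-7, 3e-4, 0.02, 0.5])
weights = np.array([1000.0, 10.0, 1.0, 1.0])   # hypothetical prior weights
alphas  = snp_specific_thresholds(weights)

rejected_weighted = pvals <= alphas
rejected_flat     = pvals <= 5e-8   # flat genome-wide criterion
print(rejected_weighted)
print(rejected_flat)
```

Here the two up-weighted SNPs are detected by the weighted scheme, while the flat genome-wide criterion rejects none of them.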
Gram-negative bacteria secrete lipopolysaccharides (LPS), triggering a host immune response in which proinflammatory cytokines are secreted. Among these cytokines are TNF-α and IFN-γ, which induce the production of indoleamine 2,3-dioxygenase (IDO). IDO production is increased during severe sepsis and septic shock, and high IDO levels are associated with increased mortality. This enzyme catalyzes the degradation of tryptophan (TRP) to kynurenine (KYN) along the kynurenine pathway (KP). KYN is further degraded to kynurenic acid (KYNA). Increased IDO levels are accompanied by increased levels of KYNA, which is associated with immunoparalysis.
Due to its central role, the KP is a potential target of therapeutic intervention.
The degradation of TRP to KYN by IDO was targeted with 1-methyltryptophan (1-MT), which is assumed to inhibit IDO. Administering 1-MT increased the survival of mice suffering from sepsis compared to untreated mice. The levels of downstream metabolites such as KYN and KYNA were expected to decrease. Surprisingly, in healthy mice and pigs, an increase in KYNA after 1-MT administration was reported. These unexpected metabolite alterations after 1-MT administration, and the underlying mode of action, have not been the focus of recent research. Hence, there is no explanation for the increase in KYNA while KYN remains unchanged.
This thesis aims to postulate a possible degradation pathway of 1-MT along the KP
with the help of ordinary differential equation (ODE) systems.
Moreover, the developed ODE models were used to determine the ability of 1-MT to inhibit IDO in vivo. To this end, several ODE models were developed, including a model of the KP, an extension covering lipopolysaccharide (LPS) administration, and one covering 1-MT administration. In addition, seven ODE models were developed, each considering a possible degradation pathway of 1-MT. The most likely degradation pathway was combined with the ODE model of LPS administration, including the inhibitory effects of 1-MT.
These models consist of several coupled equations describing the dynamics of the KP. For each component of the KP, one equation describes its alterations over time. Equations for TRP, KYN, KYNA, and quinolinic acid (QUIN) were developed, and the alterations of serotonin (SER) were included as well. Together, these components belong to the TRP metabolism: TRP is degraded to SER and to KYN, and KYN is further degraded to KYNA and QUIN. Every degradation step is catalyzed by an enzyme. Therefore, Michaelis-Menten (MM) kinetics were used, employing the Michaelis constant Km and the maximal degradation velocity Vmax. To reduce the complexity of parameter estimation, the Km values of the different enzymes were fixed to literature values. The remaining parameters were determined so that the trajectories of the calculated metabolite levels correspond to the data. The parameters of the different models were determined in this way. To propose a degradation pathway of 1-MT leading to increased KYNA levels, seven models were developed and compared. The most likely model was extended to test whether the inhibitory effects of 1-MT on IDO can be determined.
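The Michaelis-Menten structure described above can be sketched as a small ODE system. The following Python sketch simulates a linear TRP → KYN → KYNA chain with SciPy; parameter values are illustrative, not the fitted values from the thesis, and the thesis' models additionally include QUIN, SER, LPS, and 1-MT:

```python
import numpy as np
from scipy.integrate import solve_ivp

def mm(s, vmax, km):
    """Michaelis-Menten degradation rate of substrate concentration s."""
    return vmax * s / (km + s)

def kp_rhs(t, y, p):
    trp, kyn, kyna = y
    v1 = mm(trp, p["vmax_ido"], p["km_ido"])   # TRP -> KYN (IDO)
    v2 = mm(kyn, p["vmax_kat"], p["km_kat"])   # KYN -> KYNA (KAT)
    return [-v1, v1 - v2, v2]

# Illustrative parameters; Km values would be fixed to literature values.
params = {"vmax_ido": 1.0, "km_ido": 50.0, "vmax_kat": 0.5, "km_kat": 20.0}
sol = solve_ivp(kp_rhs, (0.0, 200.0), [100.0, 0.0, 0.0], args=(params,))

trp, kyn, kyna = sol.y[:, -1]
print(trp, kyn, kyna)   # total mass trp + kyn + kyna stays at 100
```

Because the right-hand sides only move mass between compartments, the total concentration is conserved along the trajectory, which is a useful sanity check for such models.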
Three different approaches were used to determine the ODE model parameters for the different hypotheses of 1-MT degradation. In the first approach, ODE model parameters were fixed to values fitted to an independent data set. In the second approach, parameters were fitted to a subset of the data set that was used for simulations of the different hypotheses. The third approach calculated ODE model parameters 100 times without fixed parameters; the parameter set whose TRP metabolite trajectories have the smallest distance to the data was assumed to be the most likely. The ODE model parameters were fitted to data measured in pigs. Two different experimental models delivered the data used in this thesis: the first activates IDO by LPS administration in pigs, and the second combines IDO activation by LPS with the administration of 1-MT in pigs.
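The fitting idea (Km fixed to a literature value, remaining parameters estimated so that the model matches the data) can be illustrated on a single rate law. This hypothetical example fits only Vmax to synthetic rate measurements, whereas the thesis fits full ODE trajectories:

```python
import numpy as np
from scipy.optimize import curve_fit

KM_LIT = 50.0  # hypothetical literature Km, kept fixed during fitting

def mm_rate(s, vmax):
    """Michaelis-Menten rate with Km fixed; only Vmax is free."""
    return vmax * s / (KM_LIT + s)

# Synthetic "measurements" generated with Vmax = 1.2 plus noise.
rng = np.random.default_rng(1)
s_data = np.linspace(5.0, 200.0, 20)
v_data = mm_rate(s_data, 1.2) + rng.normal(0.0, 0.02, s_data.size)

(vmax_hat,), _ = curve_fit(mm_rate, s_data, v_data, p0=[1.0])
print(vmax_hat)   # close to the true value 1.2
```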
According to approach 1, the most likely hypothesis was the degradation of 1-MT to KYNA and TRP; for the second data set, the most likely one was the direct degradation of 1-MT to KYNA. With approach 2, the most likely degradation pathways were the combination of all degradation pathways and the degradation of 1-MT to TRP followed by TRP to KYNA. With approach 3, the most likely explanation of the KYNA increase was the direct degradation of 1-MT to KYNA. In summary, the three approaches most frequently revealed hypothesis 2, the direct degradation of 1-MT to KYNA. A cell-free assay validated this result. This experiment combined 1-MT or TRP with or without the enzyme kynurenine aminotransferase (KAT), which had already been shown to degrade TRP directly to KYNA. The levels of TRP, KYN, and KYNA were measured. The highest KYNA levels were obtained in the assay adding KAT to 1-MT, corresponding to hypothesis 2. The models describing the inhibitory effects of 1-MT revealed that the model without inhibitory effects of 1-MT on IDO was more likely for all three approaches.
The correctness of hypothesis 2 has to be confirmed by further in vitro experiments. It also has to be investigated which reactions promote the degradation of 1-MT to KYNA. The missing inhibitory properties of 1-MT on IDO, determined by the in silico ODE models, align with previous research: it was shown that the saturation of 1-MT was too low, e.g. in pigs, to inhibit IDO efficiently. In this study, the first possible degradation pathway of 1-MT along the KP is proposed. The reliability of the results depends on the quality of the experimental data and on the season in which the data were measured. Moreover, the results vary between the different approaches to parameter fitting. Different approaches to parameter fitting have to be included in the analysis to gather more evidence for the correctness of the results.
Tafazzin is an acyltransferase with key functions in the remodeling of the mitochondrial phospholipid cardiolipin (CL), exchanging single fatty acid species in CL. Tafazzin-mediated CL remodeling determines the actual CL composition and has been implicated in mitochondrial morphology and function. Thus, any deficiency of tafazzin leads to an altered fatty acid composition of CL, which is directly associated with impaired mitochondrial respiration and ATP production. Mutations in the tafazzin-encoding gene TAZ are the cause of the severe X-linked genetic disease Barth syndrome (BTHS).
Previous work provided first hints of a link between CL composition and subsequent limitations in cellular ATP levels, which may contribute to restricted growth. However, in C6 cells ATP levels remained unaltered due to compensatory activation of glycolysis. Moreover, it has been demonstrated, also in C6 glioma cells, that substantial changes in CL composition result similarly from knocking down either cardiolipin synthase (CRLS) or TAZ. Most notably, only the knockdown of TAZ, but not that of CRLS, compromised proliferation of C6 glioma cells. Therefore, a CL-independent role of TAZ in regulating cell proliferation is postulated.
In this study, the link between the lack of tafazzin and cellular proliferation was investigated in more detail to allow first insights into the underlying mechanisms.
The results of the current study demonstrate that tafazzin knockout in C6 glioma cells changes global gene expression, as shown by transcriptome analysis using the Affymetrix Clariom S rat microarray. Of the 22,076 genes detected in total, 1,099 were differentially expressed in C6 knockout cells, being either ≥2-fold or ≥4-fold up- or downregulated. Furthermore, the expression of selected target genes was validated using RT-qPCR. We hypothesised that the changes in TAZ-dependent gene expression are mediated by PPAR transcription factors. According to the eukaryotic promoter database (EPD), the selected target genes each exhibit at least one putative binding site for the PPARG and PPARA transcription factors. However, pioglitazone and LG100268, synthetic ligands of PPARG and RXR, did not show any effect on gene expression in C6 TAZ cells. Another class of cellular lipids, oxylipins, was found to occur in significantly higher amounts in C6 TAZ cells compared to C6 cells, which makes them candidates for mediating cellular effects and regulating gene expression via PPARs. The computational tool CiiiDER was used for the prediction of transcription factor binding sites. The transcription factors enriched in TAZ-regulated genes were found to be HOXA5 and PAX2, binding sites of which could be detected in 100% of TAZ-regulated genes (>2-fold). By applying IPA to the differentially expressed genes, we could identify lipid metabolism, and the cholesterol superpathway in particular, as the most affected pathway in C6 TAZ cells. This pathway consists of 20 genes, all of which (20/20) appeared to be differentially regulated in C6 TAZ cells. Of these 20 genes, 4 were selected for further validation by RT-qPCR. With IPA it was possible to identify the upstream regulators that might be responsible for the differential expression of genes in C6 TAZ-deficient cells.
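The fold-change thresholds used in such differential-expression counts (≥2-fold and ≥4-fold, up or down) amount to a simple filter. All expression values and most gene names below are made up for illustration:

```python
import numpy as np

# Hypothetical expression values in wild-type (wt) and knockout (ko) cells.
expr_wt = np.array([100.0, 80.0, 50.0, 200.0, 10.0])
expr_ko = np.array([420.0, 85.0, 20.0, 90.0, 10.5])
genes   = np.array(["Hmgcr", "GeneB", "GeneC", "GeneD", "GeneE"])

fold = expr_ko / expr_wt                       # expression ratio ko / wt
de_2fold = (fold >= 2.0) | (fold <= 0.5)       # >=2-fold up- or downregulated
de_4fold = (fold >= 4.0) | (fold <= 0.25)      # >=4-fold up- or downregulated
print(genes[de_2fold])
print(genes[de_4fold])
```

In practice the same filter is usually applied on log2-transformed values together with a significance test, not on raw ratios alone.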
The expression of some of the identified genes (ACACA, HMGCR, FASN, ACSL1, ACSL3, and ACSL5) was decreased, consistent with the predicted activation and inhibition of these regulators. Furthermore, we analysed the cellular cholesterol content in C6 and C6 TAZ (with and without Δ5 and FL) cells. In C6 cells, cholesterol is present mostly in its free form. C6 TAZ cells have an increased amount of cholesterol compared to C6 cells; however, C6 TAZ cells expressing Δ5 or FL showed lower amounts of cholesterol.
Previous work established that knockout of tafazzin in C6 cells decreased cell proliferation in the absence of any changes in ATP content. To understand this phenomenon, a cellular senescence-associated β-galactosidase assay was performed in C6 and C6 TAZ cells. C6 TAZ cells showed an increased percentage of β-gal-positive cells compared to C6 cells. Moreover, the senescence-associated secretory phenotype (SASP), represented by e.g. CXCL1, IL6, and IL1α, was examined using RT-qPCR. Gene expression of these SASP factors was significantly upregulated in C6 TAZ cells.
Several human tafazzin isoforms exist due to alternative splicing. However, whether these isoforms differ in function and, in particular, in CL remodelling activity or specificity is unknown. The purpose of this work was to determine whether specific isoforms, such as the human isoform lacking exon 5 (Δ5), rat full-length tafazzin (FL), and enzymatically dead full-length tafazzin (H69L), can restore the wild-type phenotype in terms of CL composition, cellular proliferation, and gene expression profile. In the second part, it was therefore demonstrated that expression of Δ5 to some extent, and of rat full-length tafazzin completely, can restore CL composition in C6 TAZ cells, which is naturally linked to the restoration of mitochondrial respiration. As expected, a comparable restoration of CL composition was not seen after re-expressing an enzymatically dead full-length rat TAZ (H69L; TAZ Mut). Furthermore, re-expression of the TAZ Mut largely failed to reverse the alterations in gene expression; in contrast, re-expression of the TAZ FL and Δ5 isoforms reversed gene expression to a larger extent. Moreover, only rat full-length TAZ was able to restore the proliferation rate. Surprisingly, the expression of Δ5 in C6 TAZ cells did not restore wild-type proliferation. The different effects of Δ5 and FL on CL composition and cell proliferation point to specific and in part non-enzymatic functions of the tafazzin isoforms, but this certainly requires further analysis.
Universal products provide an axiomatic framework to study notions of noncommutative independence general enough to include, besides the well-known "single-faced" case (i.e., tensor, free, Boolean, monotone and antimonotone independence), also more recent "multi-faced" examples like bifree independence. Questions concerning classification have been fully answered in the single-faced case, but are in general still open in the multi-faced case. In this thesis we discuss how insights into the relation between universal products and their associated moment-cumulant formulas can serve as a starting point for a combinatorial approach to (multi-faced) universal products. We define certain classes of partitions and discuss why the defining axioms are sufficient to associate a multi-faced universal product to each of them. For the two-faced case we present our result that every positive and symmetric universal product can be produced in this fashion, and we outline how these results might contribute to a classification of positive and symmetric universal products.
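For the single-faced free case, the moment-cumulant formula referred to above sums over noncrossing partitions: m_n = Σ_{π∈NC(n)} Π_{B∈π} κ_{|B|}. The following brute-force Python sketch is purely illustrative of this single-faced formula; the thesis concerns its multi-faced generalization:

```python
import math
from itertools import combinations

def set_partitions(elems):
    """Yield all set partitions of the sorted list elems."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for k in range(len(rest) + 1):
        for subset in combinations(rest, k):
            remaining = [e for e in rest if e not in subset]
            for partition in set_partitions(remaining):
                yield [[first, *subset]] + partition

def is_noncrossing(partition):
    """A partition is noncrossing iff no two blocks interleave."""
    for b1, b2 in combinations(partition, 2):
        for a, c in combinations(b1, 2):
            if any(a < x < c for x in b2) and any(x < a or c < x for x in b2):
                return False
    return True

def moment(n, kappa):
    """n-th moment from cumulants kappa[k], summing over NC(n)."""
    return sum(math.prod(kappa[len(b)] for b in p)
               for p in set_partitions(list(range(1, n + 1)))
               if is_noncrossing(p))

# With all free cumulants equal to 1, m_n = |NC(n)| = n-th Catalan number.
kappa = {k: 1 for k in range(1, 6)}
print([moment(n, kappa) for n in range(1, 5)])   # -> [1, 2, 5, 14]
```

Replacing NC(n) by all partitions, interval partitions, or monotonically ordered partitions recovers the moment-cumulant formulas of the other single-faced independences.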
Geometric T-Duality
(2022)
From a physicist's point of view, T-duality is a relation connecting string theories on different spacetimes. Mathematically speaking, T-duality should be a symmetric relation on the space of toroidal string backgrounds. Such a background consists of: a smooth manifold M; a torus bundle E over M, the total space modelling spacetime; a Riemannian metric g on E, modelling the field of gravity; and a U(1)-bundle gerbe G with connection over E, modelling the Kalb-Ramond field.
As of now, however, no complete model for T-duality exists. The three most notable approaches to T-duality are the differential approaches by Buscher, in the form of the Buscher rules, and by Bouwknegt, Evslin and Mathai, in the form of T-duality with H-flux, on the one hand, and the topological approach by Bunke, Rumpf and Schick, known as topological T-duality, on the other. In this thesis we combine these different approaches to form the first model for T-duality over complete geometric toroidal string backgrounds, and we introduce an example of this geometric T-duality inspired by the Hopf bundle.
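The differential approach can be made concrete: for a background (g, B) with a single isometry direction θ and remaining coordinates x^i, the Buscher rules express the T-dual background (g̃, B̃). The formulas below follow one common sign convention; conventions differ between references:

```latex
\begin{aligned}
\tilde g_{\theta\theta} &= \frac{1}{g_{\theta\theta}}, &
\tilde g_{\theta i} &= \frac{B_{\theta i}}{g_{\theta\theta}}, &
\tilde B_{\theta i} &= \frac{g_{\theta i}}{g_{\theta\theta}}, \\
\tilde g_{ij} &= g_{ij}
  - \frac{g_{\theta i}\,g_{\theta j} - B_{\theta i}\,B_{\theta j}}{g_{\theta\theta}}, &
\tilde B_{ij} &= B_{ij}
  - \frac{g_{\theta i}\,B_{\theta j} - B_{\theta i}\,g_{\theta j}}{g_{\theta\theta}}.
\end{aligned}
```

Note the characteristic inversion of the metric along the torus fibre and the exchange of metric and B-field components mixing the fibre and base directions.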
Discovering Latent Structure in High-Dimensional Healthcare Data: Toward Improved Interpretability
(2022)
This cumulative thesis describes contributions to the field of interpretable machine learning in the healthcare domain. Three research articles are presented that lie at the intersection of biomedical and machine learning research. They illustrate how incorporating latent structure can provide a valuable compression of the information hidden in complex healthcare data.
Methodologically, this thesis gives an overview of interpretable machine learning and the discovery of latent structure, including clusters, latent factors, graph structure, and hierarchical structure. Different workflows are developed and applied to two main types of complex healthcare data (cohort study data and time-resolved molecular data). The core result builds on Bayesian networks, a type of probabilistic graphical model. On the application side, we provide accurate predictive or discriminative models focusing on relevant medical conditions, related biomarkers, and their interactions.
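As a toy illustration of one latent-structure workflow mentioned above (clusters and hierarchical structure), synthetic cohort-like data can be compressed with hierarchical clustering. This sketch uses SciPy and invented data, not the thesis' models or cohorts:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two synthetic subgroups of "patients" with 4 features each.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 0.5, size=(20, 4))   # e.g. a "healthy" profile
group_b = rng.normal(3.0, 0.5, size=(20, 4))   # e.g. an "at-risk" profile
data = np.vstack([group_a, group_b])

tree = linkage(data, method="ward")            # hierarchical (latent) structure
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)                                  # cluster assignment per row
```

The dendrogram encoded in `tree` is what makes such a compression interpretable: one can inspect at which merge height subgroups separate, instead of only reading off final cluster labels.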
Spatial variation in survival has individual fitness consequences and influences population dynamics. It proximately and ultimately impacts space use, including migratory connectivity. Therefore, knowing spatial patterns of survival is crucial to understanding the demography of migrating animals. Extracting information on survival and space use from observation data, in particular dead recovery data, requires explicitly modeling the observation process. The main aim of this work is to establish a modeling framework which allows estimating spatial variation in survival, migratory connectivity and observation probability using dead recovery data. We provide some biological background on survival and migration and a short methodological overview of how similar situations are modeled in the literature.
Afterwards, we provide REML-like estimators for discrete space and show identifiability of all three parameters using the characteristics of the multinomial distribution. Moreover, we formulate a model in continuous space using mixed binomial point processes. The continuous model assumes a constant recovery probability over space. To drop this strict assumption, we develop an optimization procedure combining the discrete and continuous space models, using penalized M-splines. In simulation studies we demonstrate the performance of the estimators for all three model approaches. Furthermore, we apply the models to real-world data sets of European robins (Erithacus rubecula) and ospreys (Pandion haliaetus).
We discuss how this study can be embedded in the framework of animal movement and capture-mark-recapture/recovery methodology. It can be seen as a contribution and an extension to distance sampling, locally stationary everyday movement, and dispersal. We emphasize the importance of having a mathematically clearly formulated modeling framework for applied methods. Moreover, we comment on model assumptions and their limits. In the future, it would be appealing to extend this framework to the full annual cycle and carry-over effects.
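A stripped-down, non-spatial version of the identifiability argument can be sketched in code: in a simple dead-recovery model, the multinomial cell probabilities identify both annual survival s and recovery probability r. The model below is a textbook simplification, not the spatial framework of the thesis:

```python
import numpy as np
from scipy.optimize import minimize

def cell_probs(s, r, T):
    """P(recovered dead in year t) = s^(t-1) * (1-s) * r, plus a rest cell."""
    p = np.array([s ** (t - 1) * (1 - s) * r for t in range(1, T + 1)])
    return np.append(p, 1.0 - p.sum())      # last cell: never recovered

def neg_log_lik(theta, counts, T):
    s, r = theta
    return -np.sum(counts * np.log(cell_probs(s, r, T)))

# Simulate recoveries of 20,000 ringed birds over T = 5 years.
rng = np.random.default_rng(3)
T, s_true, r_true = 5, 0.7, 0.4
counts = rng.multinomial(20000, cell_probs(s_true, r_true, T))

res = minimize(neg_log_lik, x0=[0.5, 0.5], args=(counts, T),
               bounds=[(0.01, 0.99), (0.01, 0.99)])
s_hat, r_hat = res.x
print(s_hat, r_hat)   # maximum likelihood estimates near (0.7, 0.4)
```

The ratio of consecutive recovery cells equals s, after which r follows from the first cell, which is the one-dimensional analogue of the multinomial identifiability argument used in the discrete-space setting.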
Twisted topological K-theory is a twisted version of topological K-theory in the sense of twisted generalized cohomology theories. It was pioneered by Donovan and Karoubi in 1970, who used bundles of central simple graded algebras to model twists of K-theory. By the end of the last century, physicists realised that D-brane charges in string theory may be studied in terms of twisted K-theory. This rekindled interest in the topic and led to a wave of new models for the twists and new ways to realize the respective twisted K-theory groups. The state-of-the-art models today use bundles of projective unitary operators on separable Hilbert spaces as twists, and K-groups are modeled by homotopy classes of sections of certain bundles of Fredholm operators. From a physics perspective these treatments are not yet optimal: they are intrinsically infinite-dimensional, and the models do not immediately allow the inclusion of differential data like forms and connections.
In this thesis we introduce the 2-stack of k-algebra gerbes. Objects, 1-morphisms and 2-morphisms consist of finite-dimensional geometric data, simultaneously generalizing bundle gerbes and bundles of central simple graded k-algebras, for k either the field of real numbers or the field of complex numbers. We construct an explicit isomorphism from equivalence classes of k-algebra gerbes over a space X to the full set of twists of real and complex K-theory, respectively. Further, we model relative twisted K-groups for compact spaces X and closed subspaces Y, twisted by algebra gerbes. These groups are modeled directly in terms of 1-morphisms and 2-morphisms of algebra gerbes over X. We exhibit a relation to the K-groups introduced by Donovan and Karoubi, and we translate their fundamental isomorphism -- an isomorphism relating K-groups over Thom spaces with K-groups twisted by Clifford algebra bundles -- to the new setting. With the help of this fundamental isomorphism we construct an explicit Thom isomorphism and explicit pushforward homomorphisms for smooth maps between compact manifolds, without requiring these maps to be K-oriented. Further, in order to treat K-groups for non-torsion twists, we implement a geometric cocycle model, inspired by a related geometric cycle model developed by Baum and Douglas for K-homology in 1982, and construct an assembly map for this model.
A common task in natural sciences is to
describe, characterize, and infer relations between discrete
objects. A set of relations E on a set of objects V can
naturally be expressed as a graph G = (V, E). It is
therefore often convenient to formalize problems in natural
sciences as graph theoretical problems.
In this thesis we will examine a number of problems found in
life sciences in particular, and show how to use graph theoretical
concepts to formalize and solve the presented problems. The
content of the thesis is a collection of papers all
solving separate problems that are relevant to biology
or biochemistry.
The first paper examines problems found in self-assembling
protein design. Designing polypeptides, composed of concatenated
coiled coil units, to fold into polyhedra turns out
to be intimately related to the concept of 1-face embeddings in
graph topology. We show that 1-face embeddings can be
canonicalized in linear time and present algorithms to enumerate
pairwise non-isomorphic 1-face embeddings in orientable surfaces.
The second and third paper examine problems found in evolutionary
biology. In particular, they focus on
inferring gene and species trees directly from sequence data
without any a priori knowledge of the tree's topology. The second
paper characterizes when gene trees can be inferred from
estimates of orthology, paralogy and xenology relations when only
partial information is available. Using this characterization, an
algorithm is presented that constructs a gene tree consistent
with the estimates in polynomial time, if one exists. The
presented algorithm is used to show experimentally that gene trees
can be accurately inferred even in the case that only 20% of
the relations are known. The third paper explores how to
reconcile a gene tree with a species tree in a biologically
feasible way, when the events of the gene tree are known.
Biologically feasible reconciliations are characterized using
only the topology of the gene and species tree. Using this
characterization an algorithm is shown that constructs a
biologically feasible reconciliation in polynomial time, if one
exists.
The fourth and fifth paper are concerned with the analysis
of automatically generated reaction networks. The fourth paper
introduces an algorithm to predict thermodynamic properties of
compounds in a given chemistry. The algorithm is based on
the well known group contribution methods and will automatically
infer functional groups based on common structural motifs found
in a set of sampled compounds. It is shown experimentally that
the algorithm can be used to accurately
predict a variety of molecular properties such as normal boiling
point, Gibbs free energy, and the minimum free energy of RNA
secondary structures. The fifth and final paper presents a
framework to track atoms through reaction networks generated by a
graph grammar. Using concepts found in semigroup theory, the
paper defines the characteristic monoid of a reaction network. It
goes on to show how natural subsystems of a reaction network organically
emerge from the right Cayley graph of said monoid. The
applicability of the framework is demonstrated by applying it to the
design of isotopic labeling experiments as well as to the
analysis of the TCA cycle.
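The group contribution idea behind the fourth paper reduces, in its simplest form, to a linear least-squares problem: with a molecule-by-group count matrix X, property values satisfy y ≈ Xβ, where β holds the per-group contributions. The groups, counts, and property values below are invented for illustration:

```python
import numpy as np

# Rows: molecules; columns: occurrence counts of hypothetical
# functional groups (CH3, CH2, OH).
X = np.array([[2, 1, 0],
              [2, 2, 0],
              [1, 1, 1],
              [2, 3, 1],
              [1, 2, 1]], dtype=float)

true_contrib = np.array([10.0, 5.0, 25.0])  # per-group contributions
y = X @ true_contrib                        # e.g. a boiling-point-like property

# Recover the contributions by linear least squares.
contrib, *_ = np.linalg.lstsq(X, y, rcond=None)
print(contrib)
```

In the paper's setting, the functional groups themselves are not fixed in advance but inferred automatically from structural motifs shared by the sampled compounds.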
Mathematical phylogenetics provides the theoretical framework for the reconstruction and analysis of phylogenetic trees and networks. The underlying theory is based on various mathematical disciplines, ranging from graph theory to probability theory.
In this thesis, we take a mostly combinatorial and graph-theoretical position and study different problems concerning phylogenetic trees and networks.
We start by considering phylogenetic diversity indices that rank species for conservation. Two such indices for rooted trees are the Fair Proportion index and the Equal Splits index, and we analyze how different they can be from each other and under which circumstances they coincide. Moreover, we define and investigate analogues of these indices for unrooted trees.
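The Fair Proportion index mentioned above admits a compact computation: each branch contributes its length divided by the number of leaves below it, and these contributions are summed along the root-to-leaf path. A minimal sketch on a hypothetical four-edge rooted tree, stored as child -> (parent, branch length):

```python
tree = {                     # hypothetical example tree
    "A": ("root", 1.0),
    "v": ("root", 1.0),
    "B": ("v", 2.0),
    "C": ("v", 3.0),
}
leaves = ["A", "B", "C"]

def leaves_below(node):
    """Number of leaves in the subtree rooted at node."""
    kids = [c for c, (p, _) in tree.items() if p == node]
    return 1 if not kids else sum(leaves_below(c) for c in kids)

def fair_proportion(leaf):
    """Sum of length / (#leaves below) along the path to the root."""
    fp, node = 0.0, leaf
    while node != "root":
        parent, length = tree[node]
        fp += length / leaves_below(node)
        node = parent
    return fp

fp = {leaf: fair_proportion(leaf) for leaf in leaves}
print(fp)   # the values sum to the total branch length of the tree
```

That the index values always sum to the total branch length (the phylogenetic diversity of the whole tree) is precisely the "fair apportioning" property that gives the index its name.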
Subsequently, we study the Shapley value of unrooted trees, another popular phylogenetic diversity index. We show that it may fail as a prioritization criterion in biodiversity conservation and is outcompeted by an existing greedy approach. Afterwards, we leave the biodiversity setting and consider the Shapley value as a tree reconstruction tool. Here, we show that non-isomorphic trees may have permutation-equivalent Shapley transformation matrices and identical Shapley values, implying that the Shapley value cannot reliably be employed in tree reconstruction.
In addition to phylogenetic diversity indices, another class of indices frequently discussed in mathematical phylogenetics is the class of balance indices. In this thesis, we study one of the oldest and most popular of them, namely the Colless index for rooted binary trees. We focus on its extremal values and analyze both its maximum and minimum values as well as the trees that achieve them.
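The Colless index itself is straightforward to compute: it sums, over all internal vertices, the absolute difference between the leaf counts of the two subtrees. A minimal sketch on rooted binary trees given as nested pairs, including the caterpillar tree, which attains the maximum value (n-1)(n-2)/2:

```python
def colless(tree):
    """Return (number of leaves, Colless index) of a nested-pair tree."""
    if not isinstance(tree, tuple):        # a leaf
        return 1, 0
    nl, cl = colless(tree[0])
    nr, cr = colless(tree[1])
    return nl + nr, cl + cr + abs(nl - nr)

caterpillar = ((((1, 2), 3), 4), 5)        # maximally unbalanced, 5 leaves
balanced = ((1, 2), (3, 4))                # fully balanced, 4 leaves
print(colless(caterpillar))                # -> (5, 6), and (5-1)(5-2)/2 = 6
print(colless(balanced))                   # -> (4, 0)
```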
Having analyzed various questions regarding phylogenetic trees, we finally turn to phylogenetic networks. We focus on a certain class of phylogenetic networks, namely tree-based networks, and consider this class both in a rooted and in an unrooted setting.
First, we prove the existence of a rooted non-binary universal tree-based network with n leaves for all positive integers n, that is, we show that there exists a rooted non-binary tree-based network with n leaves that has every non-binary phylogenetic tree on the same leaf set as a base tree.
Finally, we study unrooted tree-based networks and introduce a class of networks that are necessarily tree-based, namely edge-based networks. We show that edge-based networks are closely related to a family of graphs in classical graph theory, so-called generalized series-parallel graphs, and explore this relationship in full detail.
In summary, we add new insights into existing concepts in mathematical phylogenetics, answer open questions in the literature, and introduce new concepts and approaches. In doing so, we make a small but relevant contribution to current research in mathematical phylogenetics.
In this thesis, we elaborate upon Bayesian changepoint analysis, with a focus on three major topics: approximate sampling via MCMC, exact inference, and uncertainty quantification. Besides, modeling matters are discussed in an ongoing fashion. Our findings are underpinned by several changepoint examples, with a focus on a well-log drilling data set.
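A toy version of exact changepoint inference can be written in a few lines: for Gaussian observations with known segment means and variance and a uniform prior on the changepoint location τ, the posterior over τ is obtained by normalizing the likelihood over all candidate locations. This is far simpler than the thesis' setting and purely illustrative:

```python
import numpy as np
from scipy.stats import norm

# Synthetic data: mean shift from mu0 to mu1 at tau_true.
rng = np.random.default_rng(7)
mu0, mu1, sigma, tau_true = 0.0, 2.0, 1.0, 60
x = np.concatenate([rng.normal(mu0, sigma, tau_true),
                    rng.normal(mu1, sigma, 100 - tau_true)])

def log_lik(tau):
    """Log-likelihood of a changepoint directly after observation tau."""
    return (norm.logpdf(x[:tau], mu0, sigma).sum()
            + norm.logpdf(x[tau:], mu1, sigma).sum())

taus = np.arange(1, 100)
ll = np.array([log_lik(t) for t in taus])
post = np.exp(ll - ll.max())          # subtract max for numerical stability
post /= post.sum()                    # exact posterior under a uniform prior
tau_map = taus[np.argmax(post)]
print(tau_map)
```

The full posterior `post`, not just the MAP location, is what enables the uncertainty quantification emphasized above: it directly yields credible sets for the changepoint location.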
Given a manifold with a string structure, we construct a spinor bundle on its loop space. Our construction is in analogy with the usual construction of a spinor bundle on a spin manifold, but necessarily makes use of tools from infinite-dimensional geometry. We equip this spinor bundle on loop space with an action of a bundle of Clifford algebras. Given two smooth loops in our string manifold that share a segment, we can construct a third loop by deleting this segment. If this third loop is smooth, then we say that the original pair of loops is a pair of compatible loops. It is well known that this operation of fusing compatible loops is important if one wants to understand the geometry of a manifold through its loop space. In this work, we explain in detail how the spinor bundle on loop space behaves with respect to fusion of compatible loops. To wit, we construct a family of fusion isomorphisms indexed by pairs of compatible loops in our string manifold. Each of these fusion isomorphisms is an isomorphism from the relative tensor product of the fibres of the spinor bundle over its index pair of compatible loops to the fibre over the loop that is the result of fusing the index pair. The construction of a spinor bundle on loop space equipped with a fusion product as above was proposed by Stolz and Teichner with the goal of studying the Dirac operator on loop space. Our construction combines facets of the theory of bimodules for von Neumann algebras, infinite-dimensional manifolds, and Lie groups and their representations. We moreover place our spinor bundle on loop space in the context of bundle gerbes and bundle gerbe modules.
In phylogenetics, evolutionary relationships of different species are represented by phylogenetic trees.
In this thesis, we are mainly concerned with the reconstruction of ancestral sequences and the accuracy of this reconstruction given a rooted binary phylogenetic tree.
For example, we wish to estimate the DNA sequences of the ancestors given the observed DNA sequences of species living today.
In particular, we are interested in reconstructing the DNA sequence of the last common ancestor of all species under consideration. Note that this last common ancestor corresponds to the root of the tree.
There exist various methods for the reconstruction of ancestral sequences.
A widely used principle for ancestral sequence reconstruction is the principle of parsimony (Maximum Parsimony).
This principle states that the simplest explanation is the best.
Applied to the reconstruction of ancestral sequences, this means that a sequence requiring the fewest evolutionary changes along the tree is reconstructed.
Thus, the number of changes is minimized, which explains the name Maximum Parsimony.
Instead of estimating a whole DNA sequence, Maximum Parsimony considers each position in the sequence separately. Thus, in the following, each sequence position is regarded separately, and we call a single position in a sequence a state.
It can happen that the state of the last common ancestor is reconstructed unambiguously, for example as A. On the other hand, Maximum Parsimony might be indecisive between two DNA nucleotides, say for example A and C.
In this case, the last common ancestor will be reconstructed as {A,C}.
Therefore we consider, after an introduction and some preliminary definitions, the following question in Section 3: how many present-day species need to be in a certain state, for example A, such that the Maximum Parsimony estimate of the last common ancestor is also {A}?
The answer to this question depends on the tree topology as well as on the number of different states.
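The set-valued per-site reconstruction described above is produced by the bottom-up pass of Fitch's algorithm, a standard realisation of Maximum Parsimony for a single site. A minimal sketch (the pair-based tree encoding is an illustrative assumption, not the thesis' notation):

```python
def fitch_root_set(tree):
    """Bottom-up pass of Fitch's algorithm for Maximum Parsimony.

    `tree` is either a leaf state (a one-element frozenset, e.g.
    frozenset({'A'})) or a pair (left, right) of subtrees of a rooted
    binary tree.  Returns the set of most parsimonious root states."""
    if isinstance(tree, frozenset):          # leaf: observed state
        return tree
    left, right = tree
    ls, rs = fitch_root_set(left), fitch_root_set(right)
    # If the children's state sets intersect, keep the intersection;
    # otherwise take the union (this is where a change is counted).
    return ls & rs if ls & rs else ls | rs

# Three leaves in state A, one in state C: the cherry (A, C) is
# ambiguous, but the root is reconstructed unambiguously as {A}.
A, C = frozenset('A'), frozenset('C')
print(fitch_root_set(((A, A), (A, C))))   # frozenset({'A'})
```

When the final set contains more than one state, Maximum Parsimony is indecisive at the root, exactly as in the {A,C} example above.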
In Section 4, we provide a sufficient condition for Maximum Parsimony to recover the ancestral state at the root correctly from the observed states at the leaves.
The so-called reconstruction accuracy for the reconstruction of ancestral states is introduced in Section 5. The reconstruction accuracy is the probability that the true root state is indeed reconstructed, and it always takes two processes into account: on the one hand, the method used to reconstruct ancestral states, and on the other hand, the way the states evolve along the edges of the tree. The latter is given by an evolutionary model.
In the present thesis, we focus on a simple symmetric model, the Neyman model.
The symmetry of the model means, for example, that a change from A to C is as likely as a change from C to A.
Intuitively, one could expect that the reconstruction accuracy is highest when all present-day species are taken into account. However, it has long been known that the reconstruction accuracy can improve when some taxa are disregarded in the estimation.
This is bad news for Maximum Parsimony as a criterion for ancestral state reconstruction, and it raises the question of whether there exists at least a lower bound for the reconstruction accuracy, i.e. whether it is always better to consider all present-day species rather than just one for the reconstruction.
In Section 5, we start with considering ultrametric trees, which are trees where the expected number of substitutions from the root to each leaf is the same.
For such trees, we investigate a lower bound for the reconstruction accuracy, when the number of different states at the leaves of the tree is 3 or 4.
Subsequently in Section 6, in order to generalize this result, we introduce a new method for ancestral state reconstruction: the coin-toss method.
We obtain new results for the reconstruction accuracy of Maximum Parsimony by relating Maximum Parsimony to the coin-toss method.
Some of these results do not require the underlying tree to be ultrametric.
Then, in Section 7 we investigate the influence of specific tree topologies on the reconstruction accuracy of Maximum Parsimony. In particular, we consider balanced and imbalanced trees as the balance of a tree may have an influence on the reconstruction accuracy.
We end by introducing the Colless index in Section 8, an index which measures the degree of balance of a rooted binary tree, and analyze its extremal properties.
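The Colless index has a simple recursive definition: it sums, over all internal vertices of a rooted binary tree, the absolute difference between the leaf counts of the two child subtrees. A minimal sketch (the pair-based tree encoding is an assumption for illustration):

```python
def colless(tree):
    """Colless index of a rooted binary tree.

    `tree` is a leaf label or a pair (left, right).  Returns a pair
    (number of leaves, Colless index), where the index sums
    |leaves(left) - leaves(right)| over all internal vertices."""
    if not isinstance(tree, tuple):          # leaf
        return 1, 0
    nl, cl = colless(tree[0])
    nr, cr = colless(tree[1])
    return nl + nr, cl + cr + abs(nl - nr)

# The caterpillar (fully imbalanced) tree on 4 leaves attains the
# maximal value (n-1)(n-2)/2 = 3; the fully balanced tree attains 0.
print(colless((((1, 2), 3), 4))[1])   # 3
print(colless(((1, 2), (3, 4)))[1])   # 0
```

The two extremes, caterpillar and fully balanced trees, bracket the range of the index, which is what makes it a natural measure of tree balance.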
Self-affine tiles and fractals are known as examples in analysis and topology, as models of quasicrystals and biological growth, as unit intervals of generalized number systems, and as attractors of dynamical systems. The author has implemented software that can find new examples and handle large databases of self-affine fractals. This thesis establishes the algebraic foundation of the algorithms of the IFStile package. Lifting and projection of algebraic and rational iterated function systems and many properties of the resulting attractors are discussed.
As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. In my dissertation, I address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or—if not—where the exon gains and losses are plausible given the species tree. The multi-species gene finding problem is formulated as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach.
I tested the novel approach on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that the new method is well-suited for annotation of a large number of genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of the gene finder AUGUSTUS.
We consider Iterated Function Systems (IFS) on the real line and on the complex plane. Every IFS defines a self-similar measure supported on a self-similar set. We study the transfer operator (which acts on the space of continuous functions on the self-similar set) and the Hutchinson operator (which acts on the space of Borel regular measures on the self-similar set). We show that the transfer operator has an infinitely countable set of polynomial eigenfunctions. These eigenfunctions can be regarded as generalized Bernoulli polynomials. The polynomial eigenfunctions define a polynomial approximation of the self-similar measure. We also study the moments of the self-similar measure and give recursions for computing them. Further, we develop a numerical method based on Markov chains to study the spectrum of the Hutchinson and transfer operators. This method provides numerical approximations of the invariant measure for which we give error bounds in terms of the Wasserstein distance. The standard example in this thesis is the parametric family of Bernoulli convolutions.
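The Markov-chain approach to approximating the invariant measure can be illustrated by the classical "chaos game": iterating a randomly chosen map of the IFS yields an orbit whose empirical distribution converges to the self-similar measure. A generic sketch, not the thesis' implementation (the normalisation of the Bernoulli convolution maps is one common convention):

```python
import random

def chaos_game(maps, n=100_000, burn_in=100):
    """Approximate the invariant (self-similar) measure of an IFS on
    the real line by iterating a uniformly randomly chosen contraction:
    the Markov chain behind the 'chaos game'.  The empirical
    distribution of the returned orbit approximates the measure."""
    x, orbit = 0.0, []
    for i in range(n + burn_in):
        x = random.choice(maps)(x)
        if i >= burn_in:                 # discard the transient
            orbit.append(x)
    return orbit

random.seed(0)                           # reproducible illustration

# Bernoulli convolution with parameter lam: invariant measure of the
# IFS f0(x) = lam*x - 1, f1(x) = lam*x + 1, chosen with equal weights.
lam = 0.6
orbit = chaos_game([lambda x: lam * x - 1, lambda x: lam * x + 1])
# By symmetry the first moment of the measure is 0; the sample mean
# should be close to it.
print(abs(sum(orbit) / len(orbit)))      # close to 0
```

The attractor here is contained in [-1/(1-lam), 1/(1-lam)]; higher moments can likewise be estimated from the orbit and checked against the exact moment recursions.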
The history of Mathematics has been driven in part by the desire for generalization: once an object was given and had been understood, there was the desire to find a more general version of it, to fit it into a broader framework. Noncommutative Mathematics fits this description, as it studies objects analogous to vector spaces, probability spaces, etc., but without the common-sense interpretation that those latter objects possess. Indeed, a space can be described by its points, but also, and equivalently, by the set of functions on this space. This set is actually a commutative algebra, sometimes equipped with additional structure: a *-algebra, C*-algebra, von Neumann algebra, Hopf algebra, etc. The idea at the basis of noncommutative Mathematics is to replace such algebras by algebras that are no longer necessarily commutative and to interpret them as "algebras of functions on noncommutative spaces". Of course, these spaces do not exist independently of their defining algebras, but many of the results holding in (classical) probability or (classical) group theory can be extended to their noncommutative counterparts, or find powerful analogues therein. The extension of group theory into the realm of noncommutative Mathematics has long been studied and has yielded the various quantum groups. The simplest version of them, the compact quantum groups, consists of C*-algebras equipped with a *-homomorphism Δ with values in the tensor product of the algebra with itself, satisfying a coassociativity condition. It is also required that a compact quantum group satisfy what is known as the quantum cancellation property. It can be shown that (classical) compact groups are indeed a particular case of compact quantum groups. The area of compact quantum groups, and of quantum groups at large, is a fruitful area of research.
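The coassociativity condition mentioned above can be written compactly. For an algebra $A$ with comultiplication $\Delta \colon A \to A \otimes A$, it reads

```latex
(\Delta \otimes \mathrm{id}_A) \circ \Delta \;=\; (\mathrm{id}_A \otimes \Delta) \circ \Delta ,
```

mirroring the associativity of a group multiplication when $\Delta$ is dual to the multiplication map; the quantum cancellation property then plays the role of the existence of inverses.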
Nevertheless, another generalization of group theory can be envisioned, namely by taking a comultiplication Δ with values not in the tensor product but rather in the free product (in the category of unital *-algebras). This leads to the theory of dual groups in the sense of Voiculescu, also called H-algebras by Zhang. These objects have not been studied as thoroughly as their quantum counterparts. It is true that they are less flexible: we therefore do not know many examples of them, and one can show that some relations cannot exist in the dual group case because they do not pass through the coproduct. Nevertheless, I have been interested in these objects during a great part of my PhD work, and I have made some progress towards their understanding, especially regarding quantum Lévy processes defined on them and Haar states.
This thesis deals with thickness optimization of shells. The overall task is to find an optimal thickness distribution in order to minimize the deformation of a loaded shell with prescribed volume. In addition, lower and upper bounds for the thickness are given. The shell is made of an elastic, isotropic, homogeneous material. The deformation is modeled using equations from linear elasticity. Here, a basic shell model based on the Reissner-Mindlin assumption is used. Both the stationary and the dynamic case are considered. The continuity and the Gâteaux-differentiability of the control-to-state operator are investigated. These results are applied to the reduced objective with the help of adjoint theory. In addition, techniques from shape optimization are compared to the optimal control approach. In the following, the theoretical results are applied to cylindrical shells and an efficient numerical implementation is presented. Finally, numerical results are shown and analyzed for different examples.
Today, improving technology and software make it possible to create, save and explore massive data sets in little time. "Big Data" are everywhere, such as in social networks, meteorology, customer behaviour – and in biology. The Omics research field, standing for organism-wide data exploration and analysis, is an example of biological research that has to deal with "Big Data" challenges. Possible challenges are, for instance, efficient storage and cataloguing of the data sets and, finally, the qualitative analysis and exploration of the information. In the last decade, large-scale genome-wide association studies and high-throughput techniques became more efficient, more profitable and less expensive. As a consequence of this rapid development, it is easier to gather massive amounts of genomic and proteomic data. However, these data need to be evaluated, analysed and explored. Typical questions that arise in this context include: which genes are active under several physical states, which proteins and metabolites are available, and which organisms or cell types are similar or different in their enzymes' or genes' behaviour. For this reason, and because a scientist in any "Big Data" research field wants to see the data, there is an increasing need for clear, intuitively understandable and recognizable visualizations to explore the data and confirm hypotheses. One way to get an overview of the data sets is to cluster them. Taxonomic trees and functional classification schemes are hierarchical structures used by biologists to organize the available biological knowledge in a systematic and computer-readable way (such as KEGG, GO and FUNCAT). For example, proteins and genes can be clustered according to their function in an organism. These hierarchies tend to be rather complex, and many comprise thousands of biological entities. One approach for a space-filling visualization of these hierarchically structured data sets is a treemap.
Existing algorithms for producing treemaps struggle with large data sets and have several other problems. This thesis addresses some of these problems and is structured as follows. After a short review of the basic concepts from graph theory, some commonly used types of treemaps and a classification of treemaps according to information visualization aspects are presented in the first chapter of this thesis. The second chapter provides several methods to improve treemap constructions. In certain applications the researcher wants to know how the entities in a hierarchical structure are related to each other (such as enzymes in a metabolic pathway). Therefore, in the third chapter, the focus is on the construction of a suitable layout overlaying an existing treemap. This gives rise to optimization problems on geometric graphs. In addition, from a practical point of view, options for enhancing the display of the computed layout are explored to help the user perform typical tasks in this context more efficiently. One important aspect of the problems on geometric graphs considered in the third chapter is that crossings of edges in a network structure are to be minimized while certain other properties, such as connectedness, are maintained. Motivated by this, in the fourth chapter, related combinatorial and computational problems are explored from a more theoretical point of view. In particular, some light is shed on properties of crossing-free spanning trees in geometric graphs.
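The basic treemap construction can be illustrated by the classic slice-and-dice layout: each level of the hierarchy splits its rectangle along alternating axes, proportionally to subtree sizes. A minimal sketch (the tuple-based tree encoding is an illustrative assumption, not one of the thesis' algorithms):

```python
def subtree_size(node):
    """Total size of a (label, content) node, where content is either a
    numeric leaf size or a list of child nodes."""
    content = node[1]
    if not isinstance(content, list):
        return content
    return sum(subtree_size(c) for c in content)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Classic slice-and-dice treemap layout.

    Splits the rectangle (x, y, w, h) among the children of `node`
    proportionally to their sizes, alternating the split axis with the
    depth.  Returns a list of (label, x, y, w, h) leaf rectangles."""
    if out is None:
        out = []
    label, content = node
    if not isinstance(content, list):        # leaf: emit its rectangle
        out.append((label, x, y, w, h))
        return out
    total = sum(subtree_size(c) for c in content)
    offset = 0.0
    for child in content:
        frac = subtree_size(child) / total
        if depth % 2 == 0:                   # even depth: split x-axis
            slice_and_dice(child, x + offset * w, y, frac * w, h,
                           depth + 1, out)
        else:                                # odd depth: split y-axis
            slice_and_dice(child, x, y + offset * h, w, frac * h,
                           depth + 1, out)
        offset += frac
    return out

tree = ("root", [("a", 2), ("b", [("c", 1), ("d", 1)])])
for rect in slice_and_dice(tree, 0, 0, 1, 1):
    print(rect)
```

Slice-and-dice tends to produce thin, elongated rectangles on deep hierarchies, which is one of the problems more refined treemap algorithms address.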