This thesis revolves around a new concept of independence of algebras. The independence fits nicely into the framework of universal products, which have been introduced to classify independence relations in quantum probability theory; the associated product is called the (r,s)-product and depends on two complex parameters r and s. Based on this product, we develop a theory which works without using involutive algebras or states. The following aspects are considered: 1. Classification: Universal products are defined on the free product of algebras (the coproduct in the category of algebras) and model notions of independence in quantum probability theory. We distinguish universal products according to their behaviour on elements of length two, calling them (r,s)-universal products with complex parameters r and s. In the case where r and s both equal 1, Muraki was able to show that there exist exactly five universal products (Muraki's five). For r equal to s and nonzero we get five one-parameter families (q-Muraki's five). We prove that in the case r not equal to s the (r,s)-product, a two-parameter deformation of the Boolean product, is the only universal product satisfying our set of axioms. The corresponding independence is called (r,s)-independence. 2. Dual pairs and GNS construction: By use of the GNS construction, one can associate a product of representations with every positive universal product. Since the (r,s)-product does not preserve positivity, we need a substitute for the usual GNS construction for states on involutive algebras. In joint work with M. Gerhold, the product of representations associated with the (r,s)-product was determined, whereby we considered representations on dual pairs instead of Hilbert spaces. This product of representations is, as we could show, essentially different from the Boolean product. 3. Reduction and quantum Lévy processes: U. 
Franz introduced a category theoretical concept which allows a reduction of the Boolean, monotone and antimonotone independence to the tensor independence. This existing reduction could be modified in order to apply to the (r,s)-independence. Quantum Lévy processes with (r,s)-independent increments can, in analogy with the tensor case, be realized as solutions of quantum stochastic differential equations. To prove this theorem, the previously mentioned reduction principle in the sense of U. Franz and a generalization of M. Schürmann's theory for symmetric Fock spaces over dual pairs are used. As the main result, we obtain the realization of every (r,s)-Lévy process as a solution of a quantum stochastic differential equation. When one, more generally, defines Lévy processes in a categorial way using U. Franz's definition of independence for tensor categories with inclusions, compatibility of the inclusions with the tensor category structure plays an important role. For this thesis such a compatibility condition was formulated and proved to be equivalent to the characterization proposed by M. Gerhold. 4. Limit distributions: We work with so-called dual semigroups in the sense of D. V. Voiculescu (comonoids in the tensor category of algebras with free product). The polynomial algebra with primitive comultiplication is an example of such a dual semigroup. We use a "weakened" reduction which we call reduction of convolution and which essentially consists of a cotensor functor constructed from the symmetric tensor algebra. It turns dual semigroups into commutative bialgebras and also translates the convolution exponentials. This method, which can be nicely described in the categorial language, allows us to formulate central limit theorems for the (r,s)-independence and to calculate the corresponding limit distributions (convergence in moments). 
We calculate the moments appearing in the central limit theorem for the (r,s)-product: The even moments are homogeneous polynomials in r and s with the Eulerian numbers as coefficients; the odd moments vanish. The moment sequence that we get from the central limit theorem for an arbitrary universal product is the moment sequence of a probability measure on the real line if and only if r equals s and is greater than or equal to 1. In this case we present an explicit formula for the probability measure.
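
The abstract names the Eulerian numbers as coefficients of the even moments but does not state the moment formula itself. As a small aid, the following Python sketch only tabulates the Eulerian numbers A(n, k) via their standard recurrence; how exactly they enter the moment polynomials is not specified here, and the function name `eulerian_row` is purely illustrative.

```python
def eulerian_row(n):
    """Return [A(n, 0), ..., A(n, n-1)], the Eulerian numbers for n >= 1.

    A(n, k) counts permutations of n elements with exactly k descents.
    """
    row = [1]  # A(1, 0) = 1
    for m in range(2, n + 1):
        prev = row
        row = []
        for k in range(m):
            left = prev[k - 1] if k >= 1 else 0
            same = prev[k] if k < len(prev) else 0
            # Standard recurrence:
            # A(m, k) = (k + 1) * A(m-1, k) + (m - k) * A(m-1, k-1)
            row.append((k + 1) * same + (m - k) * left)
    return row


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, eulerian_row(n))
```

Each row is symmetric and sums to n!, which gives a quick sanity check on the table.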

We consider Iterated Function Systems (IFS) on the real line and on the complex plane. Every IFS defines a self-similar measure supported on a self-similar set. We study the transfer operator (which acts on the space of continuous functions on the self-similar set) and the Hutchinson operator (which acts on the space of Borel regular measures on the self-similar set). We show that the transfer operator has a countably infinite set of polynomial eigenfunctions. These eigenfunctions can be regarded as generalized Bernoulli polynomials. The polynomial eigenfunctions define a polynomial approximation of the self-similar measure. We also study the moments of the self-similar measure and give recursions for computing them. Further, we develop a numerical method based on Markov chains to study the spectrum of the Hutchinson and transfer operators. This method provides numerical approximations of the invariant measure for which we give error bounds in terms of the Wasserstein distance. The standard example in this thesis is the parametric family of Bernoulli convolutions.
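
To illustrate what a moment recursion for a self-similar measure can look like, here is a minimal Python sketch for a Bernoulli convolution. It assumes one common normalization, not necessarily the one used in the thesis: the IFS consists of the maps x ↦ λx − 1 and x ↦ λx + 1, each chosen with probability 1/2, with 0 < λ < 1. Self-similarity then gives m_k (1 − λ^k) = Σ_{j<k, k−j even} C(k, j) λ^j m_j with m_0 = 1. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_convolution_moments(lam, max_order):
    """Exact moments m_0..m_max_order of the Bernoulli convolution.

    Assumed model: X has the same law as lam * X + eps, where eps = +1 or -1
    with probability 1/2 each and is independent of X.
    """
    lam = Fraction(lam)
    moments = [Fraction(1)]  # m_0 = 1
    for k in range(1, max_order + 1):
        total = Fraction(0)
        for j in range(k):
            # E[eps^(k-j)] is 1 for even exponents and 0 for odd ones.
            if (k - j) % 2 == 0:
                total += comb(k, j) * lam**j * moments[j]
        moments.append(total / (1 - lam**k))
    return moments


if __name__ == "__main__":
    # For lam = 1/2 the measure is uniform on [-2, 2].
    print(bernoulli_convolution_moments(Fraction(1, 2), 4))
```

For λ = 1/2 the result can be checked against the uniform distribution on [−2, 2], whose second and fourth moments are 4/3 and 16/5.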

Self-affine tiles and fractals are known as examples in analysis and topology, as models of quasicrystals and biological growth, as unit intervals of generalized number systems, and as attractors of dynamical systems. The author has implemented software that can find new examples and handle big databases of self-affine fractals. This thesis establishes the algebraic foundation of the algorithms of the IFStile package. Lifting and projection of algebraic and rational iterated function systems and many properties of the resulting attractors are discussed.

Approaches to the Analysis of Proteomics and Transcriptomics Data based on Statistical Methodology
(2014)

Recent developments in genomics and molecular biology have led to the generation of an enormous amount of complex data of different origins. This is demonstrated by the number of published results from microarray experiments in Gene Expression Omnibus, which has grown at an exponential pace over the last decade. The challenge of interpreting these vast amounts of data from different technologies has led to the development of new methods in the fields of computational biology and bioinformatics. Researchers often want to represent biological phenomena in the most detailed and comprehensive way. However, due to technological limitations and other factors such as limited resources, this is not always possible. On the one hand, more detailed and comprehensive research generates data of high complexity that is often difficult to approach analytically, while giving bioinformatics a chance to draw more precise and deeper conclusions. On the other hand, for low-complexity tasks the data distribution is known and we can fit a mathematical model. Then, to draw inferences from this mathematical model, researchers can use well-known, standard methodologies. In return for using standard methodologies, the biological questions we are answering might not unveil the whole complexity of the biological meaning. Nowadays it is standard for a biological study to involve the generation of large amounts of data that need to be analyzed by statistical inference. Sometimes the data present researchers with a low-complexity task that can be performed with standard and popular methodologies, as in "Proteomic analysis of mouse oocytes reveals 28 candidate factors of the 'reprogrammome'". There, we established a protocol for proteomics data that involves preprocessing of the raw data and conducting Gene Ontology overrepresentation analysis utilizing the hypergeometric distribution. 
In cases where the data complexity is high and there are no published frameworks a researcher could follow, randomization can be an approach to exploit. In two studies, "The mouse oocyte proteome escapes maternal aging" and "CellFateScout - a bioinformatics tool for elucidating small molecule signaling pathways that drive cells in a specific direction", we showed how randomization can be performed for distinct complex tasks. In "The mouse oocyte proteome escapes maternal aging" we constructed a random sample of semantic similarity scores between the oocyte transcriptome and random transcriptome subsets of oocyte proteome size. From this, we could calculate whether the proteome is representative of the transcriptome. Further, we established a novel framework for Gene Ontology overrepresentation that involves randomization testing. Every Gene Ontology term is tested for whether randomly reassigning all gene labels of belonging to or not belonging to this term will decrease the overall expression level in this term. In "CellFateScout - a bioinformatics tool for elucidating small molecule signaling pathways that drive cells in a specific direction" we validated CellFateScout against other well-known bioinformatics tools. We posed the question whether our plugin is able to predict small molecule effects better in terms of expression signatures. For this, we constructed a protocol that uses randomization testing. We assess here whether the small molecule effect, described as a (set of) active signaling pathways as detected by our plugin or other bioinformatics tools, is significantly closer to known small molecule targets than a random path.
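
The Gene Ontology overrepresentation analysis based on the hypergeometric distribution can be sketched as follows. This is a generic textbook version in Python, not the protocol of the thesis: the function name, the argument names and the numbers in the usage example are all hypothetical. It returns the probability of seeing at least `hits` annotated genes when `sample` genes are drawn without replacement from a background of `population` genes, of which `annotated` carry the term.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric random variable X.

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    sample:     number of genes in the list under study
    hits:       genes in the list that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 2000 background genes carry the term,
    # and 8 of the 100 genes of interest do.
    print(hypergeom_upper_tail(2000, 40, 100, 8))
```

A small p-value suggests that the term is overrepresented in the gene list; in practice the p-values of many terms would still need a multiple-testing correction.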

A slice is an intersection of a hyperplane and a self-similar set. The main purpose of this work is the mathematical description of slices. Branching dynamical systems are a suitable tool to describe slices. Such systems are a generalisation of ordinary discrete dynamical systems to multivalued maps. Simple examples are systems arising from Bernoulli convolutions and beta-representations. The connection between orbits of branching dynamical systems and slices is demonstrated, and conditions are derived under which the geometry of a slice can be computed. A number of interesting 2-d and 3-d slices through 3-d and 4-d fractals are discussed.

We present classical and hybrid modeling approaches for genetic regulatory networks focusing on promoter analysis for negatively and positively autoregulated networks. The main aim of this thesis is to introduce an alternative mathematical approach to model gene regulatory networks based on piecewise deterministic Markov processes (PDMP). During somitogenesis, a process describing the early segmentation in vertebrates, molecular oscillators play a crucial role as part of a segmentation clock. In mice, these oscillators are called Hes1 and Hes7 and are commonly modeled by a system of two delay differential equations including a Hill function, which describes gene repression by their own gene products. The Hill coefficient, which is a measure of nonlinearity of the binding processes in the promoter, is assumed to be equal to two, based on the fact that Hes1 and Hes7 form dimers. However, by standard arguments applied to binding analysis, we show that a higher Hill coefficient is reasonable. This leads to results different from those in the literature, which requires a more sophisticated model. For the Hes7 oscillator we present a system of ordinary differential equations including a Michaelis-Menten term describing a nonlinear degradation of the proteins by the ubiquitin pathway. As demonstrated by the Hes1 and Hes7 oscillators, promoter behavior can have a strong influence on the dynamical behavior of genetic networks. Since purely deterministic systems cannot reveal phenomena caused by the inherent random fluctuations, we propose a novel approach based on PDMPs. Such models allow us to model binding processes of transcription factors to binding sites in a promoter as random processes, whereas all other processes like synthesis, degradation or dimerization of the gene products are modeled in a deterministic manner. 
We present and discuss a simulation algorithm for PDMPs and apply it to three types of genetic networks: an unregulated gene, a toggle switch, and a positively autoregulated network. The different regulation characteristics are analyzed and compared by numerical means. Furthermore, we determine analytical solutions of the stationary distributions of one negatively, and three positively autoregulated networks. Based on these results, we analyze attenuation of noise in a negative feedback loop, and the question of graded or binary response in autocatalytic networks.
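
A PDMP of the kind described above can be simulated with very little code in the simplest case. The Python sketch below is an illustration under strong simplifying assumptions and is not the simulation algorithm of the thesis: a single promoter switches between off (0) and on (1) at constant rates `k_on` and `k_off`, and between switches the protein level follows dx/dt = synth · s − degr · x, which is solved in closed form. All names and parameter values are hypothetical; state-dependent switching rates, as needed for autoregulation, would require an additional step such as thinning.

```python
import math
import random


def simulate_pdmp(k_on, k_off, synth, degr, t_end, x0=0.0, s0=0, seed=0):
    """Simulate a two-state promoter with deterministic protein dynamics.

    Returns a list of (time, promoter_state, protein_level) tuples recorded
    at every promoter switch and at t_end.
    """
    rng = random.Random(seed)
    t, x, s = 0.0, x0, s0
    path = [(t, s, x)]
    while t < t_end:
        # Random part: exponential waiting time until the next switch.
        rate = k_on if s == 0 else k_off
        wait = rng.expovariate(rate)
        remaining = t_end - t
        dt = min(wait, remaining)
        # Deterministic part: exact solution of dx/dt = synth*s - degr*x.
        target = synth * s / degr
        x = target + (x - target) * math.exp(-degr * dt)
        if wait >= remaining:
            t = t_end
        else:
            t += wait
            s = 1 - s  # promoter switches state
        path.append((t, s, x))
    return path


if __name__ == "__main__":
    for time, state, level in simulate_pdmp(1.0, 1.0, 10.0, 1.0, 5.0, seed=1):
        print(f"t={time:.3f} promoter={state} protein={level:.3f}")
```

Because the flow between jumps is solved exactly, the only source of randomness is the sequence of switching times, which is the defining feature of a piecewise deterministic process.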

The goal of this doctoral thesis is to create and implement methods for fully automatic segmentation applications in magnetic resonance images and datasets. The work introduces the technical and physical background of magnetic resonance imaging (MRI) and summarizes essential segmentation challenges in MRI data, including technical malfunctions and the ill-posedness of inverse segmentation problems. The theoretical background of all methods that are adapted, extended and combined for problem-specific segmentation applications is explained in more detail. The first application of the implemented solutions in this work deals with two-dimensional tissue segmentation of atherosclerotic plaques in cardiological MRI data. The main part of the segmentation solutions is designed for fully automatic liver and kidney parenchyma segmentation in three-dimensional MRI datasets to enable computer-assisted organ volumetry in epidemiological studies. The results for every application are listed, described and discussed before important conclusions are drawn. Among several applied methods, the level set method is the main focus of this work and is used as the central segmentation concept in most applications. Thus, its possibilities and limitations for MRI data segmentation are analyzed. The level set method is extended by several new ideas to overcome possible limitations, and it is integrated as an important part of modularized frameworks. Additionally, a new approach for probability map generation is presented in this thesis, which reduces the data dimensionality of multiple MR-weightings and incorporates organ position probabilities in a probabilistic framework. It is shown that essential organ features (i.e. MR-intensity distributions, locations) can be well represented in the calculated probability maps. 
Since MRI data are produced by using multiple MR-weightings, the dimensionality reduction technique used is very helpful for generating a single probability map, which can be used for further segmentation steps in a modularized framework.

Background: Computational tools for the investigation of transcriptional regulation, in particular of transcription factor binding sites (TFBS), in an evolutionary context are developed. Existing sequence-based tools for predicting such binding sites do not consider their actual functionality, although it is known that besides the base sequence many other aspects are relevant for binding and for the effects of that binding. In particular, in eukaryotes a perfectly matching sequence motif is neither necessary nor sufficient for a functional transcription factor binding site. Published work in the field of transcriptional regulation frequently focuses on the prediction of putative transcription factor binding sites based on sequence similarity to known binding sites. Furthermore, among the related software, only a small number of tools implement visualization of the evolution of transcription factor binding sites or the integration of other regulation-related data. The interface of many tools is made for computer scientists, although the actual interpretation of their outcome needs profound biological background knowledge. Results and Discussion: The tool presented in this thesis, "ReXSpecies", is a web application. It is therefore ready to use for the end user without installation and provides a graphical user interface. Besides extensive automation of analyses of transcriptional regulation (the only necessary input is the genomic coordinates of a regulatory region), new techniques to visualize the evolution of transcription factor binding sites were developed. Furthermore, an interface to genome browsers was implemented to enable scientists to comprehensively analyze their regulatory regions with respect to other regulation-relevant data. ReXSpecies contains a novel algorithm that searches for evolutionarily conserved patterns of transcription factor binding sites, which could imply functionality. 
Such patterns were verified using some known transcription factor binding sites of genes involved in pluripotency. In the appendix, the efficiency and correctness of the algorithm used are discussed. Furthermore, a novel algorithm to color phylogenetic trees intuitively is presented. In the thesis, new possibilities to render evolutionarily conserved sets of transcription factor binding sites are developed. The thesis also discusses the evolutionary conservation of regulation and its context dependency. An important source of errors in the analysis of regulatory regions using comparative genetics is probably the identification and alignment of homologous regulatory regions. Some alternatives to using sequence similarity alone are discussed. Outlook: Other possibilities to find (functional) homologous regulatory regions (besides the whole-genome alignments currently used) are BLAST searches, local alignments, homology databases and alignment-free approaches. Using one or more of these alternatives could reduce the number of artifacts by reducing the number of regions that are erroneously declared homologous. To achieve more robust predictions of transcription, the author suggests using other regulation-related data besides sequence data. Therefore, the use and extension of existing tools, in particular from systems biology, is proposed.

In this PhD thesis a conditional random field approach and its implementation are presented to predict the interaction sites of protein homo- and heterodimers using the spatial structure of one protein partner from a complex. The method includes a substantially simple edge feature model. A novel node feature class, called "change in free energy", is introduced. The Online Large-Margin algorithm is adapted in order to train the model parameters given a classified reference set of proteins. A significantly higher prediction accuracy is achieved by combining our new node feature class with the standard node feature class, relative accessible surface area. The quality of the predictions is measured by computing the area under the receiver operating characteristic.
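
The quality measure named above, the area under the receiver operating characteristic (AUC), equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The following Python sketch computes it directly from this pairwise definition; the scores and labels in the example are made up, and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted scores, higher means "more likely an interaction site"
    labels: 1 for true interaction sites, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The quadratic loop is fine for illustration; for large residue sets a rank-based computation would be preferable.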

Independence is a basic concept of probability theory and statistics. In many fields of science, the dependency between different variables has attracted considerable attention. A measure, named information dependency, is proposed to express the dependency of a group of random variables. This measure is defined as the Kullback-Leibler divergence of the joint distribution with respect to the product of the marginal distributions of these random variables. In the bivariate case, this measure is known as the mutual information of two random variables. Thus, the information dependency measure is closely related to information theory. The thesis aims to give a thorough study of the information dependency from both mathematical and practical viewpoints. Concretely, we investigate the following three problems: 1. Proving that the information dependency is a useful tool to express the dependency of a group of random variables by comparing it with other measures of dependency. 2. Studying methods to estimate the information dependency based on samples of a group of random variables. 3. Investigating how the Independent Component Analysis problem, an interesting problem in statistics, can be solved using information dependency.
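
For discrete random variables the definition above can be evaluated directly. The following Python sketch computes the Kullback-Leibler divergence between a joint probability table and the product of its marginals; it covers only the case of a fully known finite distribution (not the estimation from samples studied in the thesis), and the function name and the dictionary representation are illustrative choices.

```python
import math


def information_dependency(joint):
    """KL divergence of a joint distribution from the product of its marginals.

    joint: dict mapping outcome tuples (x_1, ..., x_d) to probabilities
           that sum to 1. The result is in nats.
    """
    dims = len(next(iter(joint)))
    marginals = [dict() for _ in range(dims)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            marginals[i][value] = marginals[i].get(value, 0.0) + p
    total = 0.0
    for outcome, p in joint.items():
        if p > 0:
            q = math.prod(marginals[i][v] for i, v in enumerate(outcome))
            total += p * math.log(p / q)
    return total


if __name__ == "__main__":
    independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    coupled = {(0, 0): 0.5, (1, 1): 0.5}
    print(information_dependency(independent))
    print(information_dependency(coupled))
```

For two variables this is the mutual information: it vanishes for the independent table and equals log 2 for the fully coupled one.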

High-throughput expression data have become the norm in molecular biology research. However, the analysis of expression data is statistically and computationally challenging and has not kept up with their generation. This has resulted in large amounts of unexplored data in public repositories. After pre-processing and quality control, the typical gene expression analysis workflow follows two main steps. First, the complexity of the data is reduced by removing the genes that are redundant or irrelevant for the biological question that motivated the experiment, using a feature selection method. Second, relevant genes are investigated to extract biological information that could aid in the interpretation of the results. Different methods, such as functional annotation, clustering, network analysis, and/or combinations thereof are useful for the latter purpose. Here, I investigated and presented solutions to three problems encountered in the expression data analysis workflow. First, I worked on reducing complexity of high-throughput expression data by selecting relevant genes in the context of the sample classification problem. The sample classification problem aims to assign unknown samples into one of the known classes, such as healthy and diseased. For this purpose, I developed the relative signal-to-noise ratio (rSNR), a novel feature selection method which was shown to perform significantly better than other methods with similar objectives. Second, to better understand complex phenotypes using high-throughput expression data, I developed a pipeline to identify the underlying biological units, as well as their interactions. These biological units were assumed to be represented by groups of genes working in synchronization to perform a given function or participate in common biological processes or pathways. 
Thus, to identify biological units, those genes that had been identified as relevant to the phenotype under consideration through feature selection methods were clustered based on both their functional annotations and expression profiles. Relationships between the associated biological functions, processes, and/or pathways were investigated by means of a co-expression network. The developed pipeline provides a new perspective to the analysis of high-throughput expression data by investigating interactions between biological units. Finally, I contributed to a project where a network describing pluripotency in mouse was used to infer the corresponding network in human. Biological networks are context-specific. Combining network information with high-throughput expression data can explain the control mechanisms underlying changes and maintenance of complex phenotypes. The human network was constructed on the basis of orthology between mouse and human genes and proteins. It was validated with available data in the literature. The methods and strategies proposed here were mainly trained and tested on microarray expression data. However, they can be easily adapted to next-generation sequencing and proteomics data.

Interactive Visualization for the Exploration of Aligned Biological Networks and Their Evolution
(2011)

Network visualization is a widely used tool in biology. Biological networks, such as protein interaction networks, are important for many aspects of life. Today biologists compare networks of different species (network alignment) to understand the networks in more detail and to understand the underlying evolution. The goal of this work is to develop visualization software that is able to visualize network alignments and also their evolution. The presented software is the first software for such visualization tasks. It uses 3D graphics and animations for the dynamic visualization of evolution. This work consists of a review of the Related Work, a chapter about our Graph-based Approach for Interactive Visualization of Evolving Network Alignments, an explanation of the Graph Layout Algorithm and some hints for the Software System.

The history of Mathematics has been led in part by the desire for generalization: once an object was given and had been understood, there was the desire to find a more general version of it, to fit it into a broader framework. Noncommutative Mathematics fits into this description, as its interests are objects analogous to vector spaces, or probability spaces, etc., but without the commonsense interpretation that those latter objects possess. Indeed, a space can be described by its points, but also and equivalently, by the set of functions on this space. This set is actually a commutative algebra, sometimes equipped with some more structure: *-algebra, C*-algebra, von Neumann algebras, Hopf algebras, etc. The idea that lies at the basis of noncommutative Mathematics is to replace such algebras by algebras that are not necessarily commutative any more and to interpret them as "algebras of functions on noncommutative spaces". Of course, these spaces do not exist independently from their defining algebras, but facts show that a lot of the results holding in (classical) probability or (classical) group theory can be extended to their noncommutative counterparts, or find therein powerful analogues. The extension of group theory into the realm of noncommutative Mathematics has long been studied and has yielded the various quantum groups. The easiest version of them, the compact quantum groups, consist of C*-algebras equipped with a *-homomorphism Δ with values in the tensor product of the algebra with itself and verifying some coassociativity condition. It is also required that the compact quantum group verifies what is known as the quantum cancellation property. It can be shown that (classical) compact groups are indeed a particular case of compact quantum groups. The area of compact quantum groups, and of quantum groups at large, is a fruitful area of research. 
Nevertheless, another generalization of group theory could be envisioned, namely by taking a comultiplication Δ with values not in the tensor product but rather in the free product (in the category of unital *-algebras). This leads to the theory of dual groups in the sense of Voiculescu, also called H-algebras by Zhang. These objects have not been studied as thoroughly as their quantum counterparts. It is true that they are less flexible, so that we do not know many examples of them, and that some relations cannot exist in the dual group case because they do not pass the coproduct. Nevertheless, during a great part of my PhD work I have been interested in these objects, and I have made some progress towards their understanding, especially regarding quantum Lévy processes defined on them and Haar states.

We introduce a multi-step machine learning approach and use it to classify data from EEG-based brain computer interfaces. This approach works very well for high-dimensional EEG data. First, all features are divided into subgroups and linear discriminant analysis is used to obtain a score for each subgroup. Then it is applied to subgroups of the resulting scores. This procedure is iterated until there is only one score remaining, and this one is used for classification. In this way we avoid estimation of the high-dimensional covariance matrix of all features. We investigate the classification performance with special attention to the small sample size case. For the normal model, we study the asymptotic error rate when dimension p and sample size n tend to infinity. This indicates how to define the sizes of subgroups at each step. In addition we present a theoretical error bound for the spatio-temporal normal model with separable covariance matrix, which results in a recommendation on how subgroups should be formed for this kind of data. Finally, some techniques, for example wavelets and independent component analysis, are used to extract features of some kinds of EEG-based brain computer interface data.

The geometric arena here is a smooth manifold of dimension n equipped with a Riemannian or pseudo-Riemannian metric and an affine connection. Field theories following from a variational principle are considered on this basis. In this context, all invariants which are quadratic in the curvature are determined. The work derives several manifestly covariant formulas for the Euler-Lagrange derivatives or the field equations. Some of these field theories can be interpreted as gravitational theories alternative to Einstein's general relativity theory. The work also touches on the difficult problem of defining and calculating the energy and momentum of a gravitational field.

The constructions of Lévy processes from convolution semigroups and of product systems from subproduct systems respectively, are formally quite similar. Since there are many more comparable situations in quantum stochastics, we formulate a general categorial concept (comonoidal systems), construct corresponding inductive systems and show under suitable assumptions general properties of the corresponding inductive limits. Comonoidal systems in different tensor categories play a role in all chapters of the thesis. Additive deformations are certain comonoidal systems of algebras. These are obtained by deformation of the algebra structure of a bialgebra. If the bialgebra is even a Hopf algebra, then compatibility with the antipode automatically follows. This remains true also in the case of braided Hopf algebras. Subproduct systems are comonoidal systems of Hilbert spaces. In the thesis we deal with the question, what are the possible dimensions of finite-dimensional subproduct systems. In discrete time, this can be reduced to the combinatorial problem of determining the complexities of factorial languages. We also discuss the rational and continuous time case. A further source for comonoidal systems are universal products, which are used in quantum probability to model independence. For the (r,s)-products, which were recently introduced by S. Lachs, we determine the corresponding product of representations by use of a generalized GNS-construction.
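
The complexity of a factorial language is the function that counts its words of each length. As a small illustration of this combinatorial notion, the Python sketch below handles the special case of a language given by finitely many forbidden factors, which is always factorial; the abstract does not say which languages the thesis considers, and the function name and the example are hypothetical.

```python
def factorial_language_complexity(alphabet, forbidden, max_len):
    """Count the words of length 1..max_len that avoid all forbidden factors.

    The set of words avoiding given (non-empty) factors is closed under
    taking factors, so it is a factorial language.
    """
    counts = []
    words = [""]
    for _ in range(max_len):
        # Every word in `words` is already admissible, so a forbidden factor
        # in an extension can only appear as a suffix of the extended word.
        words = [
            w + a
            for w in words
            for a in alphabet
            if not any((w + a).endswith(f) for f in forbidden)
        ]
        counts.append(len(words))
    return counts


if __name__ == "__main__":
    # Binary words without two consecutive ones: Fibonacci-like growth.
    print(factorial_language_complexity("01", ["11"], 6))
```

For the binary alphabet with the factor "11" forbidden, the counts follow the Fibonacci recursion, which shows how restrictive the admissible dimension sequences can be.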

This thesis deals with thickness optimization of shells. The overall task is to find an optimal thickness distribution in order to minimize the deformation of a loaded shell with prescribed volume. In addition, lower and upper bounds for the thickness are given. The shell is made of elastic, isotropic, homogeneous material. The deformation is modeled using equations from Linear Elasticity. Here, a basic shell model based on the Reissner-Mindlin assumption is used. Both the stationary and the dynamic case are considered. The continuity and the Gâteaux-differentiability of the control-to-state operator are investigated. These results are applied to the reduced objective with the help of adjoint theory. In addition, techniques from shape optimization are compared to the optimal control approach. In the following, the theoretical results are applied to cylindrical shells and an efficient numerical implementation is presented. Finally, numerical results are shown and analyzed for different examples.

Today, steadily improving technology and software make it possible to create, save and explore massive data sets in little time. "Big Data" are everywhere, such as in social networks, meteorology, customers' behaviour, and biology. The Omics research field, standing for organism-wide data exploration and analysis, is an example of biological research that has to deal with "Big Data" challenges. Possible challenges are, for instance, efficient storage and cataloguing of the data sets and finally the qualitative analysis and exploration of the information. In the last decade, large-scale genome-wide association studies and high-throughput techniques became more efficient, more profitable and less expensive. As a consequence of this rapid development, it is easier to gather massive amounts of genomic and proteomic data. However, these data need to be evaluated, analysed and explored. Typical questions that arise in this context include: which genes are active under several physical states, which proteins and metabolites are available, which organisms or cell types are similar or different in their enzymes' or genes' behaviour. For this reason, and because a scientist in any "Big Data" research field wants to see the data, there is an increasing need for clear, intuitively understandable and recognizable visualization to explore the data and confirm hypotheses. One way to get an overview of the data sets is to cluster them. Taxonomic trees and functional classification schemes are hierarchical structures used by biologists to organize the available biological knowledge in a systematic and computer-readable way (such as KEGG, GO and FUNCAT). For example, proteins and genes could be clustered according to their function in an organism. These hierarchies tend to be rather complex, and many comprise thousands of biological entities. One approach for a space-filling visualization of these hierarchically structured data sets is a treemap. 
Existing algorithms for producing treemaps struggle with large data sets and have several other problems. This thesis addresses some of these problems and is structured as follows. After a short review of the basic concepts from graph theory, the first chapter presents some commonly used types of treemaps and a classification of treemaps according to information visualization aspects. The second chapter provides several methods to improve treemap constructions. In certain applications the researcher wants to know how the entities in a hierarchical structure are related to each other (such as enzymes in a metabolic pathway). Therefore, the third chapter focuses on the construction of a suitable layout overlaying an existing treemap. This gives rise to optimization problems on geometric graphs. In addition, from a practical point of view, options for enhancing the display of the computed layout are explored to help the user perform typical tasks in this context more efficiently. One important aspect of the problems on geometric graphs considered in the third chapter is that crossings of edges in a network structure are to be minimized while certain other properties, such as connectedness, are maintained. Motivated by this, the fourth chapter explores related combinatorial and computational problems from a more theoretical point of view. In particular, some light is shed on properties of crossing-free spanning trees in geometric graphs.
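
To make the idea of a space-filling treemap concrete, here is a minimal sketch of the classic slice-and-dice layout. It is not one of the methods developed in the thesis; the function name `slice_and_dice`, the `(name, weight, children)` tuple format, and the example tree are assumptions made for illustration only.

```python
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a weighted tree inside the rectangle (x, y, w, h).

    node is a hypothetical (name, weight, children) tuple; leaves have an
    empty children list. The split direction alternates with the depth.
    Returns a list of (name, x, y, w, h) rectangles, parents before children.
    """
    if out is None:
        out = []
    name, _weight, children = node
    out.append((name, x, y, w, h))
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        share = child[1] / total
        if depth % 2 == 0:
            # even depth: cut the rectangle into vertical strips
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # odd depth: cut the rectangle into horizontal strips
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Illustrative tree: three leaves with weights 3, 2 and 1.
tree = ("root", 6, [("a", 3, []), ("b", 2, []), ("c", 1, [])])
rects = slice_and_dice(tree, 0.0, 0.0, 6.0, 1.0)
for rect in rects:
    print(rect)
```

Each leaf receives an area proportional to its weight. Slice-and-dice tends to produce long, thin rectangles on large hierarchies, which is one of the well-known weaknesses of simple treemap algorithms.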

Parsimonious Histograms
(2010)

The dissertation is concerned with the construction of data-driven histograms. Histograms are the most elementary density estimators. However, they require the specification of the number and width of the bins. This thesis provides two new construction methods that deliver adaptive histograms in which the required parameters are determined automatically. Both methods follow the principle of parsimony, i.e. the histograms are solutions of predetermined optimization problems. In both cases, but under different aspects, the number of bins is minimized. The dissertation presents the algorithms that solve the optimization problems and illustrates them with a number of numerical experiments. Important properties of the estimators are shown. Finally, the newly developed methods are compared with standard methods in an extensive simulation study. Using synthetic samples of different sizes and distributions, the histograms are evaluated by special performance criteria. As one main result, the proposed methods yield histograms with considerably fewer bins and with an excellent ability to detect peaks.
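
For readers unfamiliar with the bin-selection problem, the following sketch shows one widely used automatic rule, the Freedman-Diaconis rule, which sets the bin width to 2·IQR/n^(1/3). The abstract does not say which standard methods the thesis compares against, so this rule is an assumption chosen for illustration, and it is not one of the parsimonious methods proposed in the dissertation. The function names are hypothetical.

```python
import math
import statistics


def freedman_diaconis_bins(sample):
    """Number of equal-width bins suggested by the Freedman-Diaconis rule."""
    n = len(sample)
    q1, _median, q3 = statistics.quantiles(sample, n=4)
    width = 2.0 * (q3 - q1) / n ** (1.0 / 3.0)
    if width <= 0:
        return 1  # degenerate sample: fall back to a single bin
    return max(1, math.ceil((max(sample) - min(sample)) / width))


def histogram(sample, bins):
    """Counts per bin for an equal-width histogram over [min, max]."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / bins or 1.0  # avoid division by zero if all values are equal
    counts = [0] * bins
    for value in sample:
        index = min(int((value - lo) / width), bins - 1)  # max value goes into the last bin
        counts[index] += 1
    return counts


sample = [float(v) for v in range(1, 101)]
bins = freedman_diaconis_bins(sample)
print(bins, histogram(sample, bins))
```

Rules of this kind fix a single bin width for the whole sample, whereas the adaptive histograms described above determine the number and the widths of the bins from an optimization problem.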

Self-similar sets are a class of fractals which can be rigorously defined and treated by mathematical methods. Their theory has been developed in n-dimensional space, but we have just a few good examples of self-similar sets in three-dimensional space. This thesis has two aims: first, to extend fractal constructions from two-dimensional space to three-dimensional space; second, to study some of the properties of these fractals, such as finite type, disk-likeness, ball-likeness, and the Hausdorff dimension of boundaries. We use the neighbor graph as a tool for creating new fractals and studying their properties.
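
As a small numerical illustration of dimension computations for self-similar sets, the sketch below solves the Moran equation, sum of r_i^s = 1, for the similarity dimension s given the contraction ratios r_i. When the open set condition holds, this value equals the Hausdorff dimension of the set itself. Computing the dimension of a boundary, as studied in the thesis, needs more than this, so the code is only a related standard calculation. The function name and the two examples are assumptions made for illustration.

```python
import math


def similarity_dimension(ratios, tol=1e-12):
    """Solve sum(r**s for r in ratios) == 1 for s by bisection."""
    if not ratios or any(not 0.0 < r < 1.0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of numbers in (0, 1)")

    def excess(s):
        return sum(r ** s for r in ratios) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:  # enlarge the bracket until the root lies in [lo, hi]
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Sierpinski tetrahedron: 4 copies scaled by 1/2.
print(similarity_dimension([0.5] * 4))
# Menger sponge: 20 copies scaled by 1/3, compared with log(20)/log(3).
print(similarity_dimension([1.0 / 3.0] * 20), math.log(20) / math.log(3))
```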

The study of sow reproduction traits is important in livestock science and production to increase animal survival and economic efficiency. This work deals with the detection of different effects on the within-litter variance of birth weight by applying different statistical models with different distributional assumptions. The piglets within one litter were separated by sex. The sow trait was formed from the sample variances of birth weights within litter, separated by sex, to account for the sex effect on mean birth weight. A linear mixed model (LMM) was fitted to the logarithmized sample variance and to the sample standard deviation. A generalized linear mixed model with gamma-distributed residuals and log-link function was applied to the untransformed sample variance. Appropriate weights were constructed to account for individual litter sizes. Models were compared by analysing data from Landrace and Large White. The estimates of heritability for the different traits ranged from 6 to 14%. The LMM for the weighted standard deviation of birth weights was identified as most suitable in terms of residual normality. Furthermore, the impact of piglets' sex on birth weight variability was tested, but it could be demonstrated for only one practical dataset. Additionally, we analysed how including or excluding the birth weights of stillborn piglets influences the estimates of variance components of birth weight variability. With stillborns omitted, the estimates of heritability were about 2% higher than in investigations of total born piglets. We were also interested in the presence of a random boar effect on birth weight variability. The corresponding variance component was tested via a restricted likelihood ratio test. Among other approaches, the null distribution of the test statistic was approximated by parametric bootstrap simulations, which were computationally intensive.
We adopted a two-parametric approach from the literature and proposed a three-parametric approach to approximate the null distribution of the test statistic. We analysed correlated data in balanced (simulated data) and unbalanced (empirical data) designs. The two-parametric approach, which uses a scaled mixture of chi-square distributions, and the three-parametric approach, which uses a mixture of the point mass at zero and a gamma distribution, behaved most robustly in all investigations and were the most powerful in the simulation study.
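
To illustrate how a mixture null distribution turns a restricted likelihood ratio statistic into a p-value, the sketch below evaluates the tail probability of a scaled mixture of a point mass at zero (chi-square with 0 degrees of freedom) and a chi-square distribution with 1 degree of freedom. The abstract does not give the exact parametrization, so the form scale · (p_zero · χ²₀ + (1 − p_zero) · χ²₁), the function name, and the default values are assumptions. In practice the two parameters would be estimated, for example from a small number of bootstrap replicates. The defaults reproduce the classical 50:50 mixture.

```python
import math


def mixture_p_value(statistic, p_zero=0.5, scale=1.0):
    """P(T >= statistic) for T ~ scale * (p_zero * chi2_0 + (1 - p_zero) * chi2_1).

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    if statistic <= 0.0:
        return 1.0  # the whole distribution lies at or above zero
    return (1.0 - p_zero) * math.erfc(math.sqrt(statistic / (2.0 * scale)))


# 2.705543 is the 90% quantile of chi2_1, so the 50:50 mixture gives about 0.05.
print(mixture_p_value(2.705543))
```

Compared with a full parametric bootstrap, evaluating such a closed-form tail probability is cheap, which is the motivation for approximating the null distribution by a low-dimensional parametric family.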

As the tree of life is populated ever more densely with sequenced genomes, the new challenge is the accurate and consistent annotation of entire clades of genomes. In my dissertation, I address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other or, if they are not, where the exon gains and losses are plausible given the species tree. The multi-species gene finding problem is formulated as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach.
I tested the novel approach on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that the new method is well suited for the annotation of a large number of genomes of closely related species within a clade, in particular when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments when the genomes are at close to medium distances. The method is implemented in C++ as part of the gene finder AUGUSTUS.