### Refine

#### Language

- English (22)

#### Keywords

- Statistik (3)
- Algebra (2)
- Bioinformatik (2)
- Fraktal (2)
- Funktionalanalysis (2)
- Selbstähnlichkeit (2)
- fractal (2)
- self-similarity (2)
- (generalized) linear mixed model (1)
- (verallgemeinertes) lineares gemischtes Modell (1)

#### Institute

- Institut für Mathematik und Informatik (22)

As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. In my dissertation, I address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. The multi-species gene finding problem is formulated as a binary labeling problem on a graph. The resulting optimization problem is NP-hard, but can be efficiently approximated using a subgradient-based dual decomposition approach.
I tested the novel approach on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that the new method is well-suited for annotation of a large number of genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of the gene finder AUGUSTUS.
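
The dual decomposition idea can be illustrated on a deliberately tiny toy problem (the function names and the one-variable objective below are illustrative only and are not taken from AUGUSTUS): a shared binary variable is duplicated across two subproblems, and a Lagrange multiplier enforcing agreement between the copies is updated along the subgradient until both copies coincide.

```python
def argmin_binary(coef):
    # minimize coef * z over z in {0, 1}
    return 1 if coef < 0 else 0

def dual_decomposition(c1, c2, step=0.5, max_iter=100):
    """Split min_z (c1 + c2) * z over z in {0, 1} into two copies z1, z2
    coupled by the constraint z1 == z2, relax the constraint with a
    Lagrange multiplier lam, and maximize the dual by subgradient steps."""
    lam = 0.0
    for _ in range(max_iter):
        z1 = argmin_binary(c1 + lam)   # first subproblem, solved independently
        z2 = argmin_binary(c2 - lam)   # second subproblem, solved independently
        if z1 == z2:                   # copies agree: a primal solution is found
            return z1
        lam += step * (z1 - z2)        # subgradient of the dual function
    return z1

print(dual_decomposition(-2.0, 1.0))   # prints 1, the coupled optimum
```

In the gene finding setting the same mechanism decouples per-genome labeling subproblems that share variables through the alignment; each subproblem is then tractable on its own.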

The study of sow reproduction traits is important in livestock science and production to increase animal survival and economic efficiency. This work deals with the detection of different effects on the within-litter variance of birth weight by applying different statistical models with different distributional assumptions. The piglets within one litter were separated by sex. The trait of the sow was formed from the sample variances of birth weights within litter, separated by sex to account for the sex effect on mean birth weight. A linear mixed model (LMM) approach was fitted to the logarithmized sample variance and to the sample standard deviation. A generalized linear mixed model with gamma-distributed residuals and log-link function was applied to the untransformed sample variance. Appropriate weights were constructed to account for individual litter sizes. The models were compared by analysing data from Landrace and Large White sows. The estimates of heritability for the different traits ranged from 6-14%. The LMM for the weighted standard deviation of birth weights was identified as most suitable in terms of residual normality. Furthermore, the impact of piglets' sex on birth weight variability was tested, but it was confirmed for only one practical dataset. Additionally, we analysed the influence of including or excluding the birth weights of stillborn piglets on the estimates of the variance components of birth weight variability. With stillborns omitted, the estimates of heritability were about 2% higher than in the investigations of all born piglets. We were also interested in the presence of a random boar effect on birth weight variability. The corresponding variance component was tested via a restricted likelihood ratio test. Among other approaches, the null distribution of the test statistic was approximated by parametric bootstrap simulations, which were computationally intensive.
We picked up a two-parametric approach from the literature and proposed a three-parametric approach to approximate the null distribution of the test statistic. We analysed correlated data in balanced (simulated data) and unbalanced (empirical data) designs. Both the two-parametric approach, which uses a scaled mixture of chi-square distributions, and the three-parametric approach, which uses a mixture of a point mass at zero and a gamma distribution, proved the most robust in all investigations and were the most powerful in the simulation study.
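
As a hedged sketch of how such mixture approximations to the null distribution of a restricted likelihood ratio test statistic yield p-values (the function names and default parameter values are illustrative; in practice the weight, scale, and shape parameters would be fitted, e.g. to parametric-bootstrap replicates):

```python
from scipy.stats import chi2, gamma

def pvalue_mixture_chisq(t, w0=0.5, scale=1.0, df=1.0):
    """Two-parametric style: null approximated by
    w0 * (point mass at 0) + (1 - w0) * (scaled chi-square with df)."""
    if t <= 0:
        return 1.0                       # the point mass absorbs t = 0
    return (1.0 - w0) * chi2.sf(t / scale, df)

def pvalue_mixture_gamma(t, w0, a, b):
    """Three-parametric style: null approximated by
    w0 * (point mass at 0) + (1 - w0) * Gamma(shape=a, scale=b)."""
    if t <= 0:
        return 1.0
    return (1.0 - w0) * gamma.sf(t, a, scale=b)
```

The point mass at zero reflects that the tested variance component lies on the boundary of the parameter space, so the statistic is exactly zero with positive probability under the null.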

Self-similar sets are a class of fractals which can be rigorously defined and treated by mathematical methods. Their theory has been developed in n-dimensional space, but we have only a few good examples of self-similar sets in three-dimensional space. This thesis has two different aims: first, to extend fractal constructions from two-dimensional to three-dimensional space; second, to study some of the properties of these fractals, such as finite type, disk-likeness, ball-likeness, and the Hausdorff dimension of boundaries. We use the neighbor graph as a tool for creating new fractals and for studying their properties.
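
For self-similar sets satisfying the open set condition, the Hausdorff dimension coincides with the similarity dimension log N / log(1/r); a minimal sketch (the examples are standard textbook sets, not constructions from the thesis):

```python
from math import log

def similarity_dimension(n_maps, ratio):
    """Similarity dimension log N / log(1/r) of a self-similar set built
    from n_maps contractions with common ratio r; it equals the
    Hausdorff dimension when the open set condition holds."""
    return log(n_maps) / log(1.0 / ratio)

# Sierpinski gasket: 3 maps with ratio 1/2 -> log 3 / log 2 ~ 1.585
print(similarity_dimension(3, 0.5))
# Menger sponge (three-dimensional): 20 maps with ratio 1/3 -> ~ 2.727
print(similarity_dimension(20, 1 / 3))
```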

Parsimonious Histograms
(2010)

The dissertation is concerned with the construction of data-driven histograms. Histograms are the most elementary density estimators of all. However, they require the specification of the number and width of the bins. This thesis provides two new construction methods delivering adaptive histograms in which the required parameters are determined automatically. Both methods follow the principle of parsimony, i.e. the histograms are solutions of predetermined optimization problems. In both cases, though under different aspects, the number of bins is minimized. The dissertation presents the algorithms that solve the optimization problems and illustrates them with a number of numerical experiments. Important properties of the estimators are shown. Finally, the newly developed methods are compared with standard methods in an extensive simulation study. Using synthetic samples of different sizes and distributions, the histograms are evaluated by special performance criteria. As one main result, the proposed methods yield histograms with considerably fewer bins and with an excellent ability to detect peaks.
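
In the same parsimonious spirit, though not the thesis's actual construction, a regular histogram can be made data-driven by maximizing a penalized log-likelihood over the number of equal-width bins (the AIC-style penalty and the function name below are illustrative assumptions):

```python
import numpy as np

def penalized_loglik_bins(x, max_bins=50):
    """Pick the number of equal-width bins by maximizing the histogram
    log-likelihood minus a complexity penalty (generic sketch)."""
    x = np.asarray(x, float)
    n = len(x)
    lo, hi = x.min(), x.max()
    best_k, best_score = 1, -np.inf
    for k in range(1, max_bins + 1):
        counts, _ = np.histogram(x, bins=k, range=(lo, hi))
        width = (hi - lo) / k
        nz = counts[counts > 0]
        loglik = np.sum(nz * np.log(nz / (n * width)))  # histogram log-likelihood
        score = loglik - (k - 1)                        # penalize each extra bin
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

A strongly bimodal sample forces more than one bin, while for a flat sample the penalty keeps the bin count small.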

Today the process of improving technology and software makes it possible to create, save and explore massive data sets in little time. "Big Data" is everywhere, for instance in social networks, meteorology, customers' behaviour, and in biology. The omics research field, which stands for organism-wide data exploration and analysis, is an example of biological research that has to deal with "Big Data" challenges. Possible challenges are, for instance, the efficient storage and cataloguing of the data sets and finally the qualitative analysis and exploration of the information. In the last decade, large-scale genome-wide association studies and high-throughput techniques became more efficient, more profitable and less expensive. As a consequence of this rapid development, it is easier to gather massive amounts of genomic and proteomic data. However, these data need to be evaluated, analysed and explored. Typical questions that arise in this context include: which genes are active under several physical states, which proteins and metabolites are available, and which organisms or cell types are similar or different in the behaviour of their enzymes or genes. For this reason, and because a scientist in any "Big Data" research field wants to see the data, there is an increasing need for clear, intuitively understandable and recognizable visualizations to explore the data and confirm hypotheses. One way to get an overview of the data sets is to cluster them. Taxonomic trees and functional classification schemes are hierarchical structures used by biologists to organize the available biological knowledge in a systematic and computer-readable way (such as KEGG, GO and FUNCAT). For example, proteins and genes can be clustered according to their function in an organism. These hierarchies tend to be rather complex, and many comprise thousands of biological entities. One approach to a space-filling visualization of these hierarchically structured data sets is the treemap.
Existing algorithms for producing treemaps struggle with large data sets and have several other problems. This thesis addresses some of these problems and is structured as follows. After a short review of the basic concepts from graph theory, some commonly used types of treemaps and a classification of treemaps according to information-visualization aspects are presented in the first chapter. The second chapter provides several methods to improve treemap constructions. In certain applications the researcher wants to know how the entities in a hierarchical structure are related to each other (such as enzymes in a metabolic pathway). Therefore, in the third chapter the focus is on the construction of a suitable layout overlaying an existing treemap. This gives rise to optimization problems on geometric graphs. In addition, from a practical point of view, options for enhancing the display of the computed layout are explored to help the user perform typical tasks in this context more efficiently. One important aspect of the problems on geometric graphs considered in the third chapter is that crossings of edges in a network structure are to be minimized while certain other properties such as connectedness are maintained. Motivated by this, in the fourth chapter related combinatorial and computational problems are explored from a more theoretical point of view. In particular, some light is shed on the properties of crossing-free spanning trees in geometric graphs.
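
One of the commonly used treemap types mentioned above, the classic slice-and-dice layout, can be sketched as follows (a generic illustration, not one of the thesis's improved constructions; the nested-dict node format is an assumption):

```python
def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Classic slice-and-dice treemap: split the rectangle (x, y, w, h)
    among the children in proportion to their sizes, alternating the
    split direction at each level. A node is a dict with keys
    'name', 'size', and optionally 'children'. Returns a list of
    (name, x, y, w, h) rectangles."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(c["size"] for c in children) or 1.0
    offset = 0.0
    for c in children:
        frac = c["size"] / total
        if horizontal:   # slice along x, children laid out left to right
            slice_and_dice(c, x + offset * w, y, w * frac, h, False, out)
        else:            # dice along y, children laid out top to bottom
            slice_and_dice(c, x, y + offset * h, w, h * frac, True, out)
        offset += frac

    return out

# Example: a root of total size 4 split 3:1 between two leaves
tree = {"name": "root", "size": 4,
        "children": [{"name": "a", "size": 3}, {"name": "b", "size": 1}]}
for rect in slice_and_dice(tree, 0.0, 0.0, 1.0, 1.0):
    print(rect)
```

Slice-and-dice preserves order and hierarchy but tends to produce thin rectangles with poor aspect ratios on deep hierarchies, which is one of the problems later treemap variants address.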

This thesis deals with the thickness optimization of shells. The overall task is to find an optimal thickness distribution in order to minimize the deformation of a loaded shell with prescribed volume. In addition, lower and upper bounds for the thickness are given. The shell is made of elastic, isotropic, homogeneous material. The deformation is modeled using the equations of linear elasticity. Here, a basic shell model based on the Reissner-Mindlin assumption is used. Both the stationary and the dynamic case are considered. The continuity and the Gâteaux differentiability of the control-to-state operator are investigated. These results are applied to the reduced objective with the help of adjoint theory. In addition, techniques from shape optimization are compared to the optimal control approach. Subsequently, the theoretical results are applied to cylindrical shells and an efficient numerical implementation is presented. Finally, numerical results are shown and analyzed for different examples.
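
Schematically, and in assumed notation rather than that of the thesis, such a thickness optimization problem has the reduced form

```latex
\min_{t \in T_{\mathrm{ad}}} J\bigl(u(t)\bigr),
\qquad
T_{\mathrm{ad}} = \Bigl\{\, t \in L^{\infty}(\omega) : t_{\min} \le t \le t_{\max},\ \int_{\omega} t \, ds = V \,\Bigr\},
```

where the state u(t) solves the Reissner-Mindlin shell equations for the thickness distribution t, J measures the deformation, and the adjoint equation supplies the derivative of the reduced objective with respect to t.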

The constructions of Lévy processes from convolution semigroups and of product systems from subproduct systems, respectively, are formally quite similar. Since there are many more comparable situations in quantum stochastics, we formulate a general categorical concept (comonoidal systems), construct corresponding inductive systems and show, under suitable assumptions, general properties of the corresponding inductive limits. Comonoidal systems in different tensor categories play a role in all chapters of the thesis. Additive deformations are certain comonoidal systems of algebras. These are obtained by deforming the algebra structure of a bialgebra. If the bialgebra is even a Hopf algebra, then compatibility with the antipode follows automatically. This remains true in the case of braided Hopf algebras. Subproduct systems are comonoidal systems of Hilbert spaces. In the thesis we address the question of the possible dimensions of finite-dimensional subproduct systems. In discrete time, this can be reduced to the combinatorial problem of determining the complexities of factorial languages. We also discuss the rational and continuous time cases. A further source of comonoidal systems are universal products, which are used in quantum probability to model independence. For the (r,s)-products, which were recently introduced by S. Lachs, we determine the corresponding product of representations by means of a generalized GNS construction.

The geometric arena here is a smooth manifold of dimension n equipped with a Riemannian or pseudo-Riemannian metric and an affine connection. Field theories following from a variational principle are considered on this basis. In this context, all invariants which are quadratic in the curvature are determined. The work derives several manifestly covariant formulas for the Euler-Lagrange derivatives, that is, the field equations. Some of these field theories can be interpreted as gravitational theories, alternatives to Einstein's general relativity theory. The work also touches on the difficult problem of defining and calculating the energy and momentum of a gravitational field.
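
For orientation: in the purely metric (pseudo-)Riemannian case, the scalar invariants quadratic in the curvature are spanned by

```latex
R^{2}, \qquad R_{\mu\nu} R^{\mu\nu}, \qquad R_{\mu\nu\rho\sigma} R^{\mu\nu\rho\sigma},
```

of which the Gauss-Bonnet combination \(R^{2} - 4 R_{\mu\nu}R^{\mu\nu} + R_{\mu\nu\rho\sigma}R^{\mu\nu\rho\sigma}\) contributes nothing to the field equations in four dimensions. With an independent affine connection, as considered here, the curvature loses some of its symmetries and additional quadratic invariants appear.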

We introduce a multi-step machine learning approach and use it to classify data from EEG-based brain-computer interfaces. This approach works very well for high-dimensional EEG data. First, all features are divided into subgroups and linear discriminant analysis is used to obtain a score for each subgroup. Linear discriminant analysis is then applied to subgroups of the resulting scores. This procedure is iterated until only one score remains, and this score is used for classification. In this way we avoid estimating the high-dimensional covariance matrix of all features. We investigate the classification performance with special attention to the small-sample-size case. For the normal model, we study the asymptotic error rate when the dimension p and the sample size n tend to infinity. This indicates how to define the sizes of the subgroups at each step. In addition, we present a theoretical error bound for the spatio-temporal normal model with separable covariance matrix, which results in a recommendation on how subgroups should be formed for this kind of data. Finally, some techniques, for example wavelets and independent component analysis, are used to extract features from certain EEG-based brain-computer interface data.
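
The multi-step idea can be sketched as follows (an illustrative two-class sketch with plain Fisher LDA on fixed-size feature subgroups, iterated on the scores; the subgroup size and the function names are assumptions, not the thesis's exact choices):

```python
import numpy as np

def fisher_score(X0, X1, X):
    """Fisher LDA direction for two classes estimated from training
    blocks X0, X1; returns the projections of X onto that direction."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
    w = np.linalg.pinv(np.atleast_2d(Sw)) @ (m1 - m0)
    return X @ w

def multistep_lda(X0, X1, X, group_size=2):
    """Hierarchical LDA: score small feature subgroups, then iterate LDA
    on the resulting scores until a single score per sample remains.
    Only group_size x group_size covariance blocks are ever estimated."""
    while X.shape[1] > 1:
        scores0, scores1, scores = [], [], []
        for j in range(0, X.shape[1], group_size):
            sl = slice(j, j + group_size)
            scores0.append(fisher_score(X0[:, sl], X1[:, sl], X0[:, sl]))
            scores1.append(fisher_score(X0[:, sl], X1[:, sl], X1[:, sl]))
            scores.append(fisher_score(X0[:, sl], X1[:, sl], X[:, sl]))
        X0, X1, X = (np.column_stack(s) for s in (scores0, scores1, scores))
    return X[:, 0]
```

Classification then thresholds the final score, e.g. at the midpoint of the two class means; the point of the construction is that no full p x p covariance matrix is ever inverted.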

The history of mathematics has been led in part by the desire for generalization: once an object was given and had been understood, there was the desire to find a more general version of it, to fit it into a broader framework. Noncommutative mathematics fits this description, as its objects of interest are analogous to vector spaces, probability spaces, etc., but without the commonsense interpretation that those latter objects possess. Indeed, a space can be described by its points, but also, and equivalently, by the set of functions on this space. This set is actually a commutative algebra, sometimes equipped with additional structure: a *-algebra, C*-algebra, von Neumann algebra, Hopf algebra, etc. The idea that lies at the basis of noncommutative mathematics is to replace such algebras by algebras that are no longer necessarily commutative and to interpret them as "algebras of functions on noncommutative spaces". Of course, these spaces do not exist independently of their defining algebras, but experience shows that many of the results holding in (classical) probability or (classical) group theory can be extended to their noncommutative counterparts, or find powerful analogues therein. The extension of group theory into the realm of noncommutative mathematics has long been studied and has yielded the various quantum groups. The simplest version of these, compact quantum groups, consists of C*-algebras equipped with a *-homomorphism Δ with values in the tensor product of the algebra with itself, satisfying a coassociativity condition. It is also required that a compact quantum group satisfy what is known as the quantum cancellation property. It can be shown that (classical) compact groups are indeed a particular case of compact quantum groups. The area of compact quantum groups, and of quantum groups at large, is a fruitful area of research.
Nevertheless, another generalization of group theory can be envisioned, namely by taking a comultiplication Δ with values not in the tensor product but rather in the free product (in the category of unital *-algebras). This leads to the theory of dual groups in the sense of Voiculescu, also called H-algebras by Zhang. These objects have not been studied as thoroughly as their quantum counterparts. They are indeed less flexible: few examples of them are known, and it can be shown that some relations cannot exist in the dual group case because they do not pass to the coproduct. Nevertheless, I have been interested in these objects during a large part of my PhD work, and I have made some progress towards their understanding, especially regarding quantum Lévy processes defined on them and Haar states.
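
The coassociativity condition mentioned above has the same shape in both settings; only the codomain of the comultiplication changes:

```latex
(\Delta \otimes \mathrm{id}) \circ \Delta = (\mathrm{id} \otimes \Delta) \circ \Delta,
\qquad \Delta\colon A \to A \otimes A,
```

for compact quantum groups, while for dual groups the comultiplication takes values in the free product, \(\Delta\colon A \to A \sqcup A\), and the same identity is required with \(\otimes\) replaced by \(\sqcup\).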