Objective classification of sequenced rabies isolates using affinity propagation clustering.
(2018)

The International Committee on Taxonomy of Viruses (ICTV) regulates the nomenclature of viruses and the creation of new taxa (order, family, subfamily, genus, and species). Thanks to these efforts, the assignment of a wide variety of viruses to these categories is clear and transparent. Over the past decades, more than 21,000 data sets of the species "rabies lyssavirus" (RABV) have been sequenced. So far, however, no uniform scheme has been proposed for subdividing the sequenced virus isolates of this species any further. Because of the large number of sequenced isolates, classification into clusters on the basis of phylogenetic trees has led to ambiguous results. My dissertation therefore proposes to solve this problem by applying a partitioning clustering method. To this end, I used affinity propagation clustering (AP) for this kind of question for the first time. All available sequenced full-genome isolates of the species RABV were analyzed as the data set. The analyses yielded four main clusters that could be separated geographically and were accordingly named "Arctic", "Cosmopolitan", "Asian", and "New World". Further analyses also allowed these main clusters to be subdivided into 12-13 subclusters. In addition, I developed a workflow that makes it possible to combine the clusters defined by AP with the results of the phylogenetic analyses. In this way, evolutionary relationships can be recognized and an objective cluster assignment can be made at the same time. This could also be a possible analysis route for other virus species or for other comparative sequence analyses.

In phylogenetics, evolutionary relationships of different species are represented by phylogenetic trees.
In this thesis, we are mainly concerned with the reconstruction of ancestral sequences and the accuracy of this reconstruction given a rooted binary phylogenetic tree.
For example, we wish to estimate the DNA sequences of the ancestors given the observed DNA sequences of present-day species.
In particular, we are interested in reconstructing the DNA sequence of the last common ancestor of all species under consideration. Note that this last common ancestor corresponds to the root of the tree.
There exist various methods for the reconstruction of ancestral sequences.
A widely used principle for ancestral sequence reconstruction is the principle of parsimony (Maximum Parsimony).
This principle means that the simplest explanation is the best.
Applied to the reconstruction of ancestral sequences, this means that a sequence requiring the fewest evolutionary changes along the tree is reconstructed.
Thus, the number of changes is minimized, which explains the name of Maximum Parsimony.
Instead of estimating a whole DNA sequence at once, Maximum Parsimony considers each position in the sequence separately. In the following, we therefore treat each sequence position on its own and refer to the character observed at a single position as a state.
It can happen that the state of the last common ancestor is reconstructed unambiguously, for example as A. On the other hand, Maximum Parsimony might be indecisive between two DNA nucleotides, say for example A and C.
In this case, the last common ancestor will be reconstructed as {A,C}.
Therefore we consider, after an introduction and some preliminary definitions, the following question in Section 3: how many present-day species need to be in a certain state, for example A, such that the Maximum Parsimony estimate of the last common ancestor is also {A}?
The answer to this question depends on the tree topology as well as on the number of different states.
In Section 4, we provide a sufficient condition for Maximum Parsimony to recover the ancestral state at the root correctly from the observed states at the leaves.
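The Maximum Parsimony root estimate discussed above can be computed for a single sequence position with the bottom-up pass of Fitch's algorithm. The following is a minimal sketch and not code from the thesis: the nested-tuple tree encoding, the function name `fitch_root_set`, and the leaf names `s1` to `s4` are assumptions made for illustration.

```python
# A rooted binary tree is encoded (by assumption) as either a leaf name (str)
# or a pair (left_subtree, right_subtree).

def fitch_root_set(tree, leaf_states):
    """Return the Maximum Parsimony state set at the root of `tree`
    for one sequence position (bottom-up pass of Fitch's algorithm)."""
    if isinstance(tree, str):
        # A leaf carries exactly its observed state.
        return frozenset({leaf_states[tree]})
    left, right = tree
    a = fitch_root_set(left, leaf_states)
    b = fitch_root_set(right, leaf_states)
    common = a & b
    # Intersection if the children agree on some state, otherwise the union.
    return common if common else a | b


# Hypothetical four-leaf example tree.
tree = (("s1", "s2"), ("s3", "s4"))

unambiguous = fitch_root_set(tree, {"s1": "A", "s2": "A", "s3": "A", "s4": "C"})
ambiguous = fitch_root_set(tree, {"s1": "A", "s2": "A", "s3": "C", "s4": "C"})
print(sorted(unambiguous), sorted(ambiguous))
```

On this tree, three leaves in state A are enough for the root estimate {A}, whereas two leaves in A and two in C leave Maximum Parsimony undecided between A and C.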
The so-called reconstruction accuracy for the reconstruction of ancestral states is introduced in Section 5. The reconstruction accuracy is the probability that the true root state is indeed reconstructed. It always takes two processes into account: on the one hand, the method used to reconstruct ancestral states, and on the other hand, the way the states evolve along the edges of the tree. The latter is given by an evolutionary model.
In the present thesis, we focus on a simple symmetric model, the Neyman model.
The symmetry of the model means, for example, that a change from A to C is as likely as a change from C to A.
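For small trees, the reconstruction accuracy of Maximum Parsimony under such a symmetric model can be computed exactly by enumerating all leaf patterns. The sketch below rests on several assumptions that are not taken from the thesis: each edge carries its own substitution probability `p`, a substitution leads to each of the other states with equal probability, an ambiguous root estimate counts with weight 1/|S| (a uniformly random pick from the estimated set S), and the tree encoding and all function names are illustrative.

```python
from itertools import product

# Assumed encoding: a leaf is a name (str); an internal vertex is a pair
# ((left_subtree, p_left), (right_subtree, p_right)) with edge probabilities.

def leaves(tree):
    if isinstance(tree, str):
        return [tree]
    (left, _), (right, _) = tree
    return leaves(left) + leaves(right)

def fitch_root_set(tree, assignment):
    """Maximum Parsimony root state set (bottom-up Fitch pass)."""
    if isinstance(tree, str):
        return frozenset({assignment[tree]})
    (left, _), (right, _) = tree
    a, b = fitch_root_set(left, assignment), fitch_root_set(right, assignment)
    return (a & b) or (a | b)

def pattern_probability(tree, state, assignment, states):
    """P(leaf pattern below `tree` | top vertex of `tree` is in `state`)."""
    if isinstance(tree, str):
        return 1.0 if assignment[tree] == state else 0.0
    prob = 1.0
    for child, p in tree:
        total = 0.0
        for s in states:
            # Symmetric model: stay with 1 - p, move to each other state with p / (r - 1).
            step = 1.0 - p if s == state else p / (len(states) - 1)
            total += step * pattern_probability(child, s, assignment, states)
        prob *= total
    return prob

def reconstruction_accuracy(tree, states):
    root_state = states[0]  # by symmetry, every root state gives the same value
    names = leaves(tree)
    accuracy = 0.0
    for pattern in product(states, repeat=len(names)):
        assignment = dict(zip(names, pattern))
        mp_set = fitch_root_set(tree, assignment)
        if root_state in mp_set:
            weight = 1.0 / len(mp_set)  # assumed uniform tie-breaking
            accuracy += weight * pattern_probability(tree, root_state, assignment, states)
    return accuracy

# Hypothetical example: two leaves attached to the root, p = 0.1 on both edges.
cherry = (("x", 0.1), ("y", 0.1))
print(reconstruction_accuracy(cherry, ("A", "C")))
```

For this two-leaf tree the value works out to (1 - p)^2 + p(1 - p) = 1 - p, which is the same accuracy obtained from a single leaf, so adding the second leaf does not help here.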
Intuitively, one could expect the reconstruction accuracy to be highest when all present-day species are taken into account. However, it has long been known that the reconstruction accuracy can improve when some taxa are disregarded for the estimation.
This is bad news for Maximum Parsimony as a criterion for ancestral state reconstruction, and it raises the question of whether there exists at least a lower bound for the reconstruction accuracy, i.e. whether it is better to consider all present-day species rather than just one for the reconstruction.
In Section 5, we start by considering ultrametric trees, which are trees where the expected number of substitutions from the root to each leaf is the same.
For such trees, we investigate a lower bound for the reconstruction accuracy, when the number of different states at the leaves of the tree is 3 or 4.
Subsequently in Section 6, in order to generalize this result, we introduce a new method for ancestral state reconstruction: the coin-toss method.
We obtain new results for the reconstruction accuracy of Maximum Parsimony by relating Maximum Parsimony to the coin-toss method.
Some of these results do not require the underlying tree to be ultrametric.
Then, in Section 7, we investigate the influence of specific tree topologies on the reconstruction accuracy of Maximum Parsimony. In particular, we consider balanced and imbalanced trees, as the balance of a tree may have an influence on the reconstruction accuracy.
We end by introducing the Colless index in Section 8, an index that measures the degree of balance of a rooted binary tree, and by analyzing its extremal properties.

Mathematical phylogenetics provides the theoretical framework for the reconstruction and analysis of phylogenetic trees and networks. The underlying theory is based on various mathematical disciplines, ranging from graph theory to probability theory.
In this thesis, we take a mostly combinatorial and graph-theoretical position and study different problems concerning phylogenetic trees and networks.
We start by considering phylogenetic diversity indices that rank species for conservation. Two such indices for rooted trees are the Fair Proportion index and the Equal Splits index, and we analyze how different they can be from each other and under which circumstances they coincide. Moreover, we define and investigate analogues of these indices for unrooted trees.
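To make the difference between the two rooted indices concrete, the sketch below computes both for every leaf of a small tree with edge lengths. It uses the usual definitions: the Fair Proportion index divides each edge length on the root-to-leaf path equally among all leaves below that edge, while the Equal Splits index divides it by the out-degree of each internal vertex passed on the way down to the leaf. The tree encoding, the function names, and the unit edge lengths are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed encoding: a leaf is a name (str); an internal vertex is a tuple of
# (child, edge_length) pairs, one pair per outgoing edge.

def count_leaves(tree):
    if isinstance(tree, str):
        return 1
    return sum(count_leaves(child) for child, _ in tree)

def diversity_indices(tree):
    """Return {leaf: (fair_proportion, equal_splits)} for a rooted tree."""
    result = {}

    def walk(node, fp, es):
        if isinstance(node, str):
            result[node] = (fp, es)
            return
        for child, length in node:
            walk(
                child,
                # Fair Proportion: share the edge among all leaves below it.
                fp + Fraction(length) / count_leaves(child),
                # Equal Splits: everything inherited is split by the out-degree.
                es / len(node) + Fraction(length),
            )

    walk(tree, Fraction(0), Fraction(0))
    return result


# Hypothetical caterpillar tree on four leaves with all edge lengths equal to 1.
cherry = (("a", 1), ("b", 1))
inner = ((cherry, 1), ("c", 1))
caterpillar = ((inner, 1), ("d", 1))

for leaf, (fp, es) in sorted(diversity_indices(caterpillar).items()):
    print(leaf, fp, es)
```

On this tree the two indices agree for leaf d (both 1) but differ for the other leaves, for example 11/6 versus 7/4 for leaf a. Both indices sum to the total edge length of 6 over all leaves.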
Subsequently, we study the Shapley value of unrooted trees, another popular phylogenetic diversity index. We show that it may fail as a prioritization criterion in biodiversity conservation and is outcompeted by an existing greedy approach. Afterwards, we leave the biodiversity setting and consider the Shapley value as a tree reconstruction tool. Here, we show that non-isomorphic trees may have permutation-equivalent Shapley transformation matrices and identical Shapley values, implying that the Shapley value cannot reliably be employed in tree reconstruction.
In addition to phylogenetic diversity indices, another class of indices frequently discussed in mathematical phylogenetics is the class of balance indices. In this thesis, we study one of the oldest and most popular of them, namely the Colless index for rooted binary trees. We focus on its extremal values and analyze both its maximum and minimum values as well as the trees that achieve them.
Having analyzed various questions regarding phylogenetic trees, we finally turn to phylogenetic networks. We focus on a certain class of phylogenetic networks, namely tree-based networks, and consider this class both in a rooted and in an unrooted setting.
First, we prove the existence of a rooted non-binary universal tree-based network with n leaves for all positive integers n, that is, we show that there exists a rooted non-binary tree-based network with n leaves that has every non-binary phylogenetic tree on the same leaf set as a base tree.
Finally, we study unrooted tree-based networks and introduce a class of networks that are necessarily tree-based, namely edge-based networks. We show that edge-based networks are closely related to a family of graphs in classical graph theory, so-called generalized series-parallel graphs, and explore this relationship in full detail.
In summary, we add new insights into existing concepts in mathematical phylogenetics, answer open questions in the literature, and introduce new concepts and approaches. In doing so, we make a small but relevant contribution to current research in mathematical phylogenetics.