- AutorIn
- Manuela Geiß
- Titel
- From Best Match Graphs to Gene Trees
- Untertitel
- A new perspective on graph-based orthology inference
- Zitierfähige Url:
- https://nbn-resolving.org/urn:nbn:de:bsz:15-qucosa2-360324
- Datum der Einreichung
- 26.07.2019
- Datum der Verteidigung
- 29.10.2019
- Abstract (EN)
- Orthology detection is an important task within the context of genome an- notation, gene nomenclature, and the understanding of gene evolution. With the rapidly accelerating pace at which new genomes become available, highly efficient methods are urgently required. As demonstrated in a large body of literature, reciprocal best match (RBH) methods are reasonably accurate and scale to large data sets. Nevertheless, they are far from perfect and prone to both, false positive and false negative, orthology calls. This work gives a complete characterization of best match as well as reciprocal best match graphs (BMGs and RBMGs) that arise at the first step of RBH methods. While BMGs as well as RBMGs with at most three species can be recognized in polynomial time, RBMGs with more than three species have a surprisingly complicated structure and it remains an open problem whether there exist polynomial time algorithms for the recognition of these RBMGs. In contrast to RBMGs, for which many (often mutually inconsistent) least re- solved trees may exist, there is a unique least resolved tree for BMGs. This tree is a homeomorphic image of the true, but typically unknown, gene tree. Furthermore, in the absence of horizontal gene transfer (HGT), the reciprocal best match graph contains the orthology relation suggesting that RBMGs can only contain false positive but no false negative orthology assignments. Simu- lation scenarios reveal that so-called good quartets, a certain graph pattern on four vertices in BMGs, can be used to successfully identify almost all false pos- itive edges in RBMGs. Together with the existence of a unique least resolved tree, this suggests that BMGs contain a lot of valuable information for orthol- ogy inference that would be lost by exclusively considering RBMGs. These insights motivate to include additional BMG and RBMG editing steps in or- thology detection pipelines based on the presented theoretical insights. Moreover, a workflow is introduced to infer best matches from sequence data by retrieving quartet structures from local information instead of reconstructing the whole gene tree. A crucial prerequisite for this pipeline is the choice of suitable outgroups. However, the empirical simulations also reveal that HGT events cause strong deviations of the orthology relation from the RBMG as well as good quartets that are no longer associated with false positive orthologs, suggesting the need for further investigation of the xenology relation. The directed Fitch’s xenology relation is characterized in terms of forbidden 3-vertex subgraphs and moreover, a polynomial time algorithm for the recog- nition and the reconstruction of a unique least resolved tree is presented. The undirected Fitch relation, in contrast, is shown to be a complete multipartite graph, which does not provide any interesting phylogenetic information. In summary, the results of this work can be used to develop new methods for inferring orthology, paralogy, and HGT. They promise major improvements in the accuracy and the computational performance of RBH-based approaches.
- Freie Schlagwörter (EN)
- phylogenetics, orthology detection, graph theory, best matches
- Klassifikation (DDC)
- 000
- Den akademischen Grad verleihende / prüfende Institution
- Universität Leipzig, Leipzig
- Version / Begutachtungsstatus
- angenommene Version / Postprint / Autorenversion
- URN Qucosa
- urn:nbn:de:bsz:15-qucosa2-360324
- Veröffentlichungsdatum Qucosa
- 11.11.2019
- Dokumenttyp
- Dissertation
- Sprache des Dokumentes
- Englisch