Katharina T. Huber’s research while affiliated with University of East Anglia and other places


Publications (140)


Squirrel: Reconstructing Semi-directed Phylogenetic Level-1 Networks from Four-Leaved Networks or Sequence Alignments
  • Article

March 2025 · 1 Read · 2 Citations

Molecular Biology and Evolution

Niels Holtgrefe · Katharina T Huber · [...] · Vincent Moulton

With the increasing availability of genomic data, biologists aim to find more accurate descriptions of evolutionary histories influenced by secondary contact, where diverging lineages reconnect before diverging again. Such reticulate evolutionary events can be more accurately represented in phylogenetic networks than in phylogenetic trees. Since the root location of phylogenetic networks cannot be inferred from biological data under several evolutionary models, we consider semi-directed (phylogenetic) networks: partially directed graphs without a root in which the directed edges represent reticulate evolutionary events. By specifying a known outgroup, the rooted topology can be recovered from such networks. We introduce the algorithm Squirrel (Semi-directed Quarnet-based Inference to Reconstruct Level-1 Networks), which constructs a semi-directed level-1 network from a full set of quarnets (four-leaf semi-directed networks). Our method also includes a heuristic to construct such a quarnet set directly from sequence alignments. We demonstrate Squirrel's performance through simulations and on real sequence data sets, the largest of which contains 29 aligned sequences close to 1.7 Mbp long. The resulting networks are obtained on a standard laptop within a few minutes. Lastly, we prove that Squirrel is combinatorially consistent: given a full set of quarnets coming from a triangle-free semi-directed level-1 network, it is guaranteed to reconstruct the original network. Squirrel is implemented in Python, has an easy-to-use graphical user interface that takes sequence alignments or quarnets as input, and is freely available at https://github.com/nholtgrefe/squirrel.
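A full set of quarnets contains one quarnet for every 4-element subset of the leaves, so its size grows as C(n, 4). The short Python sketch below only enumerates those leaf subsets and counts them for the 29-sequence data set mentioned above; the function name quarnet_leaf_sets and the placeholder taxon names are hypothetical and are not part of Squirrel's interface.

```python
from itertools import combinations
from math import comb

def quarnet_leaf_sets(taxa):
    """Yield every 4-leaf subset (hypothetical helper, not Squirrel's API).

    A full quarnet set contains one quarnet per subset yielded here.
    """
    return combinations(sorted(taxa), 4)

taxa = [f"t{i}" for i in range(1, 30)]  # 29 placeholder taxon names
n_sets = sum(1 for _ in quarnet_leaf_sets(taxa))
print(n_sets, comb(29, 4))  # 23751 23751
```

Even for 29 taxa this is tens of thousands of quarnets, which is why a heuristic that produces them directly from alignments matters in practice.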


Figure 1: (a) A subtree representation of a distance d on X = {v, w, x, y, z}. The subtrees representing w, x and y each consist of a single leaf of the underlying edge-weighted tree. The subtree representing v consists of two edges and the subtree representing z consists of three edges (only edge-weights ≠ 1 are shown). (b) The table of the distances between the subtrees in (a). We have d(v, z) = 0 and the triangle inequality is violated since d(x, y) > d(x, z) + d(z, y).
Subtree Distances, Tight Spans and Diversities
  • Preprint
  • File available

January 2025 · 14 Reads

Metric embeddings are central to metric theory and its applications. Here we consider embeddings of a different sort: maps from a set to subsets of a metric space so that distances between points are approximated by minimal distances between subsets. Our main result is a characterization of when a set of distances $d(x,y)$ between elements in a set $X$ has a subtree representation: a real tree $T$ and a collection $\{S_x\}_{x \in X}$ of subtrees of $T$ such that $d(x,y)$ equals the length of the shortest path in $T$ from a point in $S_x$ to a point in $S_y$ for all $x,y \in X$. The characterization was first established for finite $X$ by Hirai (2006) using a tight span construction defined for distance spaces, that is, metric spaces without the triangle inequality. To extend Hirai's result beyond finite $X$, we establish fundamental results of tight span theory for general distance spaces, including the surprising observation that the tight span of a distance space is hyperconvex. We apply the results to obtain the first characterization of when a diversity (a generalization of a metric space which assigns values to all finite subsets of $X$, not just to pairs) has a tight span which is tree-like.
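To make the definition of $d(x,y)$ concrete, the sketch below computes the distance between two subtrees with a multi-source Dijkstra search. It assumes, for illustration only, that the real tree $T$ is a finite edge-weighted tree and that each $S_x$ is given by the vertex set of a subtree, so the shortest path between two subtrees starts and ends at vertices. The function name subtree_distance and the example tree are hypothetical; the example reproduces the phenomenon from Figure 1, where the triangle inequality fails.

```python
import heapq
from collections import defaultdict

def subtree_distance(edges, s_x, s_y):
    """Shortest-path length from any vertex of s_x to any vertex of s_y.

    Assumption: edges lists (u, v, weight) tuples of a finite tree, and
    s_x, s_y are vertex sets of (connected) subtrees of that tree.
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    # Multi-source Dijkstra: every vertex of s_x starts at distance 0.
    dist = {v: 0 for v in s_x}
    heap = [(0, v) for v in s_x]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u in s_y:
            return d  # the first target popped is the closest one
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")

# Hypothetical path p - q - r - s with weights 1, 1, 2;
# x and y are single vertices, z is the subtree on the edge q - r.
edges = [("p", "q", 1), ("q", "r", 1), ("r", "s", 2)]
S = {"x": {"p"}, "y": {"s"}, "z": {"q", "r"}}

def d(a, b):
    return subtree_distance(edges, S[a], S[b])

print(d("x", "y"), d("x", "z"), d("z", "y"))  # 4 1 2
```

Since 4 > 1 + 2, the resulting distance is not a metric, which is why the abstract works with distance spaces rather than metric spaces.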


Phylogenetic Trees Defined by at Most Three Characters

November 2024 · 5 Reads · 1 Citation

The Electronic Journal of Combinatorics

In evolutionary biology, phylogenetic trees are commonly inferred from a set of characters (partitions) of a collection of biological entities (e.g., species or individuals in a population). Such characters naturally arise from molecular sequences or morphological data. Interestingly, it has been known for some time that any binary phylogenetic tree can be (convexly) defined by a set of at most four characters, and that there are binary phylogenetic trees for which three characters are not enough. Thus, it is of interest to characterise those phylogenetic trees that are defined by a set of at most three characters. In this paper, we provide such a characterisation, in particular proving that a binary phylogenetic tree T is defined by a set of at most three characters precisely if T has no internal subtree isomorphic to a certain tree.
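"Convexly defined" rests on the notion of a convex character. Assuming the standard definition (a character is convex on T if the smallest subtrees of T spanning its blocks are pairwise vertex-disjoint), the Python sketch below tests convexity on a small unrooted tree. The function names are hypothetical, and the sketch does not test the stronger property of a set of characters defining T (roughly, T being the only tree on which all of them are convex).

```python
from collections import deque
from itertools import combinations

def tree_path(adj, s, t):
    """Vertices on the unique s-t path in a tree (BFS with parent pointers)."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path

def spanned_subtree(adj, leaves):
    """Vertex set of the smallest subtree containing all given leaves."""
    leaves = list(leaves)
    verts = {leaves[0]}
    for t in leaves[1:]:
        verts.update(tree_path(adj, leaves[0], t))
    return verts

def is_convex(adj, character):
    """Assumed definition: a character (list of leaf blocks) is convex on the
    tree if the subtrees spanned by its blocks are pairwise vertex-disjoint."""
    subtrees = [spanned_subtree(adj, block) for block in character]
    return all(a.isdisjoint(b) for a, b in combinations(subtrees, 2))

# Quartet tree ab|cd: leaves a, b hang off u; c, d hang off v; u - v is the inner edge.
adj = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
       "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
print(is_convex(adj, [{"a", "b"}, {"c", "d"}]))  # True
print(is_convex(adj, [{"a", "c"}, {"b", "d"}]))  # False
```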


Figure 6: (a): Phylogenetic tree inferred by SQUIRREL (using the δ-heuristic to create tf-quarnets) from a multiple sequence alignment of the primate data set under consideration, with the edges leading to the outgroup Mus musculus in grey. The different shaded areas indicate different taxonomic groups as they appear in Vanderpool et al. (2020). The two non-primate species are Tupaia chinensis and Galeopterus variegatus. (b): In conjunction with the δ-heuristic to create tf-quarnets, SQUIRREL inferred two networks with very close weighted tf-quarnet consistency scores from the considered multiple sequence alignment of the subfamily Cercopithecinae (using Colobus angolensis palliatus as outgroup). One of them is the depicted network; the other is the phylogenetic tree obtained from that network by ignoring the curved reticulation edge.
Figure 8: (a): A blobtree on some leaf set $X$ with an internal vertex $v$ inducing the partition $Y_1|\dots|Y_s$ of $X$. (b): Illustration of the mapping $f$ which maps every leaf $x$ of $X$ to a leaf in $\{y_1, \dots, y_s\}$, depending on which set $Y_i$ contains $x$. (c): Illustration of Steps B2 and B3 of SQUIRREL, where the single internal vertex is replaced by a cycle. (d): Illustration of how the cycle on the leaves $y_i$ is mapped back to a cycle on the sets $Y_i$ with the inverse function $f^{-1}$.
Figure 9: Two tf-quarnets $q$ with leaf set {a, b, c, d}: a quartet tree (a) and a 4-cycle (b). The values $\tau_q$ (as defined by eq. (6), assuming the quarnets have weight 1) between any two leaves are illustrated by the two complete graphs, where the thin grey edges have length 1 and the thick black edges have length 2.
Squirrel: Reconstructing semi-directed phylogenetic level-1 networks from four-leaved networks or sequence alignments

November 2024 · 36 Reads · 1 Citation

Phylogenetic networks model the evolutionary history of taxa while allowing for reticulate events such as hybridization and horizontal gene transfer. As with phylogenetic trees, under several evolutionary models it is often not possible to infer the root location of such a network directly from biological data. Hence, we consider semi-directed (phylogenetic) networks: partially directed graphs without a root in which the directed edges represent reticulate evolutionary events. By specifying a known outgroup, the rooted topology can be recovered from such networks. We introduce the algorithm Squirrel (Semi-directed Quarnet-based Inference to Reconstruct Level-1 Networks), which constructs a semi-directed level-1 network from a full set of quarnets (four-leaf semi-directed networks). Our method also includes a heuristic to construct such a quarnet set directly from sequence alignments. To build a network from quarnets, Squirrel first builds a tree, after which it repeatedly solves the Travelling Salesman Problem (TSP) to replace each high-degree vertex by a cycle. We demonstrate Squirrel's performance on randomly generated networks and on real sequence data sets, the largest of which contains 29 aligned sequences close to 1.7 Mbp long. The resulting networks are obtained on a standard laptop within a few minutes. Lastly, we prove that Squirrel is combinatorially consistent: given a full set of quarnets coming from a triangle-free semi-directed level-1 network, it is guaranteed to reconstruct the original network. Squirrel is implemented in Python, has an easy-to-use graphical user interface that takes sequence alignments or quarnets as input, and is freely available at https://github.com/nholtgrefe/squirrel.
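The abstract says a high-degree vertex is replaced by a cycle found by solving a TSP. It does not say which solver or which distances Squirrel uses, so the Python sketch below is only an illustration of the TSP step itself: an exhaustive search for the shortest cyclic ordering of the k parts around a vertex, given a hypothetical symmetric dissimilarity matrix. The function name best_cycle and the matrix values are assumptions; brute force is only viable for small k.

```python
from itertools import permutations

def best_cycle(dist):
    """Brute-force TSP: shortest Hamiltonian cycle through nodes 0..k-1.

    dist is a symmetric k x k matrix (hypothetical dissimilarities between
    the parts around a high-degree vertex). Node 0 is fixed as the start to
    avoid re-counting rotations. Runs in O(k!) time, so only for small k.
    """
    k = len(dist)
    best_order, best_cost = None, float("inf")
    for rest in permutations(range(1, k)):
        order = (0,) + rest
        # Sum consecutive distances, wrapping around to close the cycle.
        cost = sum(dist[order[i]][order[(i + 1) % k]] for i in range(k))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Hypothetical dissimilarities for four parts: neighbours on the cycle are at 1.
dist = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
print(best_cycle(dist))  # ((0, 1, 2, 3), 4)
```

The returned order gives the circular arrangement of the parts; a practical implementation would use a heuristic or exact TSP solver that scales beyond a handful of parts.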



Encoding Semi-Directed Phylogenetic Networks with Quarnets

August 2024 · 7 Reads

Phylogenetic networks are graphs that are used to represent evolutionary relationships between different taxa. They generalize phylogenetic trees since, unlike trees, they permit lineages to combine. Recently, there has been rising interest in semi-directed phylogenetic networks, which are mixed graphs in which certain lineage combination events are represented by directed edges coming together, whereas the remaining edges are left undirected. One reason to consider such networks is that it can be difficult to root a network using real data. In this paper, we consider the problem of when a semi-directed phylogenetic network is defined, or encoded, by the smaller networks that it induces on the 4-leaf subsets of its leaf set. These smaller networks are called quarnets. We prove that semi-directed binary level-2 phylogenetic networks are encoded by their quarnets, but that this is not the case for level-3. In addition, we prove that the so-called blob tree of a semi-directed binary network, a tree that gives the coarse-grained structure of the network, is always encoded by the quarnets of the network.
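For intuition, the tree analogue of this encoding question is classical: an unrooted binary phylogenetic tree is determined by its quartets, the four-leaf trees it induces. The Python sketch below computes the quartet induced on each 4-leaf subset of a tree, using the fact that the split ab|cd is displayed exactly when the a-b path and the c-d path are vertex-disjoint. It handles trees only; computing quarnets of a semi-directed network is more involved and is not attempted here. All function names and the example tree are hypothetical.

```python
from collections import deque
from itertools import combinations

def path_vertices(adj, s, t):
    """Vertex set of the unique s-t path in a tree."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    verts = set()
    while t is not None:
        verts.add(t)
        t = parent[t]
    return verts

def induced_quartet(adj, four):
    """Split ((p, q), (r, s)) displayed on four leaves, or None for a star."""
    a, b, c, d = sorted(four)
    for (p, q), (r, s) in [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]:
        if path_vertices(adj, p, q).isdisjoint(path_vertices(adj, r, s)):
            return (p, q), (r, s)
    return None

def quartets(adj, leaves):
    """Map every 4-leaf subset to its induced quartet."""
    return {four: induced_quartet(adj, four) for four in combinations(sorted(leaves), 4)}

# Hypothetical 5-leaf caterpillar: a, b on u; c on w; d, e on v; path u - w - v.
adj = {"a": ["u"], "b": ["u"], "c": ["w"], "d": ["v"], "e": ["v"],
       "u": ["a", "b", "w"], "w": ["u", "c", "v"], "v": ["w", "d", "e"]}
for four, split in quartets(adj, "abcde").items():
    (p, q), (r, s) = split
    print(f"{p}{q}|{r}{s}")  # prints ab|cd, ab|ce, ab|de, ac|de, bc|de (one per line)
```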



A distance-based model for convergent evolution

January 2024 · 31 Reads

Journal of Mathematical Biology

Convergent evolution is an important process in which independent species evolve similar features, usually over a long period of time. It occurs in many different species across the tree of life, and is often caused by species having to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions, we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model’s parameters for when it is still possible to recover the underlying tree and also its height, even when the modified distance is not a tree metric.
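The two properties at stake here can be checked with standard tests: a distance is ultrametric if, in every triple, the two largest distances are equal (three-point condition), and it is a tree metric if, in every quadruple, the two largest of the three pair sums are equal (four-point condition). The Python sketch below applies both tests to an equidistant tree and to a modified copy. The modification (simply lowering the distance between two taxa) is a hypothetical stand-in for convergence, not the paper's model, whose parameters are not given in the abstract.

```python
from itertools import combinations

def is_ultrametric(d, taxa, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        _, mid, top = sorted([d[x][y], d[x][z], d[y][z]])
        if abs(mid - top) > tol:
            return False
    return True

def is_tree_metric(d, taxa, tol=1e-9):
    """Four-point condition: the two largest of the three pair sums are equal."""
    for w, x, y, z in combinations(taxa, 4):
        s = sorted([d[w][x] + d[y][z], d[w][y] + d[x][z], d[w][z] + d[x][y]])
        if abs(s[1] - s[2]) > tol:
            return False
    return True

def sym(pairs):
    """Build a symmetric nested dict from {(x, y): distance}."""
    d = {}
    for (x, y), v in pairs.items():
        d.setdefault(x, {})[y] = v
        d.setdefault(y, {})[x] = v
    return d

# Equidistant tree ((a,b),(c,d)): cherries at height 1, root at height 2.
pairs = {("a", "b"): 2, ("c", "d"): 2, ("a", "c"): 4,
         ("a", "d"): 4, ("b", "c"): 4, ("b", "d"): 4}
taxa = ["a", "b", "c", "d"]
d0 = sym(pairs)
# Hypothetical convergence of b and c: their distance is lowered from 4 to 3.
d1 = sym({**pairs, ("b", "c"): 3})
print(is_ultrametric(d0, taxa), is_tree_metric(d0, taxa))  # True True
print(is_ultrametric(d1, taxa), is_tree_metric(d1, taxa))  # False False
```

In this particular toy example the modification destroys both properties; the paper characterises when the modified distance nevertheless remains a tree metric.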



FIGURE 1. A proper forest-based network N on ten leaves. Each of the three phylogenetic trees in the underlying forest represents a hypothetical butterfly lineage, with its main wing pattern indicated next to the root of the tree. The network N is the result of adding dashed arcs between pairs of trees in the forest. Each added arc corresponds to some genetic material being introduced into a lineage from one of the others, which results in a wing pattern change for the descendants.
FIGURE 2. For m = 2, the m-network N for X = {a, b, c, d, e} and C = {{a, b, c}, {a, c, e}, {b, c, d}}, as described in the proof of Theorem 4.1. Since m = 2, there are no Gen0 vertices. GenI, GenII, and GenIII vertices are indicated as vertices in bands labelled GenI, GenII, and GenIII, respectively. For clarity, we have labelled each reticulation with the set in C to which its three parents correspond, instead of showing the parents themselves. A 2-coloring σ: V(N) → {•, •} associated with the solution A = {a, b}, B = {c, d, e} of the SET-SPLITTING problem for (X, C).
FIGURE 3. The 3-network N constructed from the graph G with vertex set X = {a, b, c, d, e} and edge set E(G) = {{a, b}, {a, c}, {b, c}, {b, d}, {c, d}, {d, e}}, as described in the proof of Proposition 5.1. GenI, GenII and GenIII vertices are indicated as described in Figure 2. A 3-coloring σ: V(N) → {•, •, ×} associated with the proper 3-coloring κ of G given by κ(a) = κ(d) = •, κ(b) = κ(e) = • and κ(c) = ×.
Is this network proper forest-based?

August 2023 · 36 Reads

In evolutionary biology, networks are increasingly used to represent evolutionary histories for species that have undergone non-treelike or reticulate evolution. Such networks are essentially directed acyclic graphs with a leaf set that corresponds to a collection of species, in which non-leaf vertices with indegree 1 correspond to speciation events and vertices with indegree greater than 1 correspond to reticulate events such as gene transfer. Recently, forest-based networks have been introduced; these are essentially (multi-rooted) networks that can be formed by adding some arcs to a collection of phylogenetic trees (or phylogenetic forest), where each arc is added in such a way that its ends always lie in two different trees in the forest. In this paper, we consider the complexity of deciding whether or not a given network is proper forest-based, that is, whether it can be formed by adding arcs to some underlying phylogenetic forest which contains the same number of trees as there are roots in the network. More specifically, we show that it can be decided in polynomial time whether or not a binary, tree-child network with $m \ge 2$ roots is proper forest-based in case $m = 2$, but that this problem is NP-complete for $m \ge 3$. We also give a fixed parameter tractable (FPT) algorithm for deciding whether or not a network in which every vertex has indegree at most 2 is proper forest-based. A key element in proving our results is a new characterization for when a network with $m$ roots is proper forest-based, which is given in terms of the existence of certain $m$-colorings of the vertices of the network.
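The role of colorings can be illustrated with a small brute-force search. The Python sketch below is not the characterization from the paper; it encodes one natural reading of the forest condition as a hypothetical coloring test: use m colors (m = number of roots), give the roots distinct colors, and require every non-root vertex to have exactly one parent of its own color. Same-colored arcs then form m vertex-disjoint trees, one per root, and every remaining arc joins two different trees. The sketch ignores the further requirement that each tree be a phylogenetic tree with leaves in X, and its exponential running time contrasts with the polynomial-time (m = 2) and FPT algorithms described above.

```python
from itertools import product

def find_forest_coloring(arcs):
    """Hypothetical brute-force check (not the paper's characterization).

    Looks for sigma: V -> {0, ..., m-1}, m = number of roots, such that
      * the roots receive pairwise distinct colors, and
      * every non-root vertex has exactly one parent of its own color.
    Returns such a coloring as a dict, or None. Exponential in |V|.
    """
    vertices = sorted({v for arc in arcs for v in arc})
    parents = {v: [] for v in vertices}
    for u, v in arcs:
        parents[v].append(u)
    roots = [v for v in vertices if not parents[v]]
    m = len(roots)
    for colors in product(range(m), repeat=len(vertices)):
        sigma = dict(zip(vertices, colors))
        if len({sigma[r] for r in roots}) != m:
            continue  # two roots share a color
        if all(sum(sigma[p] == sigma[v] for p in parents[v]) == 1
               for v in vertices if parents[v]):
            return sigma
    return None

# Two roots; h is a reticulation with one parent below each root.
arcs1 = [("r1", "u"), ("r2", "v"), ("u", "a"), ("u", "h"),
         ("v", "h"), ("v", "b"), ("h", "c")]
# Both parents of h descend from r1 alone, so h cannot have exactly one tree parent.
arcs2 = [("r1", "u"), ("r1", "w"), ("u", "h"), ("w", "h"),
         ("h", "x"), ("r2", "y")]
print(find_forest_coloring(arcs1) is not None)  # True
print(find_forest_coloring(arcs2))  # None
```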


Citations (57)


... However, if no restriction is placed on the size of the sets $S_i$, then $N(T)$ turns out to be independent of $|X|$; in fact $N(T) \le 4$ (Huber et al. 2005). A recent paper (Huber et al. 2023) exactly characterised the set of binary trees $T$ for which $N(T) = 4$: they are precisely the trees that contain a 'snowflake' (defined shortly). The authors of Huber et al. (2023) then posed the problem of determining the asymptotic proportion of binary trees that contain a snowflake as $|X| \to \infty$. ...

Reference:

0-1 Laws for Pattern Occurrences in Phylogenetic Trees and Networks
Phylogenetic Trees Defined by at Most Three Characters
  • Citing Article
  • November 2024

The Electronic Journal of Combinatorics

... To start filling this gap, multiple-rooted phylogenetic networks such as the one depicted in Figure 1(i) have been introduced and studied in the literature in the form of Overlaid Species Forests [20], forest-based networks [21,24,27] and arboreal networks [22]. Although distinct combinatorial objects all of whose set of leaves is a pre-given set X of organisms, they all can be thought of as a forest S of rooted phylogenetic trees (each representing, for example, the evolutionary past of a set of bacteria inhabiting an ecological niche) along with a set A of additional arcs (each representing, for example, a horizontal gene transfer event) such that each arc in A joins two distinct trees in S. ...

Shared Ancestry Graphs and Symbolic Arboreal Maps
  • Citing Article
  • October 2024

SIAM Journal on Discrete Mathematics

... To start filling this gap, multiple-rooted phylogenetic networks such as the one depicted in Figure 1(i) have been introduced and studied in the literature in the form of Overlaid Species Forests [20], forest-based networks [21,24,27] and arboreal networks [22]. Although distinct combinatorial objects all of whose set of leaves is a pre-given set X of organisms, they all can be thought of as a forest S of rooted phylogenetic trees (each representing, for example, the evolutionary past of a set of bacteria inhabiting an ecological niche) along with a set A of additional arcs (each representing, for example, a horizontal gene transfer event) such that each arc in A joins two distinct trees in S. ...

Is this network proper forest-based?
  • Citing Article
  • May 2024

Information Processing Letters

... Note that, from a conceptual level, such trees are unrooted phylogenetic trees T along with a symbolic map from the vertex set of T into a set of symbols which, in our case, are • and • (see e.g. [4,16,19,23] and [30,Section 7.6] for other cases where such maps have been used in phylogenetics). Throughout the paper, we will refer to them as augmented trees. ...

Orienting undirected phylogenetic networks
  • Citing Article
  • October 2023

Journal of Computer and System Sciences

... Two things are important to notice at this point. First, since networks are not uniquely determined by the trees they contain [33], there may exist a large number of different optimal solutions, and our algorithm does not attempt to enumerate them all: in fact, how to summarize all equally good networks is still an open practical problem [20,16]. In particular, no method to solve Hybridization (whose goal is to minimize the number of reticulations only) can guarantee to reconstruct a specific network: all networks that display the input trees with the minimum possible number of reticulation nodes are optimal solutions. ...

Computing consensus networks for collections of 1-nested phylogenetic networks

Journal of Graph Algorithms and Applications

... More specifically, we show that this space is a so-called CAT(0)-orthant space, which implies that the distance between any two ETCNs can be computed efficiently. Note that Billera et al. (2001) presented a similar approach to compare unrooted edge-weighted phylogenetic trees, and that our space of ETCNs generalizes the more recently introduced spaces of ultrametric trees (Gavryushkin and Drummond, 2016) and equidistant cactuses (Huber et al., 2024). ...

The Space of Equidistant Phylogenetic Cactuses

Annals of Combinatorics

... Figure 5. A spinal network with cover 1, 3 | 5 | 2, 6 | 5, 7 | 4, 6, 8 | 7, 9. Note that n = 4 and the cover has one set in [4], two in [5], three in [6], four in [7], five in [8], and six in [9]. There is a path from the elements of the set that is in [4], namely 1 and 3, to the root, that traverses every non-leaf vertex. ...

Encoding and ordering X-cactuses
  • Citing Article
  • January 2023

Advances in Applied Mathematics

... Studying combinatorial and model theoretic properties of metric and ultrametric spaces has a great tradition (see, e.g., Section 6.4 of [13], [14,15], and the references therein). In this section, we investigate model theoretic stability of certain ultrametric spaces. ...

Optimal realizations and the block decomposition of a finite metric space
  • Citing Article
  • October 2021

Discrete Applied Mathematics

... Again there are different variants: rooted and unrooted, binary and non-binary, a fixed or unbounded number of input trees. For some applications, the definition of an embedding has to be relaxed (allowing, for example, multiple tree arcs embedded into the same network arc) [HMSW16,HLM21]. Other interesting candidate problems for treewidth-based algorithms include phylogenetic network drawing [KS20], orienting phylogenetic networks [HvIJ + 19] and phylogenetic tree inference with duplications [vIJJ + 19]. ...

The rigid hybrid number for two phylogenetic trees

Journal of Mathematical Biology