Katharina T. Huber’s research while affiliated with University of East Anglia and other places


Publications (141)


Cherry picking in forests: A new characterization for the unrooted hybrid number of two phylogenetic trees
  • Article

May 2025 · 5 Reads · Discrete Mathematics & Theoretical Computer Science

Katharina T. Huber · Simone Linz · Vincent Moulton

Phylogenetic networks are a special type of graph which generalize phylogenetic trees and that are used to model non-treelike evolutionary processes such as recombination and hybridization. In this paper, we consider unrooted phylogenetic networks, i.e. simple, connected graphs $\mathcal{N}=(V,E)$ with leaf set $X$, for $X$ some set of species, in which every internal vertex in $\mathcal{N}$ has degree three. One approach used to construct such phylogenetic networks is to take as input a collection $\mathcal{P}$ of phylogenetic trees and to look for a network $\mathcal{N}$ that contains each tree in $\mathcal{P}$ and that minimizes the quantity $r(\mathcal{N}) = |E|-(|V|-1)$ over all such networks. Such a network always exists, and the quantity $r(\mathcal{N})$ for an optimal network $\mathcal{N}$ is called the hybrid number of $\mathcal{P}$. In this paper, we give a new characterization for the hybrid number in case $\mathcal{P}$ consists of two trees. This characterization is given in terms of a cherry picking sequence for the two trees, although to prove that our characterization holds we need to define the sequence more generally for two forests. Cherry picking sequences have been intensively studied for collections of rooted phylogenetic trees, but our new sequences are the first variant of this concept that can be applied in the unrooted setting. Since the hybrid number of two trees is equal to the well-known tree bisection and reconnection distance between the two trees, our new characterization also provides an alternative way to understand this important tree distance.
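The quantity $r(\mathcal{N}) = |E|-(|V|-1)$ in the abstract above can be evaluated directly from a vertex set and an edge set. The following is a minimal sketch, not code from the paper: the function name `reticulation_number` and both example graphs are hypothetical illustrations.

```python
def reticulation_number(vertices, edges):
    """Return r(N) = |E| - (|V| - 1) for a connected simple graph N = (V, E)."""
    return len(edges) - (len(vertices) - 1)


# Hypothetical unrooted tree on leaves a, b, c, d (internal vertices u, v).
tree_vertices = {"a", "b", "c", "d", "u", "v"}
tree_edges = {("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")}

# Hypothetical unrooted network on the same leaves containing one 4-cycle
# u-v-w-z; every internal vertex has degree three.
net_vertices = {"a", "b", "c", "d", "u", "v", "w", "z"}
net_edges = {
    ("a", "u"), ("b", "v"), ("c", "w"), ("d", "z"),
    ("u", "v"), ("v", "w"), ("w", "z"), ("z", "u"),
}

r_tree = reticulation_number(tree_vertices, tree_edges)
r_net = reticulation_number(net_vertices, net_edges)
```

For a tree the quantity is zero, and each independent cycle adds one, which is why minimizing it over networks containing the input trees measures how far the trees are from fitting a single tree.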


Squirrel : Reconstructing Semi-directed Phylogenetic Level-1 Networks from Four-Leaved Networks or Sequence Alignments

March 2025 · 2 Reads · 4 Citations · Molecular Biology and Evolution

Niels Holtgrefe · Katharina T Huber · [...] · Vincent Moulton

With the increasing availability of genomic data, biologists aim to find more accurate descriptions of evolutionary histories influenced by secondary contact, where diverging lineages reconnect before diverging again. Such reticulate evolutionary events can be more accurately represented in phylogenetic networks than in phylogenetic trees. Since the root location of phylogenetic networks cannot be inferred from biological data under several evolutionary models, we consider semi-directed (phylogenetic) networks: partially directed graphs without a root in which the directed edges represent reticulate evolutionary events. By specifying a known outgroup, the rooted topology can be recovered from such networks. We introduce the algorithm Squirrel (Semi-directed Quarnet-based Inference to Reconstruct Level-1 Networks) which constructs a semi-directed level-1 network from a full set of quarnets (four-leaf semi-directed networks). Our method also includes a heuristic to construct such a quarnet set directly from sequence alignments. We demonstrate Squirrel's performance through simulations and on real sequence data sets, the largest of which contains 29 aligned sequences close to 1.7 Mbp long. The resulting networks are obtained on a standard laptop within a few minutes. Lastly, we prove that Squirrel is combinatorially consistent: given a full set of quarnets coming from a triangle-free semi-directed level-1 network, it is guaranteed to reconstruct the original network. Squirrel is implemented in Python, has an easy-to-use graphical user interface that takes sequence alignments or quarnets as input, and is freely available at https://github.com/nholtgrefe/squirrel


Figure 1: (a) A subtree representation of a distance d on X = {v, w, x, y, z}. The subtrees representing w, x and y each consist of a single leaf of the underlying edge-weighted tree. The subtree representing v consists of two edges and the subtree representing z consists of three edges (only edge-weights ≠ 1 are shown). (b) The table of the distances between the subtrees in (a). We have d(v, z) = 0 and the triangle inequality is violated since d(x, y) > d(x, z) + d(z, y).
Subtree Distances, Tight Spans and Diversities
  • Preprint
  • File available

January 2025 · 14 Reads

Metric embeddings are central to metric theory and its applications. Here we consider embeddings of a different sort: maps from a set to subsets of a metric space so that distances between points are approximated by minimal distances between subsets. Our main result is a characterization of when a set of distances $d(x,y)$ between elements in a set $X$ have a subtree representation, a real tree $T$ and a collection $\{S_x\}_{x \in X}$ of subtrees of $T$ such that $d(x,y)$ equals the length of the shortest path in $T$ from a point in $S_x$ to a point in $S_y$ for all $x,y \in X$. The characterization was first established for finite $X$ by Hirai (2006) using a tight span construction defined for distance spaces, metric spaces without the triangle inequality. To extend Hirai's result beyond finite $X$ we establish fundamental results of tight span theory for general distance spaces, including the surprising observation that the tight span of a distance space is hyperconvex. We apply the results to obtain the first characterization of when a diversity -- a generalization of a metric space which assigns values to all finite subsets of $X$, not just to pairs -- has a tight span which is tree-like.
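The notion of a subtree representation described above can be illustrated on a small finite tree. The sketch below is an illustration under stated assumptions, not the construction from the preprint: the tree is a hypothetical unit-weight path x - p - q - y, subtrees are given as connected vertex sets, and the helper names `tree_distances` and `subtree_distance` are invented for this example.

```python
from itertools import product


def tree_distances(weighted_edges):
    """All-pairs path lengths in an edge-weighted tree (one traversal per vertex)."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {}
    for source in adj:
        dist[source] = {source: 0.0}
        stack = [source]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist[source]:  # paths in a tree are unique
                    dist[source][v] = dist[source][u] + w
                    stack.append(v)
    return dist


def subtree_distance(dist, s_x, s_y):
    """Length of the shortest path from a vertex of s_x to a vertex of s_y."""
    return min(dist[u][v] for u, v in product(s_x, s_y))


# Hypothetical tree: the path x - p - q - y with unit edge weights.
dist = tree_distances([("x", "p", 1.0), ("p", "q", 1.0), ("q", "y", 1.0)])

# Hypothetical subtrees: x and y are single leaves, z is the edge {p, q},
# and v is the single vertex p (so v and z overlap).
subtrees = {"x": {"x"}, "y": {"y"}, "z": {"p", "q"}, "v": {"p"}}


def d(a, b):
    return subtree_distance(dist, subtrees[a], subtrees[b])


d_xy, d_xz, d_zy, d_vz = d("x", "y"), d("x", "z"), d("z", "y"), d("v", "z")
```

In this toy example distinct elements can be at distance zero (v and z overlap) and the triangle inequality can fail (x and y are far apart but both close to z), which is why such distances are studied as distance spaces rather than metric spaces.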


Phylogenetic Trees Defined by at Most Three Characters

November 2024 · 5 Reads · 1 Citation · The Electronic Journal of Combinatorics

In evolutionary biology, phylogenetic trees are commonly inferred from a set of characters (partitions) of a collection of biological entities (e.g., species or individuals in a population). Such characters naturally arise from molecular sequences or morphological data. Interestingly, it has been known for some time that any binary phylogenetic tree can be (convexly) defined by a set of at most four characters, and that there are binary phylogenetic trees for which three characters are not enough. Thus, it is of interest to characterise those phylogenetic trees that are defined by a set of at most three characters. In this paper, we provide such a characterisation, in particular proving that a binary phylogenetic tree T is defined by a set of at most three characters precisely if T has no internal subtree isomorphic to a certain tree.


Figure 6: (a): Phylogenetic tree inferred by SQUIRREL (using the δ-heuristic to create tf-quarnets) from a multiple sequence alignment of the primate data set under consideration, with the edges leading to the outgroup Mus musculus in grey. The different shaded areas indicate different taxonomical groups as they appear in Vanderpool et al. 2020. The two non-primate species are Tupaia chinensis and Galeopterus variegatus. (b): In conjunction with the δ-heuristic to create tf-quarnets, SQUIRREL inferred two networks with very close weighted tf-quarnet consistency scores from the considered multiple sequence alignment of the subfamily of Cercopithecinae (using Colobus angolensis palliatus as outgroup). One of them is the depicted network and the other is the phylogenetic tree obtained from that network by ignoring the curved reticulation edge.
Figure 8: (a): A blobtree on some leaf set $X$ with an internal vertex $v$ inducing the partition $Y_1 | \dots | Y_s$ of $X$. (b): Illustration of the mapping $f$ which maps every leaf $x$ of $X$ to a leaf in $\{y_1, \dots, y_s\}$, depending on which set $Y_i$ contains $x$. (c): Illustration of Step B2 and B3 of SQUIRREL, where the single internal vertex is replaced by a cycle. (d): Illustration of how the cycle on the leaves $y_i$ is mapped back to a cycle on the sets $Y_i$ with the inverse function $f^{-1}$.
Figure 9: Two tf-quarnets $q$ with leaf set $\{a, b, c, d\}$: a quartet tree (a) and 4-cycle (b). The values $\tau_q$ (as defined by eq. (6), assuming the quarnets have weight 1) between any two leaves are illustrated by the two complete graphs, where the thin grey edges have length 1 and the thick black length 2.
Squirrel: Reconstructing semi-directed phylogenetic level-1 networks from four-leaved networks or sequence alignments

November 2024 · 36 Reads · 1 Citation

Phylogenetic networks model the evolutionary history of taxa while allowing for reticulate events such as hybridization and horizontal gene transfer. As is the case for phylogenetic trees, it is often not possible to infer the root location of such a network directly from biological data for several evolutionary models. Hence, we consider semi-directed (phylogenetic) networks: partially directed graphs without a root in which the directed edges represent reticulate evolutionary events. By specifying a known outgroup, the rooted topology can be recovered from such networks. We introduce the algorithm Squirrel (Semi-directed Quarnet-based Inference to Reconstruct Level-1 Networks) which constructs a semi-directed level-1 network from a full set of quarnets (four-leaf semi-directed networks). Our method also includes a heuristic to construct such a quarnet set directly from sequence alignments. To build a network from quarnets, Squirrel first builds a tree, after which it repeatedly solves the Travelling Salesman Problem (TSP) to replace each high-degree vertex by a cycle. We demonstrate Squirrel's performance on randomly generated networks and on real sequence data sets, the largest of which contains 29 aligned sequences close to 1.7 Mbp long. The resulting networks are obtained on a standard laptop within a few minutes. Lastly, we prove that Squirrel is combinatorially consistent: given a full set of quarnets coming from a triangle-free semi-directed level-1 network, it is guaranteed to reconstruct the original network. Squirrel is implemented in Python, has an easy-to-use graphical user interface that takes sequence alignments or quarnets as input, and is freely available at https://github.com/nholtgrefe/squirrel.
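The abstract mentions that a Travelling Salesman Problem is solved to turn a high-degree vertex into a cycle. The sketch below only illustrates the generic idea of choosing a cyclic ordering by minimising total tour length. It is not Squirrel's implementation: the brute-force search, the function name `best_cyclic_order`, and the distance values are all assumptions made for illustration.

```python
from itertools import permutations


def best_cyclic_order(points, dist):
    """Brute-force the shortest closed tour through all points.

    Only practical for a handful of points; the first point is fixed so
    that rotations of the same cycle are not enumerated repeatedly.
    """
    first, *rest = points
    best_order, best_length = None, float("inf")
    for perm in permutations(rest):
        order = [first, *perm]
        length = sum(
            dist[frozenset((order[i], order[(i + 1) % len(order)]))]
            for i in range(len(order))
        )
        if length < best_length:  # keep the first optimum found
            best_order, best_length = order, length
    return best_order, best_length


# Hypothetical distances: a, b, c, d sit on a 4-cycle in that order,
# so neighbours are at distance 1 and opposite points at distance 2.
dist = {
    frozenset(("a", "b")): 1, frozenset(("b", "c")): 1,
    frozenset(("c", "d")): 1, frozenset(("d", "a")): 1,
    frozenset(("a", "c")): 2, frozenset(("b", "d")): 2,
}

order, length = best_cyclic_order(["a", "b", "c", "d"], dist)
```

With these distances the shortest tour visits the points in their cyclic order, so the tour recovers the arrangement around the cycle; a practical tool would need a scalable TSP method rather than enumeration.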



Encoding Semi-Directed Phylogenetic Networks with Quarnets

August 2024 · 7 Reads

Phylogenetic networks are graphs that are used to represent evolutionary relationships between different taxa. They generalize phylogenetic trees since for example, unlike trees, they permit lineages to combine. Recently, there has been rising interest in semi-directed phylogenetic networks, which are mixed graphs in which certain lineage combination events are represented by directed edges coming together, whereas the remaining edges are left undirected. One reason to consider such networks is that it can be difficult to root a network using real data. In this paper, we consider the problem of when a semi-directed phylogenetic network is defined or encoded by the smaller networks that it induces on the 4-leaf subsets of its leaf set. These smaller networks are called quarnets. We prove that semi-directed binary level-2 phylogenetic networks are encoded by their quarnets, but that this is not the case for level-3. In addition, we prove that the so-called blob tree of a semi-directed binary network, a tree that gives the coarse-grained structure of the network, is always encoded by the quarnets of the network.
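Since quarnets are the networks induced on the 4-leaf subsets of the leaf set, the number of quarnets of a network grows as the binomial coefficient "n choose 4". A minimal sketch of enumerating those leaf subsets follows; the leaf names and the function name `four_leaf_subsets` are hypothetical, and the code lists only the leaf sets, not the induced networks themselves.

```python
from itertools import combinations
from math import comb


def four_leaf_subsets(leaves):
    """List every 4-element subset of the leaf set (one per quarnet)."""
    return [frozenset(q) for q in combinations(sorted(leaves), 4)]


# Hypothetical leaf set with six taxa.
leaves = {"a", "b", "c", "d", "e", "f"}
subsets = four_leaf_subsets(leaves)
n_quarnets = len(subsets)
```

The count grows on the order of n^4, which suggests why encoding results of this kind matter: they say when this polynomial-size collection of small networks determines the whole network.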



A distance-based model for convergent evolution

January 2024 · 34 Reads · Journal of Mathematical Biology

Convergent evolution is an important process in which independent species evolve similar features usually over a long period of time. It occurs with many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions it follows that we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model’s parameters for when it is still possible to recover the underlying tree and also its height, even in case the modified distance is not a tree metric.
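The two distance classes named in the abstract have standard combinatorial tests: an ultrametric satisfies the three-point condition, and a tree metric satisfies the four-point condition. The sketch below checks both on a toy example. The distance values are hypothetical and the "modified" distance is not generated by the paper's model; it only shows a distance that stops being ultrametric while remaining a tree metric.

```python
from itertools import combinations, combinations_with_replacement


def make_distance(pairs):
    """Build a symmetric distance function with d(x, x) = 0 from a pair table."""
    table = {frozenset(k): v for k, v in pairs.items()}
    return lambda x, y: 0.0 if x == y else table[frozenset((x, y))]


def two_largest_equal(values, tol=1e-9):
    a, b = sorted(values)[-2:]
    return abs(a - b) <= tol


def is_ultrametric(d, points):
    """Three-point condition: in every triple the two largest distances agree."""
    return all(
        two_largest_equal([d(x, y), d(x, z), d(y, z)])
        for x, y, z in combinations(points, 3)
    )


def satisfies_four_point(d, points):
    """Four-point condition over all quadruples, repeated points allowed
    (repeats make the condition include the triangle inequality)."""
    return all(
        two_largest_equal([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
        for a, b, c, e in combinations_with_replacement(points, 4)
    )


points = ["a", "b", "c"]
# Hypothetical clock-like distance from an equidistant tree ((a, b), c).
clock = make_distance({("a", "b"): 2.0, ("a", "c"): 4.0, ("b", "c"): 4.0})
# Hypothetical modification: a and c have drawn closer together.
modified = make_distance({("a", "b"): 2.0, ("a", "c"): 3.0, ("b", "c"): 4.0})

clock_is_ultrametric = is_ultrametric(clock, points)
modified_is_ultrametric = is_ultrametric(modified, points)
modified_is_tree_metric = satisfies_four_point(modified, points)
```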



Citations (57)


... Our upper-bound results suggest that the networks generated by tools like SNAQ [22], PhyNEST [13], and Squirrel [9] with a constant network level can be efficiently decomposed into trees with small treewidth. As the level of generated networks grows [18], we may still expect the treewidth to stay low and treewidth-parametrized algorithms to be efficient on higher-level networks. ...

Reference:

Bounds on the Treewidth of Level-k Rooted Phylogenetic Networks
Squirrel : Reconstructing Semi-directed Phylogenetic Level-1 Networks from Four-Leaved Networks or Sequence Alignments
  • Citing Article
  • March 2025

Molecular Biology and Evolution

... However, if no restriction is placed on the size of the sets S i , then N (T ) turns out to be independent of |X |; in fact N (T ) ≤ 4 (Huber et al. 2005). A recent paper (Huber et al. 2023) exactly characterised the set of binary trees T for which N (T ) = 4: they are precisely the trees that contain a 'snowflake' (defined shortly). The authors of Huber et al. (2023) then posed the problem of determining the asymptotic proportion of binary trees that contain a snowflake as |X | → ∞. ...

Phylogenetic Trees Defined by at Most Three Characters
  • Citing Article
  • November 2024

The Electronic Journal of Combinatorics

... To start filling this gap, multiple-rooted phylogenetic networks such as the one depicted in Figure 1(i) have been introduced and studied in the literature in the form of Overlaid Species Forests [20], forest-based networks [21,24,27] and arboreal networks [22]. Although distinct combinatorial objects all of whose set of leaves is a pre-given set X of organisms, they all can be thought of as a forest S of rooted phylogenetic trees (each representing, for example, the evolutionary past of a set of bacteria inhabiting an ecological niche) along with a set A of additional arcs (each representing, for example, a horizontal gene transfer event) such that each arc in A joins two distinct trees in S. ...

Shared Ancestry Graphs and Symbolic Arboreal Maps
  • Citing Article
  • October 2024

SIAM Journal on Discrete Mathematics

... To start filling this gap, multiple-rooted phylogenetic networks such as the one depicted in Figure 1(i) have been introduced and studied in the literature in the form of Overlaid Species Forests [20], forest-based networks [21,24,27] and arboreal networks [22]. Although distinct combinatorial objects all of whose set of leaves is a pre-given set X of organisms, they all can be thought of as a forest S of rooted phylogenetic trees (each representing, for example, the evolutionary past of a set of bacteria inhabiting an ecological niche) along with a set A of additional arcs (each representing, for example, a horizontal gene transfer event) such that each arc in A joins two distinct trees in S. ...

Is this network proper forest-based?
  • Citing Article
  • May 2024

Information Processing Letters

... Note that, from a conceptual level, such trees are unrooted phylogenetic trees T along with a symbolic map from the vertex set of T into a set of symbols which, in our case, are • and • (see e.g. [4,16,19,23] and [30,Section 7.6] for other cases where such maps have been used in phylogenetics). Throughout the paper, we will refer to them as augmented trees. ...

Orienting undirected phylogenetic networks
  • Citing Article
  • October 2023

Journal of Computer and System Sciences

... Two things are important to notice at this point. First, since networks are not uniquely determined by the trees they contain [33], there may exist a large number of different optimal solutions, and our algorithm does not attempt to enumerate them all: in fact, how to summarize all equally good networks is still an open practical problem [20,16]. In particular, no method to solve Hybridization (whose goal is to minimize the number of reticulations only) can guarantee to reconstruct a specific network: all networks that display the input trees with the minimum possible number of reticulation nodes are optimal solutions. ...

Computing consensus networks for collections of 1-nested phylogenetic networks

Journal of Graph Algorithms and Applications

... More specifically, we show that this space is a so-called CAT(0)-orthant space, which implies that the distance between any two ETCNs can be computed efficiently. Note that Billera et al. (2001) presented a similar approach to compare unrooted edge-weighted phylogenetic trees, and that our space of ETCNs generalizes the more recently introduced spaces of ultrametric trees (Gavryushkin and Drummond, 2016) and equidistant cactuses (Huber et al., 2024). ...

The Space of Equidistant Phylogenetic Cactuses

Annals of Combinatorics

... Figure 5. A spinal network with cover 1, 3 | 5 | 2, 6 | 5, 7 | 4, 6, 8 | 7, 9. Note that n = 4 and the cover has one set in [4], two in [5], three in [6], four in [7], five in [8], and six in [9]. There is a path from the elements of the set that is in [4], namely 1 and 3, to the root, that traverses every non-leaf vertex. ...

Encoding and ordering X-cactuses
  • Citing Article
  • January 2023

Advances in Applied Mathematics

... Studying combinatorial and model theoretic properties of metric and ultrametric spaces has a great tradition (see, e.g., Section 6.4 of [13], [14,15], and the references therein). In this section, we investigate model theoretic stability of certain ultrametric spaces. ...

Optimal realizations and the block decomposition of a finite metric space
  • Citing Article
  • October 2021

Discrete Applied Mathematics