arXiv:2305.00033v1 [cs.DS] 28 Apr 2023
Finding agreement cherry-reduced subnetworks
in level-1 networks
Kaari Landry*1, Olivier Tremblay-Savard1, and Manuel Lafond2
1University of Manitoba, Winnipeg MB, Canada * landryk1@cs.umanitoba.ca
2Université de Sherbrooke, Sherbrooke QC, Canada
Abstract. Phylogenetic networks are increasingly being considered as better suited to represent the complexity of the evolutionary relationships between species. One class of phylogenetic networks that has received a lot of attention recently is the class of orchard networks, which is composed of networks that can be reduced to a single leaf using cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this paper, we present a fixed-parameter tractable algorithm to solve the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed to solve the MACRS problem. As proven in earlier work, there is a direct relationship between finding an MACRS and calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used for the comparison of level-1 networks.
Keywords: Cherry operations · Graphs and networks · Trees · Network problems · Algorithm design and analysis · Biology and genetics · Phylogenetic Networks
1 Introduction
Phylogenetic trees have been used extensively throughout the years to represent
simple evolutionary relationships between species. Because of this, many tools
and techniques are readily available to efficiently build, compare and evaluate
trees. Phylogenetic networks on the other hand are much better suited to repre-
sent more complex relationships, such as the ones resulting from hybridization,
recombination and lateral gene transfer events [11]. In the last 15 years or so,
bioinformatics research has focused increasingly on solving problems related to
phylogenetic networks, such as network construction [24,23,25,29,26,1,22], mini-
mum hybridization number [2,10,27,15,8,9,3], tree/network containment [16,17,28],
and distance calculation between networks [5,21,19].
One crucial concept that has been shown to be a very useful tool in solving
several of the important phylogenetic network problems mentioned above is the
one of cherry-picking sequences [10,20]. A cherry-picking sequence is made up
of operations that can reduce a network by either removing one leaf of a simple
(tree-like) cherry (i.e. two leaf siblings descending from the same parent vertex),
or removing one reticulate edge of a reticulated cherry (two leaves whose parent
vertices are connected by a reticulate edge). The concept of cherry-picking has
been so valuable that it led to the definition of orchard networks, also known
as cherry-picking networks, which are simply phylogenetic networks that can be
reduced to a single leaf by cherry-picking operations [6,17]. Recent work has been
focusing on further characterizing and classifying different subtypes of orchard
networks [14,18,13].
Lately, we have used a generalized definition of cherry operations to describe
both cherry reductions (i.e. cherry picking) and cherry expansions (the reverse
of a reduction, which adds a simple or reticulate cherry) [19]. We have then
defined four novel distances between orchard networks that are based on cherry
operations, with three of them being different formulations of an equivalent
distance (construction, deconstruction and tail distances) and the fourth one
(mixed distance) being a lower bound for the other three. In the process of describing these distances, the concept of a maximum agreement cherry-reduced subnetwork (MACRS; note that we replace "cherry-picking", used in [19], by "cherry-reduced" here for clarity) was defined to represent a network contained in both networks being compared that maximizes the number of vertices. We showed that finding an MACRS of two orchard networks is NP-hard, and that this problem is analogous to the problem of calculating the three equivalent distances.
In this work, we present an exact fixed-parameter tractable (FPT) algorithm to compute an MACRS of two rooted binary level-1 networks that is exponential in the total number of reticulations present in both networks. More precisely, our algorithm runs in $O(3^r n^3)$ time, where $r$ is the total number of reticulations in both networks and $n$ is the maximum number of vertices of the input networks. Our approach essentially consists of enumerating a certain set of subnetworks of the input networks in which all possible combinations of reticulation edges have been removed. Then, it makes use of a dynamic programming algorithm that determines whether there is an MACRS (and what it is, if it exists) between two level-1 networks in which the remaining reticulations cannot be removed (we call this problem MACRS-Simple). We prove that the initial MACRS problem can be solved by solving the MACRS-Simple problem on all combinations of enumerated subnetworks.
It is worth noting that another important difference between the previous defining work on MACRS and this article is the definition of networks. Specifically, we allow leaves of the network to have multiple labels. In fact, we force all leaf labels to be conserved as the network is trimmed by cherry reductions, by subsuming the labels of a removed leaf onto its cherry sibling that remains. In this way, we keep a "memory" of reductions, and this compressed representation of networks allows us to restore all possible alternative (bijective) leaf labelings of the network from it.
Finally, we conclude the paper by discussing how the enumeration step could
be optimized by considering the relationships between the reticulations of both
input networks. We also briefly present a preliminary idea of how the proposed
algorithm could be extended to higher level binary networks. Even though the
proposed approach applies to orchard networks and not to general networks,
the orchard network class actually contains network types that are of interest
to the research community, such as the tree-child networks [4] and tree-sibling
time-consistent networks [6]. The tree-child networks in particular, in addition to
having been studied extensively in the literature, are biologically relevant, since
all ancestral species (internal vertices) have a path that can go to a leaf using
only tree vertices. This reflects the idea that ancestral species have descendants
that will perdure through mutation and speciation events, and that hybridization
events are not as common as speciation events [18].
2 Preliminaries
We first introduce the notions regarding networks, then proceed to defining
cherry operations and our problem of interest.
2.1 Networks
A phylogenetic network $N$, or a network for short, is an acyclic directed graph without vertices of in-degree and out-degree 1, and whose vertices and edges are denoted $V(N)$ and $E(N)$, respectively. We assume that all networks are binary. For $v \in V(N)$, we use $v^-$ and $v^+$ to denote the in-degree and out-degree of $v$, respectively. The set $V(N)$ contains
- the root $\rho(N)$, which is the unique node satisfying $\rho(N)^- = 0$ and $\rho(N)^+ = 2$. In the case that $|V(N)| = 2$, $\rho(N)^+ = 1$;
- the leaves $L(N)$, which satisfy $l^- = 1$ and $l^+ = 0$ for all $l \in L(N)$;
- the internal vertices $V(N) \setminus (L(N) \cup \{\rho(N)\})$, which contain:
  - the tree vertices $T(N)$, which satisfy $v^- = 1$ and $v^+ = 2$ for all $v \in T(N)$;
  - the reticulation vertices $R(N)$, or simply reticulations, which satisfy $v^- = 2$ and $v^+ = 1$ for all $v \in R(N)$.
We use $\mathcal{X}$ to denote the set of all taxa. For our purposes, the leaves of a network $N$ are labeled by one or more taxa. For $l \in L(N)$, we will use $X(l)$ to denote the set of taxa that label $l$. We require that $X(l) \neq \emptyset$, and that for any distinct leaves $l_1, l_2 \in L(N)$, $X(l_1) \cap X(l_2) = \emptyset$.
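The degree conditions above can be checked mechanically. The following is a minimal Python sketch, assuming a network is given as a list of directed `(parent, child)` edges; the function name `classify_vertices` and the example network are illustrative and do not come from the paper.

```python
from collections import defaultdict

def classify_vertices(edges):
    """Group vertices by (in-degree, out-degree), following the definitions above."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    kinds = {"root": set(), "leaf": set(), "tree": set(), "reticulation": set()}
    for x in set(indeg) | set(outdeg):
        degrees = (indeg[x], outdeg[x])
        if degrees[0] == 0:
            kinds["root"].add(x)          # in-degree 0
        elif degrees == (1, 0):
            kinds["leaf"].add(x)
        elif degrees == (1, 2):
            kinds["tree"].add(x)
        elif degrees == (2, 1):
            kinds["reticulation"].add(x)
        else:
            raise ValueError(f"vertex {x!r} violates the binary degree conditions")
    return kinds

# Hypothetical example: one reticulation r with parents t and s.
example_edges = [("rho", "t"), ("rho", "s"), ("t", "a"), ("t", "r"),
                 ("s", "r"), ("s", "c"), ("r", "b")]
print(classify_vertices(example_edges))
```

A vertex with any other degree pair raises an error, which reflects the assumption that all networks are binary.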
The edges directed into a reticulation vertex are called reticulation edges, denoted $E_R(N)$. For $v \in V(N)$, the out-neighbors of $v$ are called its children. If $v$ has a single in-neighbor, we denote it by $p(v)$ and call it the parent of $v$ (if $v \in \{\rho(N)\} \cup R(N)$, then $p(v)$ is undefined). Vertices $u$ and $v$ are siblings if $p(u), p(v)$ are defined and $p(u) = p(v)$. When there is a directed path from vertex $v$ to vertex $u$, we call $v$ an ancestor of $u$ and we call $u$ a descendant of $v$. The descendants of $v$ are denoted $\mathrm{reach}^{\downarrow}(v, N)$ while its ancestors are denoted $\mathrm{reach}^{\uparrow}(v, N)$ (note that $v$ itself is in both sets). The union of the labels in $\mathrm{reach}^{\downarrow}(v, N) \cap L(N)$ is denoted $X(v)$. We denote by $R(v)$ the set of reticulations in $\mathrm{reach}^{\downarrow}(v, N)$.
Two networks $N_1$, $N_2$ are weakly isomorphic if there exists a bijection $\sigma : V(N_1) \to V(N_2)$ such that $(u, v) \in E(N_1)$ if and only if $(\sigma(u), \sigma(v)) \in E(N_2)$, and such that for each $l \in L(N_1)$, $X(l) \cap X(\sigma(l)) \neq \emptyset$. For this we use the notation $N_1 \simeq N_2$. If, for each $l \in L(N_1)$, $X(l) = X(\sigma(l))$, then we say $N_1$ and $N_2$ are strongly isomorphic, which we denote by $N_1 = N_2$.
A network $N$ may have only one edge, whose endpoints are $\rho(N)$ and a leaf. Then $N$ is a single-leaf network or singleton. We say $\rho(N)$ roots $N$. If, for a vertex $v$ and for all vertices $v' \in \mathrm{reach}^{\downarrow}(v, N)$, every path from $\rho(N)$ to $v'$ goes through $v$, then we say $v$ roots the subnetwork below it.
While a network $N$ is directed, there is an undirected version of $N$ on the same vertex set and with an undirected edge $\{u, v\}$ present for every $(u, v) \in E(N)$, which we call the underlying graph. It is on this underlying graph that we identify the set of biconnected components of $N$. Such a component is a maximal subgraph $B$ that cannot be disconnected by the removal of an edge therein. Note that every individual leaf and some tree vertices alone constitute a biconnected component; we refer to such single-vertex components as trivial, and all others as non-trivial. For a set of biconnected components $B_1, \ldots, B_b$ on a network $N$, a bridge is an edge $(u, v)$ such that $u \in B_i$, $v \in B_j$ for any arbitrary $1 \leq i \neq j \leq b$.
The level of a network is the maximum number of reticulations across all biconnected components of a network. A level-$k$ network has no biconnected component with more than $k$ reticulations. A level-1 network has every biconnected component with either 0 or 1 reticulations. Note that this does not limit the number of reticulations over the whole network, just in each biconnected component.
2.2 Cherries and cherry reductions
A cherry is a pair of leaves that are siblings or that have a reticulation joining their parents. More specifically, a pair $(x, y) \in L(N) \times L(N)$ is called a cherry if either $p(x) = p(y)$, in which case $(x, y)$ is called a simple cherry, or $p(x) \in R(N)$ and $(p(y), p(x)) \in E(N)$, in which case $(x, y)$ is called a reticulated cherry.
Let $N$ be a network and let $(x, y)$ be a pair of vertices. Then applying the cherry reduction $(x, y)$ on $N$ creates a new network as follows:
- If $(x, y)$ is a simple cherry of $N$, then the $(x, y)$ reduction consists of removing the leaf $x$ and the edge $(p(x), x)$, suppressing the resulting node of in- and out-degree 1 if any, and re-assigning $X(y) = X(y) \cup X(x)$. Note that the operation we introduce here differs from the cherry reduction operation described in previous work, where both the leaf $x$ and the set $X(x)$ are deleted. The purpose of our new definition is to preserve a reference to which label could have been assigned to $y$. This is to say that the labels on a given leaf are interchangeable [19, Lemma 3].
- If $(x, y)$ is a reticulated cherry of $N$, then we remove the reticulation edge $(p(y), p(x))$ and the resulting vertices of in- and out-degree 1 are suppressed. In this case, we say that the reticulation edge $(p(y), p(x))$ is removed by the cherry reduction $(x, y)$.
- If $(x, y)$ is not a cherry of $N$, then $N$ is unchanged.
The resulting graph is a network, and always has a cherry unless it is a singleton (true of all orchard networks by definition [6,17]).
Cherry reductions often occur in batches, and a sequence $S$ of pairs of leaves is called a cherry sequence (CS). The number of elements in $S$ is denoted $|S|$. The cherry at position $i$ of a CS $S$ is referred to by $S_i$. We use $N\langle S\rangle$ to denote the network obtained from $N$ by first applying cherry reduction $S_1$ on $N$, then $S_2$ on the resulting network, and so on until $S_{|S|}$ is applied. Note that we allow $S$ to contain pairs that do not modify the network (e.g. non-cherries). The subsequence from (including) the first cherry to (excluding) the $i$th cherry in $S$ is $S(0:i)$. When a CS $S$ reduces a network $N$ to a singleton, then we say $S$ is complete for $N$. We assume networks are orchard networks hereafter.
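To make the label-preserving simple cherry reduction concrete, here is a minimal Python sketch. It assumes a network is stored as a set of `(parent, child)` edges together with a dictionary mapping each leaf to its label set; the name `reduce_simple_cherry` and this representation are illustrative choices, not the paper's. Only the simple case is covered; reticulated cherries are left out.

```python
def reduce_simple_cherry(edges, labels, x, y):
    """Apply the simple cherry reduction (x, y); return (edges, labels).

    edges:  set of (parent, child) pairs.
    labels: dict mapping each leaf to a frozenset of taxa.
    The input is returned unchanged when (x, y) is not a simple cherry.
    """
    def parents(v):
        return [a for (a, b) in edges if b == v]

    if x == y or x not in labels or y not in labels:
        return edges, labels
    px, py = parents(x), parents(y)
    if len(px) != 1 or px != py:          # x and y must share their single parent
        return edges, labels
    p = px[0]
    new_edges = set(edges) - {(p, x)}     # remove leaf x and its incoming edge
    new_labels = {leaf: s for leaf, s in labels.items() if leaf != x}
    new_labels[y] = labels[y] | labels[x]  # y absorbs the labels of x
    grand = parents(p)
    if len(grand) == 1:                   # p now has in- and out-degree 1: suppress it
        g = grand[0]
        new_edges -= {(g, p), (p, y)}
        new_edges.add((g, y))
    return new_edges, new_labels

# Hypothetical three-leaf tree: rho -> t, rho -> c, t -> a, t -> b.
edges = {("rho", "t"), ("rho", "c"), ("t", "a"), ("t", "b")}
labels = {"a": frozenset("a"), "b": frozenset("b"), "c": frozenset("c")}
edges, labels = reduce_simple_cherry(edges, labels, "a", "b")
print(sorted(edges), labels)
```

When the shared parent is the root, nothing is suppressed, which matches the convention that a singleton keeps a root of out-degree 1.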
See Figure 1 for an illustration of the two cherry reduction operations, and
the concepts of isomorphism.
[Figure 1 (image not reproduced): five panels showing networks $N_1$ to $N_5$. Leaf label sets shown in the panels include $e = \{d, e\}$ in $N_2$ and $N_3$, and $e = \{c, d, e, f\}$ in $N_4$.]
Fig. 1. In this figure, leaves are represented by open circles, tree vertices as filled circles, reticulations as filled squares, and the root of the network as a filled, inverted triangle. Network $N_1$ is a level-1 network with $|R(N_1)| = 2$. $N_1$ is a reticulation-trimmed subnetwork of $N_1$ with respect to $F = \emptyset$. Network $N_2 = N_1\langle (d, e)\rangle$, where $(d, e)$ is a simple cherry/reduction. Network $N_3 = N_2\langle (e, f)\rangle$, where $(e, f)$ is a reticulated cherry/reduction. $N_3$ is a reticulation-trimmed subnetwork of $N_1$ and of $N_2$ with respect to $F = \{(x, r_2)\}$. Network $N_4 = N_3\langle (c, e) \cdot (f, e) \cdot (e, b)\rangle$ is a reticulation-trimmed subnetwork of $N_1$ and of $N_2$ with respect to $F = \{(x, r_2), (v, r_1)\}$ or to $F = \{(w, r_2), (v, r_1)\}$. Network $N_5 \simeq N_4$; in fact, there are CSs that may lead to leaf $e$ being any of leaves $c$, $d$, $e$, or $f$. Each of these networks would have the same label set on that leaf, and all are weakly isomorphic with $N_5$.
Cherries on a network can be reduced in any order. We restate a theorem of [16] that we adapt to our formalism$^3$.
Theorem 1. Let $N$ be a network, let $(x, y)$ be a cherry of $N$, and let $S$ be a CS that contains $(x, y)$. Then there exists a CS $S'$ such that $N\langle S\rangle = N\langle S'\rangle$, and whose first element is $(x, y)$.
2.3 Maximum agreement cherry-reduced subnetworks
For networks $N$ and $N'$, when there exists a CS $S$ such that $N\langle S\rangle \simeq N'$, we say that $N'$ is a cherry-reduced subnetwork (CRS) of $N$, denoted by $N' \subseteq_{cr} N$. We can now define the main problem of focus.
The Maximum Agreement Cherry-Reduced Subnetwork (MACRS) problem.
Input: Two orchard networks $N_1$ and $N_2$.
Find: A network $N^*$ with the maximum number of vertices that satisfies $N^* \subseteq_{cr} N_1$ and $N^* \subseteq_{cr} N_2$.
A solution $N^*$ to the above problem will be called an MACRS of $N_1$ and $N_2$.
3 An MACRS algorithm on level-1 networks
We show that the MACRS problem can be solved in time $O(3^r n^3)$ for $n = \max(|V(N_1)|, |V(N_2)|)$ and $r = |R(N_1)| + |R(N_2)|$ on level-1 networks. We employ a two-step strategy. First, we enumerate a number of inputs that have been specially reduced to a selected set of remaining reticulations. Second, these inputs are provided to a cubic-time dynamic programming algorithm for an easier version of MACRS that uses only simple reductions. Because the number of special inputs is bounded by $3^r$, we get an FPT algorithm. MACRS is thus split into two subproblems. We first introduce them and show how they can be used to solve MACRS. The later sections then focus on each problem separately.
Let $N$ be a network and let $F \subseteq E_R(N)$ be a subset of reticulation edges. We wish to generate all the maximal cherry-reduced subnetworks of $N$ under the restriction that the reticulation edges removed by cherry operations coincide with $F$. Thus, we say that a network $N'$ is a reticulation-trimmed subnetwork of $N$ with respect to $F$ if there exists a CS $S$ such that $N\langle S\rangle = N'$, and such that $(u, v) \in F$ if and only if $S$ contains a reticulated cherry reduction that removes $(u, v)$, and $S$ is of minimum length, i.e. we require that there is no other CS $S'$ with $|S'| < |S|$ that satisfies the same properties.
Furthermore, we say that $N'$ is a reticulation-trimmed subnetwork of $N$ if there exists a set $F \subseteq E_R(N)$ such that $N'$ is a reticulation-trimmed subnetwork of $N$ with respect to $F$.
$^3$ Note that the authors prove the statement under the assumption that $S$ is complete, and that leaves are single-labeled. However, the proof is easy to adapt to our context.
The Reticulation-Trimmed Enumeration problem:
Input: An orchard network N.
Find: the set of all reticulation-trimmed subnetworks of N.
Note that the size of the set of reticulation-trimmed subnetworks depends heavily on the network structure. For instance, it is possible to show that it is linear when all reticulations are arranged in a path, and exponential when all reticulations are independent (none is an ancestor of another). It is possible to calculate the size of this set exactly by algorithmic means through an abstraction of the network structure. However, we reserve the analysis of the impact of this parameter on our algorithm for future work.
Once the set of reticulation edges to remove has been guessed, it remains to infer the set of non-reticulated cherry operations. A simple CS is a CS that contains only simple cherries. In this way, $R(N) = R(N\langle S\rangle)$ for any simple CS $S$. For networks $N$ and $N'$, when there exists a simple CS $S$ such that $N\langle S\rangle \simeq N'$, we say that $N'$ is a CRS-SIMPLE of $N$. Note that owing to our definition of weak isomorphism, $N\langle S\rangle \simeq N'$ does not mean that $S$ transforms $N$ into $N'$. A better intuition would rather be that after applying $S$ on $N$, we could choose one label in the label set of each leaf of $N\langle S\rangle$ and of $N'$, such that the resulting networks would be isomorphic in the traditional sense.
The Simple Maximum Agreement Cherry-Reduced Subnetwork (MACRS-Simple) problem.
Input: Two orchard networks $N_1$ and $N_2$.
Find: A network $N$ with a maximum number of vertices such that $N$ is a CRS-SIMPLE of $N_1$ and a CRS-SIMPLE of $N_2$.
A solution $N$ to the above problem will be called a MACRS-Simple of $N_1$ and $N_2$.
For the standard MACRS problem on networks $N_1$ and $N_2$, there is always a solution as long as $X(N_1) \cap X(N_2) \neq \emptyset$. However, since reticulations cannot be removed by a simple CS, the MACRS-Simple problem may not have a solution (for instance when the two networks have different numbers of reticulation vertices). We can now describe our main algorithm, where we assume that the MACRS-Simple routine correctly returns an optimal solution to the above problem.
Algorithm 1 MACRS Finder
Input: Two networks $N_1$ and $N_2$
Output: A MACRS of $N_1$ and $N_2$
1: $\tilde{N} \leftarrow$ empty network
2: for each reticulation-trimmed subnetwork $N'_1$ of $N_1$ do
3:     for each reticulation-trimmed subnetwork $N'_2$ of $N_2$ do
4:         Let $N$ be a MACRS-Simple of $N'_1$ and $N'_2$
5:         if $N$ exists and $|V(N)| > |V(\tilde{N})|$ then $\tilde{N} \leftarrow N$
6:     end for
7: end for
8: return $\tilde{N}$
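The control flow of Algorithm 1 can be sketched in a few lines of Python. The two subroutines are passed in as callables because their real implementations are the subject of Section 4; the parameter names `rt_subnetworks`, `macrs_simple` and `size` are hypothetical, and the stand-ins in the usage example are toy functions on label sets, not actual network routines.

```python
def macrs_finder(n1, n2, rt_subnetworks, macrs_simple, size):
    """Keep the largest MACRS-Simple over all pairs of trimmed subnetworks.

    rt_subnetworks(n) -> iterable of reticulation-trimmed subnetworks of n
    macrs_simple(a, b) -> a network, or None when no solution exists
    size(n)           -> number of vertices of n
    """
    best = None  # plays the role of the empty network
    for s1 in rt_subnetworks(n1):
        for s2 in rt_subnetworks(n2):
            candidate = macrs_simple(s1, s2)
            if candidate is not None and (best is None or size(candidate) > size(best)):
                best = candidate
    return best

# Toy stand-ins (assumptions for illustration only): a "network" is a set of labels.
toy_subnets = lambda n: [n, frozenset(sorted(n)[:1])]
toy_simple = lambda a, b: (a & b) or None
print(macrs_finder(frozenset("abc"), frozenset("bcd"), toy_subnets, toy_simple, len))
```

Passing the subroutines as arguments keeps the outer loop independent of how networks are represented.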
An optimization technique is evident here: as we mentioned, there is a solution to MACRS-Simple($N'_1$, $N'_2$) only when $|R(N'_1)| = |R(N'_2)|$, since only simple reductions will be performed. Thus, we need only test such pairs. This optimization is not currently formalized in the algorithm and complexity analysis presented here, and is left for future work.
In the remainder of this section, we focus on proving that this algorithm works correctly. We will deal with the complexity of the algorithm once we have dealt with the Reticulation-Trimmed Enumeration and MACRS-Simple subproblems. We begin by showing that one can always obtain a subnetwork by first going through a reticulation-trimmed subnetwork, and then using only simple cherry reductions.
Lemma 1. Let $N$ be a network. Then for any $N' \subseteq_{cr} N$, there exists a reticulation-trimmed subnetwork $N''$ of $N$ and a simple CS $S$ such that $N''\langle S\rangle = N'$.
For proof of Lemma 1, see Appendix.
Theorem 2. Algorithm 1 correctly finds a MACRS of $N_1$ and $N_2$.
Proof. Let $N^*$ be a MACRS of $N_1$ and $N_2$. Let $\tilde{N}$ be the network returned by Algorithm 1. We first claim that if $\tilde{N}$ is non-empty, it does satisfy $\tilde{N} \subseteq_{cr} N_1, N_2$. To see this, note that every pair $N'_1$, $N'_2$ of networks enumerated by Algorithm 1 satisfies $N'_1 \subseteq_{cr} N_1$ and $N'_2 \subseteq_{cr} N_2$, by the definition of reticulation-trimmed subnetworks. Moreover, if a MACRS-Simple $N$ of $N'_1$, $N'_2$ exists, then by transitivity, $N \subseteq_{cr} N'_1 \subseteq_{cr} N_1$ and $N \subseteq_{cr} N'_2 \subseteq_{cr} N_2$. Since $\tilde{N}$ is one of those $N$, this proves our claim.
Let us now focus on the optimality of $\tilde{N}$. First note that $|V(N^*)| \geq |V(\tilde{N})|$: if $\tilde{N}$ is an empty network, this is obvious, and otherwise, by our above claim, $\tilde{N}$ is a cherry-reduced subnetwork of $N_1$ and $N_2$ and can thus not be larger than $N^*$.
Let us now show that $|V(N^*)| \leq |V(\tilde{N})|$. By Lemma 1, there exists a reticulation-trimmed subnetwork $N'_1$ of $N_1$ (resp. $N'_2$ of $N_2$) such that $N^*$ can be obtained from $N'_1$ (resp. $N'_2$) using only simple CSs. Thus, $N^*$ is a CRS-SIMPLE of $N'_1$ and $N'_2$. Algorithm 1 will eventually enumerate $N'_1$ and $N'_2$ and find a MACRS-Simple $N$ of them, which is of maximum size and thus has at least as many vertices as $N^*$. Since the returned $\tilde{N}$ is the $N$ of maximum size found by the algorithm, it follows that $|V(N^*)| \leq |V(\tilde{N})|$.
4 Subroutines
4.1 Enumerating the set of reticulation-trimmed subnetworks
We now show how to enumerate the set of all reticulation-trimmed subnetworks of a network $N$ in time $O(3^{|R(N)|}|V(N)|)$. The reticulation-trimmed subnetworks are characterized by having no more reductions than what is needed to remove the desired reticulation edges. Luckily, we will see that at most one such network can exist; we must only remove the complete subnetwork under both endpoints of the reduced reticulation edge. This is guaranteed to be possible by cherry reductions, assuming all reticulations below these endpoints have also been specified for removal. Algorithm 2 shows how to enumerate the relevant edges, and uses Algorithm 3 as a subroutine, which finds the reticulation-trimmed subnetwork with respect to a given edge set. We show that the reticulation-trimmed subnetwork of $N$ with respect to $F \subseteq E_R(N)$ is uniquely defined in Lemma 4. We say that a set of edges $F$ is disjoint if, for any two distinct edges $(u, v), (x, y) \in F$, $\{u, v\} \cap \{x, y\} = \emptyset$.
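Algorithm 2 REDUCED-SET-FINDER
Input: A network $N$
Output: The set $\mathcal{N}$ of all reticulation-trimmed subnetworks of $N$
1: for each $F \in \mathcal{P}(E_R(N))$ such that $F$ does not contain both $(a, b)$ and $(c, b)$ for any $a, b, c$ do
2:     $\mathcal{N} \leftarrow \mathcal{N} \cup$ RT-SUBNET-MAKER($N$, $F$)
3: end for
4: return $\mathcal{N}$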
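The candidate sets $F$ visited by the loop above can be generated as a Cartesian product of per-reticulation choices. The sketch below is a minimal Python illustration, assuming each reticulation is given with its two parents in a dictionary; the name `candidate_edge_sets` and the input format are hypothetical.

```python
from itertools import product

def candidate_edge_sets(reticulation_parents):
    """Yield every edge set F with at most one incoming edge per reticulation.

    reticulation_parents: dict mapping each reticulation r to its two parents (p1, p2).
    """
    reticulations = sorted(reticulation_parents)
    # Three options per reticulation: no edge, the edge from p1, or the edge from p2.
    options = [(None,) + tuple((p, r) for p in reticulation_parents[r])
               for r in reticulations]
    for choice in product(*options):
        yield frozenset(edge for edge in choice if edge is not None)

# Hypothetical network with two reticulations.
parents = {"r1": ("u", "v"), "r2": ("w", "x")}
sets = list(candidate_edge_sets(parents))
print(len(sets))  # 3 ** 2 candidates
```

Each yielded set would then be handed to a routine playing the role of RT-SUBNET-MAKER, which decides whether a reticulation-trimmed subnetwork exists for it.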
Lemma 2. Let $N$ be a network, $F \subseteq E_R(N)$ be a set, and $N'$ be a network that is a reticulation-trimmed subnetwork of $N$ with respect to $F$. Then $F$ is disjoint.
For proof of Lemma 2, see Appendix. Next, for $F \subseteq E_R(N)$, a topological sort of $F$ is an ordering of its elements such that for distinct edges $e_1, e_2 \in F$, if there is a path from a vertex of $e_1$ to a vertex of $e_2$ in $N$, then $e_1$ comes later than $e_2$ in this ordering.
Lemma 3. Let $N$ be a network and $F \subseteq E_R(N)$ be a set such that there exists a reticulation-trimmed subnetwork of $N$ with respect to $F$. Then there exists a topological sort of $F$.
For proof of Lemma 3, see Appendix. The next lemma is crucial, as it shows that reticulation-trimmed subnetworks with respect to a given $F$ are either unique, or do not exist. This allows us to enumerate in reasonable time.
Lemma 4. Let $N$ be a network and let $F \subseteq E_R(N)$. Then there do not exist two non-strongly isomorphic reticulation-trimmed subnetworks of $N$ with respect to $F$.
For proof of Lemma 4, see Appendix. As an example of a network $N$ and an $F \subseteq E_R(N)$ that does not admit a reticulation-trimmed subnetwork of $N$, consider $N$ with 2 reticulations $r_1$, $r_2$ such that $r_1 \in \mathrm{reach}(r_2)$. Choosing $F = \{(p_1, r_1)\}$, for $p_1$ chosen arbitrarily between $r_1$'s parents, will not admit a reticulation-trimmed subnetwork, since reticulation $r_2$ must have leaves below its endpoints to be in a cherry, but this choice of $F$ has no corresponding reticulated reductions of $r_1$, making it impossible to construct a CS $S$ that reduces only $r_2$.
We next describe Algorithm 3, which produces the reticulation-trimmed networks with respect to some given $F$; see Figure 2 for an illustration.
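[Figure 2 (image not reproduced): three panels, labeled (1) to (3), showing the vertices $u$, $v$, $p(u)$, $p(v)$ and the new leaves $u'$, $v'$ at successive steps of Algorithm 3.]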
Fig. 2. In this figure, leaves are represented by open circles, tree vertices as filled circles, and reticulations as filled squares. A subnetwork without reticulations is represented by a large open triangle, and a subnetwork that may be reticulated is represented by a large open blob. This figure shows an example of the operation of Algorithm 3; note how $R(u) \cap R(v) \setminus \{v\} = \emptyset$ in this example. The subnetwork under label (1) is an example network at line 7: the dotted line represents the reticulation edge $(u, v)$ removed by line 5, and both leaves $u'$ and $v'$ have been constructed (leaf labels are not shown). The network under label (2) shows the state of network (1) at line 8, when edges $(p(u), u')$ and $(p(v), v')$ have been added. The network under label (3) shows the state of the network under (1) at line 9, when vertices in $\mathrm{reach}^{\downarrow}(u, N') \cup \mathrm{reach}^{\downarrow}(v, N')$ are removed.
Algorithm 3 RT-SUBNET-MAKER
Input: A network $N$ and a disjoint set $F \subseteq E_R(N)$
Output: The reticulation-trimmed subnetwork of $N$ with respect to $F$, or NULL if it does not exist
1: $N' \leftarrow N$
2: Find a topological sort $F'$ of $F$
3: for each $(u, v) \in F'$ in order do
4:     if $R(u) \cap R(v) \setminus \{v\} = \emptyset$ then
5:         delete edge $(u, v)$
6:         construct leaf $u'$ such that $X(u') = X(u)$
7:         construct leaf $v'$ such that $X(v') = X(v)$
8:         add edges $(p(u), u')$ and $(p(v), v')$ to $N'$
9:         remove all vertices in $\mathrm{reach}^{\downarrow}(u, N') \cup \mathrm{reach}^{\downarrow}(v, N')$
10:    else
11:        return NULL
12:    end if
13: end for
14: return $N'$
Lemma 5. Algorithm 3 on $(N, F)$ returns the reticulation-trimmed subnetwork $N'$ of $N$ with respect to $F$ if it exists, and NULL if not, and runs in time $O(|V(N)|)$.
For proof of Lemma 5, see Appendix.
Theorem 3. Algorithm 2 correctly enumerates all reticulation-trimmed subnetworks of a network $N$, and runs in time $O(3^{|R(N)|}|V(N)|)$.
Proof. It is already proved (Lemma 2) that a non-disjoint $F$ does not admit a reticulation-trimmed subnetwork, so it is correct to filter those. The remaining correctness follows from the exhaustive nature of the construction of all $F$ and from the correctness of Algorithm 3.
As for the time complexity, filtering non-disjoint $F$ implies a threefold choice on each reticulation (we either include one, or none of its incoming edges, but not both by disjointness). Thus the size of the set is $O(3^{|R(N)|})$. Recalling that Algorithm 3 can be implemented in time $O(|V(N)|)$, the total runtime for Algorithm 2 is in $O(3^{|R(N)|}|V(N)|)$.
4.2 An algorithm for MACRS-Simple
A dynamic programming algorithm that solves the MACRS-Simple problem in cubic time is given and proved correct in this section.
Assume we have networks $N_1$ and $N_2$ as input to the MACRS-Simple problem. We assume that we have computed the set of biconnected components of $N_1$ and $N_2$ in a preprocessing step, along with the bridge edges. This can be done in time $O(|V(N)|)$, see [7]. Since the networks considered are level-1, each biconnected component $B$ contains exactly one vertex $u$ that has no in-neighbor in $B$, and exactly one vertex $r$ that has no out-neighbor in $B$. If $B$ is trivial, then $u = r$, and otherwise $r$ is a reticulation vertex and there are two edge-disjoint paths from $u$ to $r$ in $B$ [12]. We refer to these two paths as component paths. The vertex $u$ will be called the root of $B$ and denoted $\rho(B)$, and $r$ will be called the bottom of $B$. We let $\mathcal{B}_1$ be the set of biconnected components of $N_1$ and $\mathcal{B}_2$ be the set of biconnected components of $N_2$. Finally, for $i \in \{1, 2\}$, we denote $\rho(\mathcal{B}_i) = \{\rho(B) : B \in \mathcal{B}_i\}$, i.e. the set of roots in $\mathcal{B}_i$.
Using dynamic programming, we construct a table $M$ whose rows are the roots in $\rho(\mathcal{B}_1)$ and whose columns are the roots in $\rho(\mathcal{B}_2)$. For $u \in \rho(\mathcal{B}_1)$, $v \in \rho(\mathcal{B}_2)$, we define $N_u$ as the subnetwork of $N_1$ rooted at $u$, and $N_v$ as the subnetwork of $N_2$ rooted at $v$. We then define $M[u, v]$ as the number of leaves in a MACRS-Simple of $N_u$ and $N_v$. If $u$ is a tree vertex, its children are denoted $u_1$ and $u_2$.
In $N_1$, we denote the two component paths on the same non-trivial biconnected component by $\pi^1_l = p^1_{l,1} \ldots$ and $\pi^1_r = p^1_{r,1} \ldots$, and in $N_2$ these paths will be denoted $\pi^2_l = p^2_{l,1} \ldots$ and $\pi^2_r = p^2_{r,1} \ldots$. For a vertex $p_i$ on path $\pi = p_1 \ldots$, let $h(p_i)$ be the child vertex of $p_i$ such that $h(p_i) \neq p_{i+1}$. In other words, the edge $(p_i, h(p_i))$ is a bridge pendant to $\pi$, leading to a different biconnected component where $h(p_i)$ roots a distinct subnetwork. See Figure 3 for an illustration of the component paths and the described labelings for an example $N_1$ network.
We use Algorithm 4 to compute $M[u, v]$ for each $u \in \rho(\mathcal{B}_1)$, $v \in \rho(\mathcal{B}_2)$ in postorder. We seek the result $M[\rho(N_1), \rho(N_2)] + |R(N_1)|$, as $M$ records only the number of leaves in an MACRS-Simple of $N_1$ and $N_2$. From this information we can calculate more about the general size of the network: because networks are binary, $|V(N)| = 2|L(N)| + 2|R(N)| - 1$. Luckily, the number of reticulations in the solution is known ahead of time, since it must have the same number of reticulations as each of the inputs. Note that we can also reconstruct the network that corresponds to the optimal size of the MACRS-Simple of $N_1$ and $N_2$ by performing a traceback in the dynamic programming table.
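The vertex-count identity for binary networks can be verified by a short degree-counting argument. The derivation below is a sketch based only on the degree conditions of Section 2.1, for a network that is not a singleton (so the root has out-degree 2); the abbreviations $t$, $r$ and $\ell$ are introduced here for convenience.

```latex
% Let t = |T(N)|, r = |R(N)|, \ell = |L(N)|, and assume |V(N)| > 2.
% Every edge contributes one unit of out-degree and one unit of in-degree, so
\begin{align*}
\underbrace{2}_{\text{root}} + \underbrace{2t}_{\text{tree}} + \underbrace{r}_{\text{reticulations}}
  &= \underbrace{t}_{\text{tree}} + \underbrace{2r}_{\text{reticulations}} + \underbrace{\ell}_{\text{leaves}}
  && \text{(out-degrees = in-degrees)} \\
t &= \ell + r - 2 \\
|V(N)| &= 1 + t + r + \ell = 2\ell + 2r - 1 .
\end{align*}
```

For a tree ($r = 0$) this gives the familiar count of $2\ell - 1$ vertices.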
Theorem 4. Algorithm 4 runs in time O(|V(N1)||V(N2)|(|V(N1)|+|V(N2)|)).
Proof. The algorithm fills a table M, a table of maximum size |V(N1)||V(N2)|,
thus if we can show each table entry is calculated in at most linear (|V(N1)|+
|V(N2)|) time, then the algorithm is cubic as claimed.
The preprocessing step to determine and label biconnected components is
linear as it requires a modified depth-first search [7]. Then, the calculations
being performed for lines 1 through 9 consist of finding and checking the labelled
components (linear), and checking up to 12 set intersections (linear) of a vertex's
descendant leaves (linear to find). Lines 10 and on perform a linear number of
table lookups/calls. The paths themselves are also linear to find as they are
simply the paths that leave each child of the rooting vertex of the biconnected
component and end on the next reticulation, the length of which can also be
calculated on a single pass. Thus the claim holds.
Finding agreement cherry-reduced subnetworks in level-1 networks 13
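Algorithm 4
Input: Two multi-networks $N_1$, $N_2$, vertices $u \in \rho(B_1)$, $v \in \rho(B_2)$
Output: $M[u, v]$
1: if both $u$ and $v$ are trivial components then
2:   if $u$ or $v$ is a leaf then
3:     $M[u, v] = \begin{cases} 1 & \text{if } X(u) \cap X(v) \neq \emptyset \text{ and } R(u) \cup R(v) = \emptyset \\ -\infty & \text{otherwise} \end{cases}$
4:   else
5:     for each $i \in \{1, 2\}$, $j \in \{1, 2\}$, define $X_{ij} = X(u_i) \cap X(v_j)$
6:     $M[u, v] = \max(M_1, M_2)$ where
       $M_1 = \begin{cases} 1 & \text{if } X_{11} \neq \emptyset \text{ and } X_{22} = \emptyset \text{ and } R(u) \cup R(v) = \emptyset \\ 1 & \text{if } X_{11} = \emptyset \text{ and } X_{22} \neq \emptyset \text{ and } R(u) \cup R(v) = \emptyset \\ M[u_1, v_1] + M[u_2, v_2] & \text{if } X_{11} \neq \emptyset \text{ and } X_{22} \neq \emptyset \\ -\infty & \text{otherwise} \end{cases}$
       $M_2 = \begin{cases} 1 & \text{if } X_{12} \neq \emptyset \text{ and } X_{21} = \emptyset \text{ and } R(u) \cup R(v) = \emptyset \\ 1 & \text{if } X_{12} = \emptyset \text{ and } X_{21} \neq \emptyset \text{ and } R(u) \cup R(v) = \emptyset \\ M[u_1, v_2] + M[u_2, v_1] & \text{if } X_{12} \neq \emptyset \text{ and } X_{21} \neq \emptyset \\ -\infty & \text{otherwise} \end{cases}$
7:   end if
8: else if $u$ is a trivial biconnected component and $v$ is in a non-trivial biconnected
   component (or vice versa) then
9:   $M[u, v] = -\infty$
10: else $u$ and $v$ are in non-trivial components with reticulations $r_1$, $r_2$ respectively
    and complement paths $\pi^1_l$, $\pi^1_r$, $\pi^2_l$, $\pi^2_r$
11:   $M_1 = -\infty$
12:   $M_2 = -\infty$
13:   if $|\pi^1_l| = |\pi^2_l|$ and $|\pi^1_r| = |\pi^2_r|$ then
14:     $M_1 = M[r_1, r_2] + \sum_{i=1}^{|\pi^1_l|} M[h(p^1_{l,i}), h(p^2_{l,i})] + \sum_{i=1}^{|\pi^1_r|} M[h(p^1_{r,i}), h(p^2_{r,i})]$
15:   end if
16:   if $|\pi^1_l| = |\pi^2_r|$ and $|\pi^1_r| = |\pi^2_l|$ then
17:     $M_2 = M[r_1, r_2] + \sum_{i=1}^{|\pi^1_l|} M[h(p^1_{l,i}), h(p^2_{r,i})] + \sum_{i=1}^{|\pi^1_r|} M[h(p^1_{r,i}), h(p^2_{l,i})]$
18:   end if
19:   $M[u, v] = \max(M_1, M_2)$
20: end if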
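To make the trivial-component branch concrete, the following is a minimal Python sketch of lines 1 to 7 of Algorithm 4, restricted to the special case where both inputs are trees, so that $R(u) \cup R(v) = \emptyset$ always holds. The representation (a leaf is a string, an internal vertex is a pair of subtrees) and the names `leaves` and `m_trivial` are assumptions made for illustration; single-labelled leaves are used instead of the multi-labelled leaves of multi-networks.

```python
from functools import lru_cache

NEG_INF = float("-inf")

# Assumed representation: a tree is either a leaf label (str)
# or a pair (left, right) of trees. No reticulations are modelled.

def leaves(t):
    """Return X(t): the set of leaf labels at or below t."""
    if isinstance(t, str):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])

@lru_cache(maxsize=None)
def m_trivial(u, v):
    """M[u, v] for lines 1-7 when R(u) and R(v) are both empty."""
    if isinstance(u, str) or isinstance(v, str):
        # Lines 2-3: at least one side is a leaf.
        return 1 if leaves(u) & leaves(v) else NEG_INF

    (u1, u2), (v1, v2) = u, v

    def option(a1, b1, a2, b2):
        # One of M1 / M2 from line 6, for the pairing (a1, b1), (a2, b2).
        x1 = leaves(a1) & leaves(b1)
        x2 = leaves(a2) & leaves(b2)
        if x1 and x2:
            return m_trivial(a1, b1) + m_trivial(a2, b2)
        if x1 or x2:
            return 1  # exactly one pairing shares a leaf
        return NEG_INF

    # Line 6: best of the straight and the crossed child pairing.
    return max(option(u1, v1, u2, v2), option(u1, v2, u2, v1))
```

For instance, `m_trivial((("a", "b"), "c"), (("a", "c"), "b"))` evaluates the crossed pairing and returns 2. Memoisation via `lru_cache` plays the role of the table $M$ in this sketch.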
[Figure 3: image not reproduced. Vertex labels in the figure: $p^1_{l,1}$, $r_1$, $p^1_{r,1}$, $p^1_{r,2}$, $h(p^1_{l,1})$, $h(p^1_{r,1})$, $h(p^1_{r,2})$, $\rho(B_i)$, $v$.]
Fig. 3. Tree vertices are drawn as filled circles and reticulations as filled squares.
A subnetwork is represented by a large open blob. Vertices in red are in the same
non-trivial biconnected component. Yellow edges form path $\pi^1_l$ and green edges form
path $\pi^1_r$. Tree vertex $v$ is itself a trivial biconnected component such that $R(v) \neq \emptyset$.
Theorem 5. The entry $M[\rho(N_1), \rho(N_2)]$ correctly contains $|L(N)|$ for
$N =$ MACRS-Simple$(N_1, N_2)$ if one exists, and $-\infty$ otherwise.
See the Appendix for the proof of Theorem 5.
4.3 Complexity of Algorithm 1
Theorem 6. Let $N_1$, $N_2$ be two networks, let $n = \max(|V(N_1)|, |V(N_2)|)$, and
$r = |R(N_1)| + |R(N_2)|$. Then the MACRS problem can be solved in time $O(3^r n^3)$.
Proof. By Theorem 3, Algorithm 2 can enumerate all reticulation-trimmed subnetworks
of $N_1$ and $N_2$ in total time $O(3^{|R(N_1)|} n + 3^{|R(N_2)|} n) = O(3^r n)$. The
number of pairs of such networks for which we compute a MACRS-Simple is
$O(3^r)$, each of which can be handled in time $O(n^3)$ by Theorem 4. The total
running time is thus $O(3^r n + 3^r n^3) = O(3^r n^3)$.
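One way to see where a factor of the form $3^{|R(N)|}$ can come from is to count the candidate edge sets $F$: assuming each reticulation either keeps both incoming edges or loses exactly one of its two incoming edges, there are three options per reticulation. The Python sketch below enumerates these candidates under that assumption. It is not Algorithm 2 itself; the function name `candidate_edge_sets` and the dictionary input format are hypothetical, and not every candidate necessarily admits a reticulation-trimmed subnetwork.

```python
from itertools import product

def candidate_edge_sets(reticulations):
    """Yield every set F with at most one incoming edge per reticulation.

    `reticulations` is an assumed input format: a dict mapping each
    reticulation vertex to the pair of its two parents.
    """
    options = []
    for ret, (parent_a, parent_b) in reticulations.items():
        # Three choices: remove nothing, or remove one incoming edge.
        options.append([None, (parent_a, ret), (parent_b, ret)])
    for choice in product(*options):
        yield frozenset(edge for edge in choice if edge is not None)

# Usage: two reticulations give 3 * 3 = 9 candidate sets.
example = {"r1": ("a", "b"), "r2": ("c", "d")}
all_sets = list(candidate_edge_sets(example))
```

With $|R(N_1)|$ and $|R(N_2)|$ reticulations, pairing the candidates of both networks gives $3^{|R(N_1)|} \cdot 3^{|R(N_2)|} = 3^r$ pairs, which matches the $O(3^r)$ count used in the proof.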
5 Conclusion and discussion
In this paper, we presented the first exact algorithm to find an MACRS of two
rooted binary level-1 networks. The proposed approach starts by enumerating all
reticulation-trimmed subnetworks for both input networks, and then compares
all the possible pairs produced for each input network using a dynamic pro-
gramming algorithm for the MACRS-Simple problem. The enumeration step
presented here is currently exponential in the sum of reticulation numbers of
both input networks, and the MACRS-Simple algorithm takes cubic time in
the maximum number of vertices contained in the input networks.
In addition to extracting a common subnetwork structure of maximum size from
two orchard networks, the proposed algorithm makes it possible to measure how
much the two networks differ. As shown in our previous work [19], there is a direct
correspondence between finding an MACRS (more specifically, its size) and
calculating one of the three equivalent distances presented in that work. As such,
the algorithm presented here provides a first method to calculate these distances
exactly. This can be used in the future to compare this distance with other distances
(such as the mixed distance) or to evaluate the accuracy of different heuristic
approaches.
Future extensions
There is an obvious optimization of the approach presented in this work, related
to the enumeration of the reticulation-trimmed subnetworks. Since the
MACRS-Simple algorithm by definition does not remove reticulations, comparing
two reticulation-trimmed subnetworks that do not share the same reticulation
number or topology (in the sense that no mapping of the components containing
reticulations can be made) will result in no solution. The enumeration step can
therefore be improved by comparing the topological relationships of the
reticulations in both input networks (which, in the case of level-1 networks, can be
modelled by trees), finding the largest common reticulation topology between them,
and enumerating from there by gradually removing all possible reticulations. While
this strategy does not improve the worst-case bound, it may greatly reduce the
number of reticulation-trimmed subnetwork pairs to consider on many real inputs
(potentially bringing it down to a linear number of pairs).
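The simplest part of this filter, discarding pairs whose reticulation numbers differ, can be sketched in Python as follows. This is only an illustration of the pruning idea under assumed inputs: each subnetwork is represented by a hypothetical `(identifier, reticulation_count)` tuple, and the topology comparison described above is not attempted.

```python
from collections import defaultdict

def pairs_with_equal_reticulation_number(subnets1, subnets2):
    """Yield only the pairs whose reticulation counts match.

    Each item is an assumed (identifier, reticulation_count) tuple;
    pairs with different counts cannot have a MACRS-Simple.
    """
    by_count = defaultdict(list)
    for name, count in subnets2:
        by_count[count].append(name)
    for name, count in subnets1:
        for other in by_count.get(count, []):
            yield name, other

# Usage with made-up identifiers and counts.
first = [("A0", 2), ("A1", 1), ("A2", 0)]
second = [("B0", 1), ("B1", 1), ("B2", 3)]
kept = list(pairs_with_equal_reticulation_number(first, second))
```

Here only two of the nine possible pairs survive, so the cubic dynamic program would be run twice instead of nine times.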
Another interesting avenue of work is to generalize our algorithm to higher-level
networks. A possible strategy would be to extend the MACRS-Simple dynamic
programming to consider, for each pair of biconnected components, all possible
isomorphisms, find the maximum value, and then add to it the values of the
exterior nodes that are matched in the isomorphism.
Attaching leaves to a non-orchard network was used previously to extend
an approach to solve the minimum hybridization problem on any rooted phylo-
genetic network [20]. Exploring if and how a similar idea could be employed to
generalize our proposed algorithm to non-orchard networks should be considered.
Finally, as mentioned earlier, the complexity of our method is exponential in
the sum of the number of reticulations in both input networks because of the
enumeration step. Ideally, we could find an approach for which the complexity
would depend only on the level of the two input networks, which we leave as an
open problem.
References
1. Allen-Savietta, C.: Estimating Phylogenetic Networks from Concatenated Se-
quence Alignments. The University of Wisconsin-Madison (2020)
2. Baroni, M., Semple, C., Steel, M.: A framework for representing reticulate evolu-
tion. Annals of Combinatorics 8, 391–408 (2005)
3. Bernardini, G., van Iersel, L., Julien, E., Stougie, L.: Reconstructing phylogenetic
networks via cherry picking and machine learning. In: WABI 2022 - 22nd
International Workshop on Algorithms in Bioinformatics (2022)
4. Bordewich, M., Semple, C.: Determining phylogenetic networks from inter-taxa
distances. Journal of mathematical biology 73(2), 283–303 (2016)
5. Cardona, G., Llabrés, M., Rosselló, F., Valiente, G.: Metrics for phylogenetic
networks I: Generalizations of the Robinson-Foulds metric. IEEE/ACM Transactions
on Computational Biology and Bioinformatics 6(1), 46–61 (2008)
6. Erdős, P.L., Semple, C., Steel, M.: A class of phylogenetic networks reconstructable
from ancestral profiles. Mathematical biosciences 313, 33–40 (2019)
7. Hopcroft, J.E., Tarjan, R.E.: Dividing a graph into triconnected components. SIAM
Journal on computing 2(3), 135–158 (1973)
8. Huber, K.T., Linz, S., Moulton, V.: The rigid hybrid number for two phylogenetic
trees. Journal of Mathematical Biology 82, 1–29 (2021)
9. Huber, K.T., Linz, S., Moulton, V.: Cherry picking in forests: A new characteri-
zation for the unrooted hybrid number of two phylogenetic trees. arXiv preprint
arXiv:2212.08145 (2022)
10. Humphries, P.J., Linz, S., Semple, C.: Cherry picking: a characterization of the
temporal hybridization number for a set of phylogenies. Bulletin of mathematical
biology 75(10), 1879–1890 (2013)
11. Huson, D.H., Bryant, D.: Application of Phylogenetic Networks in Evolutionary
Studies. Molecular Biology and Evolution 23(2), 254–267 (2005).
https://doi.org/10.1093/molbev/msj030
12. Huson, D.H., Rupp, R., Scornavacca, C.: Phylogenetic networks: concepts, algo-
rithms and applications. Cambridge University Press (2010)
13. van Iersel, L., Janssen, R., Jones, M., Murakami, Y.: Orchard networks are trees
with additional horizontal arcs. Bulletin of Mathematical Biology 84(8), 76 (2022)
14. van Iersel, L., Janssen, R., Jones, M., Murakami, Y., Zeh, N.: A unifying character-
ization of tree-based networks and orchard networks using cherry covers. Advances
in Applied Mathematics 129, 102222 (2021)
15. Janssen, R., Jones, M., Murakami, Y.: Combining networks using cherry picking
sequences. In: Algorithms for Computational Biology: 7th International Confer-
ence, AlCoB 2020, Missoula, MT, USA, April 13–15, 2020, Proceedings. pp. 77–92.
Springer (2020)
16. Janssen, R., Murakami, Y.: Linear time algorithm for tree-child network contain-
ment. In: Algorithms for Computational Biology: 7th International Conference,
AlCoB 2020, Missoula, MT, USA, April 13–15, 2020, Proceedings 7. pp. 93–107.
Springer (2020)
17. Janssen, R., Murakami, Y.: On cherry-picking and network containment. Theoret-
ical Computer Science 856, 121–150 (2021)
18. Kong, S., Pons, J.C., Kubatko, L., Wicke, K.: Classes of explicit phylogenetic net-
works and their biological and mathematical significance. Journal of Mathematical
Biology 84(6), 47 (2022)
19. Landry, K., Teodocio, A., Lafond, M., Tremblay-Savard, O.: Defining phylogenetic
network distances using cherry operations. IEEE/ACM Transactions on Compu-
tational Biology and Bioinformatics (2022)
20. Linz, S., Semple, C.: Attaching leaves and picking cherries to characterise the
hybridisation number for a set of phylogenies. Advances in Applied Mathematics
105, 102–129 (2019)
21. Lu, B., Zhang, L., Leong, H.W.: A program to compute the soft robinson–foulds
distance between phylogenetic networks. BMC genomics 18, 1–10 (2017)
22. Lutteropp, S., Scornavacca, C., Kozlov, A.M., Morel, B., Stamatakis, A.: Netrax:
accurate and fast maximum likelihood phylogenetic network inference. Bioinfor-
matics 38(15), 3725–3733 (2022)
23. Nguyen, Q., Roos, T.: Likelihood-based inference of phylogenetic networks from
sequence data by phylodag. In: Algorithms for Computational Biology: Second
International Conference, AlCoB 2015, Mexico City, Mexico, August 4-5, 2015,
Proceedings 2. pp. 126–140. Springer (2015)
24. Park, H.J., Jin, G., Nakhleh, L.: Bootstrap-based support of hgt inferred by max-
imum parsimony. BMC Evolutionary Biology 10(1), 1–11 (2010)
25. Solís-Lemus, C., Bastide, P., Ané, C.: Phylonetworks: a package for phylogenetic
networks. Molecular biology and evolution 34(12), 3292–3298 (2017)
26. Tan, M., Long, H., Liao, B., Cao, Z., Yuan, D., Tian, G., Zhuang, J., Yang, J.: Qs-
net: Reconstructing phylogenetic networks based on quartet and sextet. Frontiers
in Genetics 10, 607 (2019)
27. Van Iersel, L., Janssen, R., Jones, M., Murakami, Y., Zeh, N.: Polynomial-time
algorithms for phylogenetic inference problems involving duplication and reticula-
tion. IEEE/ACM transactions on computational biology and bioinformatics 17(1),
14–26 (2019)
28. Van Iersel, L., Jones, M., Weller, M.: Embedding phylogenetic trees in networks of
low treewidth. arXiv preprint arXiv:2207.00574 (2022)
29. Wen, D., Yu, Y., Zhu, J., Nakhleh, L.: Inferring phylogenetic networks using phy-
lonet. Systematic biology 67(4), 735–740 (2018)
Appendix
1 Proof of Lemma 1 (page 8)
Lemma 1. Let $N$ be a network. Then for any $N' \subseteq_{cr} N$, there exists a
reticulation-trimmed subnetwork $N''$ of $N$ and a simple CS $S$ such that
$N''\langle S \rangle = N'$.
Proof. Let $N$ and $N'$ be networks such that $N' \subseteq_{cr} N$. We use induction on
$|E(N)|$ to prove a slightly stronger statement. We show that for any $F \subseteq E_R(N)$
such that there exists a CS $S'$ satisfying $N\langle S' \rangle = N'$ that removes the set of
reticulation edges $F$, there exists a reticulation-trimmed subnetwork $N''$ of $N$
with respect to $F$ and a simple CS $S$ such that $N''\langle S \rangle = N'$.
The base case is $|E(N)| = 1$. In this case we have a singleton network on a
root, a single leaf, and an edge between them. There are no cherries and only
$F = \emptyset$ is possible, and so all $N' \subseteq_{cr} N$ have $N' = N$ and $S = \emptyset$ for
$N\langle S \rangle = N'$. So the claim is trivially true.
For the induction step, assume that the claim holds for all networks whose
number of edges is strictly smaller than $|E(N)|$.
Let $F \subseteq E_R(N)$, and suppose there is a CS $S'$ such that $N\langle S' \rangle = N'$,
and that the set of reticulation edges removed by $S'$ is $F$. If every reduction
in $S'$ is simple, then $F = \emptyset$ and $N$ is itself a reticulation-trimmed subnetwork
of $N$ with respect to $F = \emptyset$, in which case the statement holds. Otherwise, let
$(u, v) \in F$ be the first reticulation edge of $N$ removed by a reduction in $S'$, say
$(u, v)$ is removed by cherry $S'_i$. On network $N\langle S'_{(0:i)} \rangle$, $u$ and $v$ must both
have leaf children. This implies that there are no reticulations in $reach(u, N)$. Thus,
all cherries $(x, y)$ of $S'_{(0:i)}$ such that $\{x, y\} \subseteq reach(u, N)$ are simple.
Assume for now that at least one such simple reduction exists, and let
$S'_h = (x, y)$ be the first of them. With this, we see that $(x, y)$ is a cherry on $N$
throughout every step of the reduction of $N$ by $S'_{(0:h)}$, including on $N$ itself. We
may therefore assume that $(x, y)$ is the first reduction of $S'$ by Theorem 1. Thus $N'$
is obtained from $N\langle (x, y) \rangle$ by applying the CS $S'_{(1:|S'|)}$, which we know removes
the set of reticulation edges $F$. By induction, there exists a reticulation-trimmed
subnetwork $N''$ of $N\langle (x, y) \rangle$ with respect to $F$ and a simple CS $S$ such that
$N''\langle S \rangle = N'$. Let $T$ be a smallest CS that results in $N''$ after applying it
on $N\langle (x, y) \rangle$. Then $N''$ can be obtained from $N$ by applying the CS $(x, y) \cdot T$.
We note that reduction $(x, y)$ is required in any CS that results in a
reticulation-trimmed subnetwork of $N$ with respect to $F$ (since the cherry $(x, y)$ is
below $u$ and $(u, v)$ needs to be removed), and it follows by the minimality of $T$ on
$N\langle (x, y) \rangle$ that $(x, y) \cdot T$ is minimum on $N$. Thus $N''$ is a reticulation-trimmed
subnetwork of $N$ with respect to $F$ and the claim holds.
It is also possible that $(x, y)$ does not exist as a simple reduction, which
occurs when $u$ and $v$ each already have a leaf child in $N$. In this case, $(x, y)$ is
a reticulated cherry on $N$ such that $p(x) = v$ and $p(y) = u$. As in the previous
case, we may assume that $(x, y)$ is the first reduction of $S'$. Let $F' = F \setminus \{(u, v)\}$.
By induction there is a reticulation-trimmed subnetwork $N''$ of $N\langle (x, y) \rangle$ with
respect to $F'$ and there exists a simple CS $S$ such that $N''\langle S \rangle = N'$. We also have
a CS $T$ that is minimum and contains a reticulated reduction if and only if it removes
an edge in $F'$. As before, the reduction $(x, y)$ is required in any reticulation-trimmed
subnetwork of $N$ with respect to $F$, and it follows by the minimality of $T$ that
$(x, y) \cdot T$ yields such a network. Again, $N''$ is a reticulation-trimmed subnetwork
of $N$ with respect to $F$ and there is the simple CS $S$ such that $N''\langle S \rangle = N'$.
2 Proofs from Section 4.1: Lemma 2 (page 9), Lemma 3 (page 9), Lemma 4 (page 10),
Lemma 5 (page 11)
Lemma 2. Let $N$ be a network, $F \subseteq E_R(N)$ be a set, and $N'$ be a network that
is a reticulation-trimmed subnetwork of $N$ with respect to $F$. Then $F$ is disjoint.
Proof. Suppose $F$ is not disjoint; then there are two cases. First, say $F$ contains
edges $(u, v)$ and $(w, v)$. The reduction on the network must first reduce one of
these edges, say $(u, v)$. This reduction removes the vertex $v$, and thus the edge
$(w, v)$ can no longer be in the network. In this way no single CS can reduce
both reticulation edges leading into the same reticulation. Next, assume $F$
is not disjoint because it contains edges $(u, v)$ and $(u, w)$. This is not possible
in a level-1 network, because reticulations $v$, $w$ would be contained in the same
biconnected component.
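As a small illustration of the property in Lemma 2, the following Python sketch tests whether a set of reticulation edges is disjoint. It assumes that "disjoint" means that no two edges share an endpoint, which covers both cases of the proof; the function name `is_disjoint` and the tuple representation of edges are hypothetical.

```python
def is_disjoint(F):
    """Return True if no two edges of F share an endpoint.

    F is an iterable of (tail, head) pairs; this reading of
    "disjoint" is an assumption made for illustration.
    """
    seen = set()
    for tail, head in F:
        if tail in seen or head in seen:
            return False
        seen.update((tail, head))
    return True
```

For example, `is_disjoint([("u", "v"), ("w", "v")])` is false because both edges enter the same reticulation, which is the first case of the proof.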
Lemma 3. Let $N$ be a network and $F \subseteq E_R(N)$ be a set such that there exists
a reticulation-trimmed subnetwork of $N$ with respect to $F$. Then there exists a
topological sort of $F$.
Proof. This claim is evident from the topological partial ordering on $R(N)$, which
exists on any network. Then, note that $F$ is disjoint by Lemma 2.
Lemma 4. Let $N$ be a network and let $F \subseteq E_R(N)$. Then there do not exist
two non-strongly isomorphic reticulation-trimmed subnetworks of $N$ with respect
to $F$.
Proof. Let $N$ be a network and $F \subseteq E_R(N)$ be a set.
We claim there are not two non-strongly isomorphic reticulation-trimmed
subnetworks with respect to $F$. We proceed by induction on $|V(N)| + |E(N)|$.
In the base case, $|V(N)| + |E(N)| \leq 3$ and $N$ is a singleton network on one
edge, with a root and a leaf as endpoints. In this case, $F = \emptyset$ and the
reticulation-trimmed subnetwork of $N$ with respect to $F$ is $N'$ with $N' = N$. There
are no cherries on $N$, so there is not more than one reticulation-trimmed subnetwork
of $N$ with respect to $F$.
Assume the claim holds for all networks with strictly fewer vertices and edges
than $N$. If there is no reticulation-trimmed subnetwork of $N$ with respect to $F$,
then the claim holds. Assume such a network, $N'$, exists. If $F = \emptyset$ then $N$ is the
only reticulation-trimmed subnetwork of $N$ with respect to $F$, as making any
cherry reductions would not be minimum (since $N\langle \emptyset \rangle = N$ is already sufficient
to remove the empty set $F$). The claim holds in this case, so we now assume
$F \neq \emptyset$.
Let edge $e \in F$ be (arbitrarily) one of the lowest edges in the topological sort
on $F$ (which exists by Lemma 3). Because $N'$ exists, there is at least one cherry
below an endpoint of $e$ ($e$ may be in the cherry itself). Let one such cherry be
called $(x, y)$.
Next assume $(x, y)$ is such that $(p(y), p(x)) \neq e$. We know that $(x, y)$ is not
reticulated, as otherwise $e$ would not be one of the lowest reticulation edges in $F$;
thus $(x, y)$ is simple. Because $e$ is in $F$, it will have to be removed, and must have
leaves under its endpoints to do so; thus $(x, y)$ needs to be reduced in any CS $S$
such that $N\langle S \rangle$ is a reticulation-trimmed subnetwork of $N$ with respect to $F$.
Moreover, by Theorem 1, we may assume that any such CS $S$ starts with $(x, y)$, as
otherwise $(x, y)$ can be moved first without affecting the resulting network. It
follows that we may assume that, for every CS $S$ such that $N\langle S \rangle$ is a
reticulation-trimmed subnetwork of $N$ with respect to $F$, applying the first
reduction results in $N\langle (x, y) \rangle$. Then applying $S_{(1:|S|)}$ on $N\langle (x, y) \rangle$ must
yield a reticulation-trimmed subnetwork of $N\langle (x, y) \rangle$ with respect to $F$ (in
particular, $|S_{(1:|S|)}|$ is minimum, as otherwise $|S|$ would not be minimum). By the
induction hypothesis, $N\langle (x, y) \rangle$ does not have two non-strongly isomorphic
reticulation-trimmed subnetworks with respect to $F$. Since we may assume that all
CSs $S$ applicable to $N$ that result in such a trimmed subnetwork go through
$N\langle (x, y) \rangle$, it follows that $N$ also does not have two non-strongly isomorphic
reticulation-trimmed subnetworks with respect to $F$.
Finally, assume that $(x, y)$ is reticulated and $(p(y), p(x)) = e$. As in the
previous case, note that $(x, y)$ must be present in any CS $S$ such that $N\langle S \rangle$ is
a reticulation-trimmed subnetwork of $N$ with respect to $F$, and we may further
assume that any such CS starts with $(x, y)$. Then, any such CS first goes through
$N\langle (x, y) \rangle$, and then by minimality, results in a reticulation-trimmed subnetwork
of $N\langle (x, y) \rangle$ with respect to $F \setminus \{(p(y), p(x))\}$. By induction, there is only one
such network, and thus also only one reticulation-trimmed subnetwork of $N$ with
respect to $F$.
Lemma 5. Algorithm 3 on $(N, F)$ returns the reticulation-trimmed subnetwork
$N'$ of $N$ with respect to $F$ if it exists, and NULL if not, and runs in time
$O(|V(N)|)$.
Proof. First, there is always a topological sort on $F$ (Lemma 3) and $F$ is disjoint
(Lemma 2).
If a network is not returned, then it must be that $(R(u) \cup R(v)) \setminus \{v\} \neq \emptyset$ on
one of the (possibly) partially reduced subnetworks for some edge $(u, v) \in F$. In
this case, $F$ does not admit a reticulation-trimmed subnetwork, since there is no
edge in $F$ that corresponds to the reduction of each reticulation below $u$ and $v$, a
requirement for the reduction of reticulation $v$.
Assume a network is returned; we claim the algorithm is correct in this case.
We show this claim by constructing a CS $S$ such that $N\langle S \rangle = N'$, and
by counting the exact number of reticulation cherries, which we show remove
exactly the desired $F$. Finally, we show that $S$ is minimum.
Say $F$ has the order $\langle (u_1, v_1), (u_2, v_2), \ldots \rangle$. For each $(u_i, v_i)$, in order, we
construct a CS $S_i$ on the partially reduced $N$, then let $S = S_1 \cdot S_2 \cdot \ldots$. Note that
this means we construct each $S_i$ on $N\langle S_1 \cdot S_2 \cdot \ldots \cdot S_{i-1} \rangle$. There are distinct
subnetworks rooted on $u_i$ and $v_i$, so there are CSs $S^u_i$, $S^v_i$ that are complete for
each respective subnetwork. Since we assume a network was returned,
$(R(u_i) \cup R(v_i)) \setminus \{v_i\} = \emptyset$, so $S^u_i$ and $S^v_i$ are simple. Say they reduce the
subnetworks to leaves $l^u_i$ and $l^v_i$ respectively (so that $p(l^u_i) = u_i$ and
$p(l^v_i) = v_i$). Let $S_i = S^u_i \cdot S^v_i \cdot (l^v_i, l^u_i)$. The cherry $(l^v_i, l^u_i)$ reduces the
reticulation edge $(u_i, v_i)$ under these conditions.
In this way, there is a reticulation cherry in $S$ if and only if it reduces an
edge in $F$. Furthermore, $S$ is minimum since we construct $S_i$ on the network
$N\langle S_1 \cdot S_2 \cdot \ldots \cdot S_{i-1} \rangle$ and we have selected only the cherry reductions that are
necessary and sufficient for the reduction of the targeted reticulation edges.
The claimed running time is straightforward. We can obtain a topological
sort on $F$ using a standard topological sort of $N$, obtained in time
$O(|V(N)| + |E(N)|) = O(|V(N)|)$ (since our networks are binary). Then the algorithm
only iterates over $F$ and replaces subnetworks by multi-labeled leaves, which can be
handled in time $O(|V(N)|)$.
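The ordering step mentioned in the running-time argument can be sketched in Python with the standard-library `graphlib` module (Python 3.9 or later). The sketch derives an order on $F$ from a topological order of the whole network and returns the lowest edges first. The function name `order_lowest_first`, the edge-list input, and the choice to rank an edge by the position of its head are assumptions for illustration. The final `sorted` call costs $O(|F| \log |F|)$, so a bucket pass over the vertex order would be needed to match the linear bound claimed above.

```python
from graphlib import TopologicalSorter

def order_lowest_first(edges, F):
    """Order the edges of F so that the lowest reticulation comes first.

    `edges` lists all (parent, child) edges of the network; `F` is a
    subset of the reticulation edges. Both formats are assumptions.
    """
    predecessors = {}
    for parent, child in edges:
        predecessors.setdefault(parent, set())
        predecessors.setdefault(child, set()).add(parent)
    # static_order() yields every vertex after all of its predecessors.
    order = TopologicalSorter(predecessors).static_order()
    position = {vertex: i for i, vertex in enumerate(order)}
    # A later position of the head means the edge is lower in the network.
    return sorted(F, key=lambda e: position[e[1]], reverse=True)

# Usage: reticulation r2 lies below reticulation r1.
network = [
    ("rho", "a"), ("a", "b"), ("a", "c"), ("b", "y"), ("c", "z"),
    ("b", "r1"), ("c", "r1"), ("r1", "d"), ("d", "e"), ("d", "f"),
    ("e", "w"), ("f", "q"), ("e", "r2"), ("f", "r2"), ("r2", "x"),
]
ordered = order_lowest_first(network, [("b", "r1"), ("e", "r2")])
```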
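3 Proof of Theorem 5 (page 14)
Lemma 6 and Corollary 1 are required to justify the arguments in the proof of
Theorem 5.
Lemma 6. Let $N$ be a network and $S$ be any CS on $N$. If $r \in N\langle S \rangle$ for some
$r \in R(N)$, then $v \in N\langle S \rangle$ for every $v \in reach(r, N)$.
Proof. Assume that $r \in N$, $r \in N\langle S \rangle$, and there is some $v \in reach(r, N)$ with
$v \notin N\langle S \rangle$. First, $v$ is not a leaf, as a leaf cannot be above any other vertex.
There must be some cherry in $S$, say $S_i = (x, y)$, such that $v = p(x)$, $v = p(y)$ or
$v = p(x) = p(y)$ in $N\langle S_{(0:i)} \rangle$. Since $r \in N\langle S \rangle$, we have that
$r \in N\langle S_{(0:i)} \rangle$, and since $v \in reach(r, N)$ we have that
$v \in reach(r, N\langle S_{(0:i)} \rangle)$; thus the only orientation we can have is $v = p(y)$,
$r = p(x)$ in the reticulated cherry $(x, y)$ of $N\langle S_{(0:i)} \rangle$. But $x$, $y$, $p(x)$, $p(y)$
are removed in $N\langle S_{(0:i]} \rangle$, contradicting that $r \in N\langle S \rangle$.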
Note that Lemma 6 implies that, after reduction by any simple CS $S$ on a network
$N$, since every $r \in R(N)$ is in $N\langle S \rangle$, we have
$\bigcup_{r \in R(N)} reach(r, N) \subseteq N\langle S \rangle$. Noting this, it follows that:
Corollary 1. For all networks $N_1$, $N_2$ and $N^* =$ MACRS-Simple$(N_1, N_2)$
such that $N^*$ is non-null, there exists an edge-preserving bijective function
$$f : \bigcup_{r \in R(N_1)} reach(r, N_1) \to \bigcup_{r \in R(N_2)} reach(r, N_2).$$
Theorem 5. The entry $M[\rho(N_1), \rho(N_2)]$ correctly contains $|L(N)|$ for
$N =$ MACRS-Simple$(N_1, N_2)$ if one exists, and $-\infty$ otherwise.
Proof. Given input networks $N_1$ and $N_2$, for any $u \in \rho(B_1)$, $v \in \rho(B_2)$, let the
subnetwork of $N_1$ rooted on $u$ be called $N_u$ and let the subnetwork of $N_2$ rooted
on $v$ be called $N_v$. We claim that $M[u, v]$ as we defined it always contains the
number of leaves of a MACRS-Simple between $N_u$ and $N_v$. In particular, this
will show our desired result with $u = \rho(N_1)$, $v = \rho(N_2)$. The proof is by induction
on $|V(N_u)| + |V(N_v)|$.
As a base case, suppose that $|V(N_u)| = 1$ and $|V(N_v)| = 1$, i.e. $u$ and $v$ are
both leaves. If $X(u) \cap X(v) \neq \emptyset$, we put $M[u, v] = 1$, which is correct since $N_u$
and $N_v$ are weakly isomorphic. If $X(u) \cap X(v) = \emptyset$, we put $M[u, v] = -\infty$, which
is correct since there is no MACRS-Simple between networks that do not share a
leaf label.
Let us now consider the inductive step. For the rest of the proof, for any
$u' \in V(N_1)$, $v' \in V(N_2)$, we denote by $N^*_{u',v'}$ a MACRS-Simple between $N_{u'}$
and $N_{v'}$, if one exists (otherwise, $N^*_{u',v'}$ is undefined). As an inductive hypothesis,
we assume that $M[u', v']$ is the number of leaves in $N^*_{u',v'}$, for any $u', v'$ such
that $|V(N_{u'})| + |V(N_{v'})| < |V(N_u)| + |V(N_v)|$.
The proof is split into cases.
Case: $u$ and $v$ are trivial, one is a leaf. If $X(u) \cap X(v) \neq \emptyset$ and
$R(u) \cup R(v) = \emptyset$ then $M[u, v] = 1$. This is correct since a complete reduction can
proceed on both networks without any reticulations and with at least one leaf in
common, a requirement for any network to be isomorphic with a singleton network.
Moreover, there cannot be more than one leaf since $u$ or $v$ is itself a leaf. For the
same reason, when $X(u) \cap X(v) = \emptyset$, $M[u, v] = -\infty$ is correct. When
$R(u) \cup R(v) \neq \emptyset$, $M[u, v] = -\infty$ is correct because a MACRS-Simple of $u$ and $v$
can only be a leaf, but this cannot be achieved since one of the networks has an
unremovable reticulation.
Case: $u$ and $v$ are trivial, $R(u) \cup R(v) = \emptyset$, and either
$X(u_1) \cap X(v_1) \neq \emptyset$ and $X(u_2) \cap X(v_2) = \emptyset$, or
$X(u_1) \cap X(v_1) = \emptyset$ and $X(u_2) \cap X(v_2) \neq \emptyset$, or
$X(u_1) \cap X(v_2) \neq \emptyset$ and $X(u_2) \cap X(v_1) = \emptyset$, or
$X(u_1) \cap X(v_2) = \emptyset$ and $X(u_2) \cap X(v_1) \neq \emptyset$. In
this case, $M[u, v] = 1$ by line 6. Neither $u$ nor $v$ has reticulations below it,
thus any reduction may proceed. In fact, the reduction on $N_u$ and $N_v$ must be
complete to obtain an isomorphic network, since there is no shared leaf below
one child of $u$ and one child of $v$; thus that child must be removed to reach any
MACRS-Simple$(N_u, N_v)$, requiring a cherry on $u$ and $v$. There is, however, a
leaf shared below one child of $u$ and one child of $v$, so a singleton isomorphic
network is possible, and this case is correct.
Case: $u$ and $v$ are trivial, $R(u) \cup R(v) \neq \emptyset$, and $X(u_1) \cap X(v_1) = \emptyset$ or
$X(u_2) \cap X(v_2) = \emptyset$ or $X(u_1) \cap X(v_2) = \emptyset$ or $X(u_2) \cap X(v_1) = \emptyset$. In this
case line 2 resolves to false, so we calculate $M[u, v]$ on line 6. We find that
$M[u, v] = -\infty$ since $R(u) \cup R(v) \neq \emptyset$. A complete reduction is required to reach
the required isomorphic singleton in this condition, but the presence of a
reticulation prevents this.
Case: $u$ is trivial and $v$ is not trivial, or $v$ is trivial and $u$ is not trivial. In
this case we always resolve $M[u, v] = -\infty$ by lines 8 and 9. This is indeed correct:
the presence of a reticulation in $N_v$ and not in $N_u$ (or in $N_u$ and not in $N_v$)
makes an isomorphic network unreachable by simple reductions alone.
Case: $u$ and $v$ are trivial, neither is a leaf, and $X(u_1) \cap X(v_1) \neq \emptyset$ and
$X(u_2) \cap X(v_2) \neq \emptyset$, or $X(u_1) \cap X(v_2) \neq \emptyset$ and $X(u_2) \cap X(v_1) \neq \emptyset$. In
this case we calculate $M[u, v]$ on line 6. Regardless of any reticulations that may be
below $u$ or $v$, we put $M[u, v]$ as the maximum between $M[u_1, v_1] + M[u_2, v_2]$
and $M[u_1, v_2] + M[u_2, v_1]$. We can assume, by the inductive hypothesis, that
$M[u_1, v_1] = |L(N^*_{u_1,v_1})|$ and $M[u_2, v_2] = |L(N^*_{u_2,v_2})|$ (likewise
$M[u_1, v_2] = |L(N^*_{u_1,v_2})|$ and $M[u_2, v_1] = |L(N^*_{u_2,v_1})|$). It is not difficult
to see that if $N^*_{u,v}$, a MACRS-Simple of $N_u$ and $N_v$, exists, then it can be obtained
by joining a MACRS-Simple of $N_{u_1}$, $N_{v_1}$ with a MACRS-Simple of $N_{u_2}$, $N_{v_2}$
under a common parent, or by joining a MACRS-Simple of $N_{u_1}$, $N_{v_2}$ with
a MACRS-Simple of $N_{u_2}$, $N_{v_1}$ under a common parent. In the current case,
$M_1 = M[u_1, v_1] + M[u_2, v_2]$ and $M_2 = M[u_1, v_2] + M[u_2, v_1]$ correspond to
constructing these two possible networks, and since they contain the correct values
by induction, $M[u, v] = \max(M_1, M_2)$ is correct.
Case: $u$ and $v$ are both non-trivial, $M[r_1, r_2] \neq -\infty$ for the reticulations $r_1$, $r_2$
in the biconnected components of $u$ and $v$ respectively, and
$M[h(p^1_i), h(p^2_i)] \neq -\infty$ for all $i$, for $p^1$ on $\pi^1_l$, $\pi^1_r$ and $p^2$ on
$\pi^2_l$, $\pi^2_r$. In this case we resolve
$M[u, v] = M[r_1, r_2] + \sum_{i=1}^{|\pi^1_l|} M[h(p^1_{l,i}), h(p^2_{l,i})] + \sum_{i=1}^{|\pi^1_r|} M[h(p^1_{r,i}), h(p^2_{r,i})]$
or
$M[u, v] = M[r_1, r_2] + \sum_{i=1}^{|\pi^1_l|} M[h(p^1_{l,i}), h(p^2_{r,i})] + \sum_{i=1}^{|\pi^1_r|} M[h(p^1_{r,i}), h(p^2_{l,i})]$.
Each table reference in this summation returns a value that is correct for
that subnetwork, by the induction hypothesis, as every subnetwork $\tilde{N}$ of $N_u$
($\neq N_u$) and $N_v$ ($\neq N_v$) is smaller. The summation itself is also correct. This
is evident by noting that the biconnected components on $u$ and $v$ only contain
vertices along $\pi_l$ and $\pi_r$ [12], so all vertices in the components are accounted for.
Furthermore, all bridges must lead to disjoint networks. Finally, by Corollary 1,
the only possible networks are constructed by joining up child networks that
pair vertices in the order evident by the forbidden paths and their independent
subnetwork children/siblings. Since we consider the maximum among all such
possible networks, the solution is maximal. It is for this same reason that, when
$u$ and $v$ are non-trivial but the conditions are such that $M[u, v] = -\infty$ by line 19,
or by an operand being $-\infty$ in line 14 or line 17, $M[u, v] = -\infty$ is correctly
found.
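The combination step for two non-trivial components (lines 10 to 19 of Algorithm 4) can be sketched in Python as follows. The sketch assumes that the hanging subnetwork roots $h(p_{l,i})$ and $h(p_{r,i})$ have already been collected into lists, in path order, and that table lookups are provided by a callable `m`; the function name `combine_components` and this input format are hypothetical.

```python
NEG_INF = float("-inf")

def combine_components(m, r1, r2, left1, right1, left2, right2):
    """Return max(M1, M2) for two non-trivial components.

    m(a, b) is an assumed lookup into the table M. left1/right1 (and
    left2/right2) list the hanging roots along the two complement
    paths of the first (second) component.
    """
    def aligned_sum(path_a, path_b):
        # Paths can only be matched position by position if equally long.
        if len(path_a) != len(path_b):
            return None
        return sum(m(a, b) for a, b in zip(path_a, path_b))

    def option(match_for_left, match_for_right):
        s_left = aligned_sum(left1, match_for_left)
        s_right = aligned_sum(right1, match_for_right)
        if s_left is None or s_right is None:
            return NEG_INF  # lengths differ: this pairing is skipped
        return m(r1, r2) + s_left + s_right

    # M1 pairs left with left; M2 pairs left with right.
    return max(option(left2, right2), option(right2, left2))

# Usage with a made-up table; missing entries count as -infinity.
table = {("r1", "r2"): 2, ("a1", "a2"): 1, ("b1", "b2"): 3, ("c1", "c2"): 1}
lookup = lambda a, b: table.get((a, b), NEG_INF)
best = combine_components(lookup, "r1", "r2",
                          ["a1"], ["b1", "c1"], ["b2", "c2"], ["a2"])
```

Because adding $-\infty$ to any finite value stays $-\infty$ in floating-point arithmetic, a single unmatched pair along a path makes the whole option $-\infty$, mirroring the last sentence of the proof.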
ResearchGate has not been able to resolve any citations for this publication.
Conference Paper
Full-text available
Combining a set of phylogenetic trees into a single phylogenetic network that explains all of them is a fundamental challenge in evolutionary studies. In this paper, we apply the recently-introduced theoretical framework of cherry picking to design a class of heuristics that are guaranteed to produce a network containing each of the input trees, for practical-size datasets. The main contribution of this paper is the design and training of a machine learning model that captures essential information on the structure of the input trees and guides the algorithms towards better solutions. This is one of the first applications of machine learning to phylogenetic studies, and we show its promise with a proof-of-concept experimental study conducted on both simulated and real data consisting of binary trees with no missing taxa.
Article
Full-text available
Phylogenetic networks are used in biology to represent evolutionary histories. The class of orchard phylogenetic networks was recently introduced for their computational benefits, without any biological justification. Here, we show that orchard networks can be interpreted as trees with additional horizontal arcs. Therefore, they are closely related to tree-based networks, where the difference is that in tree-based networks the additional arcs do not need to be horizontal. Then, we use this new characterization to show that the space of orchard networks on n leaves with k reticulations is connected under the rNNI rearrangement move with diameter $$O(kn+n\log (n))$$ O ( k n + n log ( n ) ) .
Article
Full-text available
The evolutionary relationships among organisms have traditionally been represented using rooted phylogenetic trees. However, due to reticulate processes such as hybridization or lateral gene transfer, evolution cannot always be adequately represented by a phylogenetic tree, and rooted phylogenetic networks that describe such complex processes have been introduced as a generalization of rooted phylogenetic trees. In fact, estimating rooted phylogenetic networks from genomic sequence data and analyzing their structural properties is one of the most important tasks in contemporary phylogenetics. Over the last two decades, several subclasses of rooted phylogenetic networks (characterized by certain structural constraints) have been introduced in the literature, either to model specific biological phenomena or to enable tractable mathematical and computational analyses. In the present manuscript, we provide a thorough review of these network classes, as well as provide a biological interpretation of the structural constraints underlying these networks where possible. In addition, we discuss how imposing structural constraints on the network topology can be used to address the scalability and identifiability challenges faced in the estimation of phylogenetic networks from empirical data.
Article
Full-text available
Recently there has been considerable interest in the problem of finding a phylogenetic network with a minimum number of reticulation vertices which displays a given set of phylogenetic trees, that is, a network with minimum hybrid number. Such networks are useful for representing the evolution of species whose genomes have undergone processes such as lateral gene transfer and recombination that cannot be represented appropriately by a phylogenetic tree. Even so, as was recently pointed out in the literature, insisting that a network displays the set of trees can be an overly restrictive assumption when modeling certain evolutionary phenomena such as incomplete lineage sorting. In this paper, we thus consider the less restrictive notion of rigidly displaying which we introduce and study here. More specifically, we characterize when two trees can be rigidly displayed by a certain type of phylogenetic network called a temporal tree-child network in terms of fork-picking sequences. These are sequences of special subconfigurations of the two trees related to the well-studied cherry-picking sequences. We also show that, in case it exists, the rigid hybrid number for two phylogenetic trees is given by a minimum weight fork-picking sequence for the trees. Finally, we consider the relationship between the rigid hybrid number and three closely related numbers; the weak, beaded, and temporal hybrid numbers. In particular, we show that these numbers can all be different even for a fixed pair of trees, and also present an infinite family of pairs of trees which demonstrates that the difference between the rigid hybrid number and the temporal-hybrid number for two phylogenetic trees on the same set of n leaves can grow at least linearly with n .
Article
Full-text available
Phylogenetic networks are used to represent evolutionary scenarios in biology and linguistics. To find the most probable scenario, it may be necessary to compare candidate networks. In particular, one needs to distinguish different networks and determine whether one network is contained in another. In this paper, we introduce cherry-picking networks, a class of networks that can be reduced by a so-called cherry-picking sequence. We then show how to compare such networks using their sequences. We characterize reconstructible cherry-picking networks, which are the networks that are uniquely determined by the sequences that reduce them, making them distinguishable. Furthermore, we show that a cherry-picking network is contained in another cherry picking network if a sequence for the latter network reduces the former network, provided both networks can be reconstructed from their sequences in a similar way (i.e., they are in the same reconstructible class). Lastly, we show that the converse of the above statement holds for tree-child networks, thereby showing that Network Containment, the problem of checking whether a network is contained in another, can be solved by computing cherry picking sequences in linear time for tree-child networks.
Rooted phylogenetic networks provide an explicit representation of the evolutionary history of a set X of sampled species. In contrast to phylogenetic trees which show only speciation events, networks can also accommodate reticulate processes (for example, hybrid evolution, endosymbiosis, and lateral gene transfer). A major goal in systematic biology is to infer evolutionary relationships, and while phylogenetic trees can be uniquely determined from various simple combinatorial data on X, for networks the reconstruction question is much more subtle. Here we ask when a network can be uniquely reconstructed from its ‘ancestral profile’ (the number of paths from each ancestral vertex to each element in X). We show that reconstruction holds (even within the class of all networks) for a class of networks we call ‘orchard networks’, and we provide a polynomial-time algorithm for reconstructing any orchard network from its ancestral profile. Our approach relies on establishing a structural theorem for orchard networks, which also provides for a fast (polynomial-time) algorithm to test if any given network is of orchard type. Since the class of orchard networks includes tree-sibling tree-consistent networks and tree-child networks, our results generalise reconstruction results from 2008 and 2009. Orchard networks allow for an unbounded number k of reticulation vertices, in contrast to tree-sibling tree-consistent networks and tree-child networks for which k is at most 2|X|−4 and |X|−1, respectively.
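The ancestral profile described in this abstract (the number of directed paths from each ancestral vertex to each element of X) can be illustrated with a small path-counting routine. The following is only a minimal sketch, not the reconstruction algorithm of the cited paper: the function name `ancestral_profile`, the dictionary-based network encoding, and the toy network are all assumptions made for illustration.

```python
from functools import lru_cache


def ancestral_profile(children, leaves):
    """Count directed paths from every non-leaf vertex to every leaf.

    `children` maps each internal vertex of a rooted DAG to a tuple of its
    children (hypothetical encoding); `leaves` lists the elements of X.
    """

    @lru_cache(maxsize=None)
    def paths(v, x):
        # One (trivial) path from a vertex to itself; otherwise sum the
        # path counts over the children of v. Leaves have no children.
        if v == x:
            return 1
        return sum(paths(c, x) for c in children.get(v, ()))

    internal = [v for v in children if children[v]]
    return {v: {x: paths(v, x) for x in leaves} for v in internal}


# Toy network (assumed for illustration): h is a reticulation vertex
# with parents a and b, so the root r reaches leaf y along two paths.
toy_network = {
    "r": ("a", "b"),
    "a": ("x", "h"),
    "b": ("h", "z"),
    "h": ("y",),
}
profile = ancestral_profile(toy_network, ["x", "y", "z"])
```

In this toy network the entry for the root records two paths to y and one path each to x and z, which is the kind of multiplicity a tree cannot produce.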
We consider the problem of determining the topological structure of a phylogenetic network given only information about the path-length distances between taxa. In particular, one of the main results of the paper shows that binary tree-child networks are essentially determined by such information.
Acyclic directed graphs (ADGs) are increasingly being viewed as more appropriate for representing certain evolutionary relationships, particularly in biology, than rooted trees. In this paper, we develop a framework for the analysis of these graphs which we call hybrid phylogenies. We are particularly interested in the problem whereby one is given a set of phylogenetic trees and wishes to determine a hybrid phylogeny that embeds each of these trees and which requires the smallest number of hybridisation events. We show that this quantity can be greatly reduced if additional species are involved, and investigate other combinatorial aspects of this and related questions.
Phylogenetic networks are used to represent evolutionary relationships between species in biology. Such networks are often categorized into classes by their topological features, which stem from both biological and computational motivations. We study two network classes in this paper: tree-based networks and orchard networks. Tree-based networks are those that can be obtained by inserting edges between the edges of an underlying tree. Orchard networks are a recently introduced generalization of the class of tree-child networks. Structural characterizations have already been discovered for tree-based networks; this is not the case for orchard networks. In this paper, we introduce cherry covers—a unifying characterization of both network classes—in which we decompose the edges of the networks into so-called cherry shapes and reticulated cherry shapes. We show that cherry covers can be used to characterize the class of tree-based networks as well as the class of orchard networks. Moreover, we also generalize these results to non-binary networks.
Recently, we have shown that calculating the minimum-temporal-hybridization number for a set [Formula: see text] of rooted binary phylogenetic trees is NP-hard and have characterized this minimum number when [Formula: see text] consists of exactly two trees. In this paper, we give the first characterization of the problem for [Formula: see text] being arbitrarily large. The characterization is in terms of cherries and the existence of a particular type of sequence. Furthermore, in an online appendix to the paper, we show that this new characterization can be used to show that computing the minimum-temporal-hybridization number for two trees is fixed-parameter tractable.