
A Note on Encodings of Phylogenetic Networks of Bounded Level


Abstract

Driven by the need for better models that shed light on the question of how life's diversity has evolved, phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i.e. uniquely describe) the network that induces it? In this note, we present a complete answer to this question for the special case of a level-1 (phylogenetic) network by characterizing those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. Given that this type of network forms the first layer of the rich hierarchy of level-k networks, k a non-negative integer, it is natural to wonder whether our arguments could be extended to members of that hierarchy for higher values of k. By giving examples, we show that this is not the case.
arXiv:0906.4324v1 [math.CO] 23 Jun 2009
PHILIPPE GAMBETTE, KATHARINA T. HUBER
Keywords: Phylogeny, phylogenetic networks, triplets, clusters, supernetwork, level-1 network, perfect phylogeny problem.
1. Introduction
An improved understanding of the complex processes that drive evolution has lent support to the idea that reticulate evolutionary events such as lateral gene transfer or hybridization are more common than originally thought. This renders a phylogenetic tree (essentially a rooted leaf-labelled graph-theoretical tree) too simplistic a model to fully capture these processes. Reflecting this, phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Influenced by the diversity of questions posed by evolutionary biologists that can be addressed with phylogenetic networks, various alternative definitions of these types of networks have been developed over the years [HB06]. These include split networks [BM04,BFSR95,HHML04] as well as ancestral recombination graphs [SH05], TOM networks [Wil06], level-k networks¹ (with k a non-negative integer that in some sense captures how complex the network structure is), networks for studying the evolution of polyploid organisms [MH06], and tree-child and tree-sibling networks [CLRV08], to name just a few.
Date: June 23, 2009.
This work was supported by the French ANR projects ANR-06-BLAN-0148-01 (GRAAL) and ANR-08-EMER-011-01 (PhylARIANE), and was initiated during the MIEP workshop in 2008.
Apart from split networks, which aim to give an implicit model of evolution and are not the focus of this note, all other phylogenetic networks mentioned above aim to provide an explicit model of evolution. Although slightly different in detail, they are all based on the concept of a leaf-labelled rooted connected directed acyclic graph (see the next section for a definition). For the convenience of the reader, we depict an example of a phylogenetic network in the form of a level-1 network in Fig. 1(a). Concerning level-1 networks, it should be noted that they are closely related to galled trees [WZZ01,GEL03] and that, in addition to constituting the first layer of the rich hierarchy of level-k networks, they also form a subclass of the large class of tree-sibling networks [AVP08].
Figure 1. (a) A level-1 phylogenetic network N. (b) and (c) The phylogenetic trees that form the tree system T(N).
Due to the rich combinatorial structure of phylogenetic networks, different combinatorial objects have been used to reconstruct them from biological data. For a set X of taxa (e.g. species or organisms), these include cluster systems on X, that is, collections of subsets of X [BD89,HR08], triplet systems on X, that is, collections of phylogenetic trees with just three leaves, which are generally called (rooted) triplets [JS04,TH09], and tree systems, that is, collections of phylogenetic trees which all have leaf set X [Sem07]. The underlying rationale is that any phylogenetic network N induces a cluster system C(N), a triplet system R(N), and a tree system T(N). Again we defer the precise definitions to later sections of this note and remark that, for the level-1 network N with leaf set X = {a, b, c, d, e} depicted in Fig. 1(a), the cluster system C(N) consists of X, the five singleton sets of X, and the subsets {a, b}, {c, d}, {b, c, d}, and Y := {a, b, c, d}, and the tree system T(N) consists of the two phylogenetic trees depicted in Fig. 1(b) and (c). Denote a phylogenetic tree t on x, y, z such that the root of t is not the parent vertex of x and y by z|xy (or equivalently by xy|z). Then the triplet system R(N) consists of all triplets of the form e|xy where x, y ∈ Y are distinct, x|cd with x ∈ {a, b}, and x|ab and a|bx with x ∈ {c, d}.
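To make C(N) concrete, the following Python sketch computes the cluster C(v) of every vertex v and collects them into C(N). It assumes a network is stored as a dictionary mapping each vertex to its list of children (leaves map to an empty list). The network `FIG1_N` is our own reconstruction of a level-1 network whose clusters agree with those listed above for Fig. 1(a); its internal vertex names (`rho`, `Y`, `p`, `q`, `w`, `r`) are hypothetical and the drawing in Fig. 1(a) may differ.

```python
# Hypothetical reconstruction of a level-1 network matching the clusters
# listed for Fig. 1(a); internal vertex names are our own.
FIG1_N = {
    "rho": ["e", "Y"],
    "Y": ["p", "q"],
    "p": ["a", "r"],
    "q": ["w", "r"],
    "w": ["c", "d"],
    "r": ["b"],  # reticulation vertex with parents p and q
    "a": [], "b": [], "c": [], "d": [], "e": [],
}


def cluster_system(children):
    """Return C(N): the set of clusters C(v) over all vertices v of N."""
    memo = {}

    def below(v):
        # C(v) is the set of leaves below v; memoized because paths can reconverge.
        if v not in memo:
            kids = children[v]
            memo[v] = frozenset([v]) if not kids else frozenset().union(*(below(c) for c in kids))
        return memo[v]

    return {below(v) for v in children}
```

Calling `cluster_system(FIG1_N)` yields the ten clusters X, Y, {a, b}, {c, d}, {b, c, d} and the five singletons.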
Although undoubtedly highly relevant for phylogenetic network reconstruction, the following fundamental question has remained largely unanswered so far: When do the systems C(N), R(N), or T(N) induced by a phylogenetic network N encode N, that is, when is there no other phylogenetic network N′ for which the corresponding systems for N and N′ coincide? The main exception is the case in which N is in fact a phylogenetic tree; there, this question is closely related to the well-studied perfect phylogeny problem (see e.g. [GH07] for a recent overview).
¹ Note that these networks were originally introduced in [JS04], but the definition commonly used now is slightly different, the main difference being that every vertex of the network with indegree 2 must have outdegree 1 (see e.g. [vIKK+08] and the references therein).
Complementing the insights for the case when N is a phylogenetic tree alluded to above, answers were recently provided for R(N) in case N is a very special type of level-2 network [vIKM09], and for T(N) in the special case that N is a regular network [Wil09]. While these are undoubtedly important first results, there are many types of phylogenetic networks that are encoded by the tree system they induce but are not regular, or that are encoded by the triplet system they induce but do not belong to that special class of level-2 networks. An example for both cases is the level-1 network depicted in Fig. 1(a). Although one might be tempted to speculate that all level-1 networks enjoy this property, this is not the case, since the level-1 networks depicted in Fig. 1(a) and Fig. 2(b) induce the same tree system and the same triplet system. The main result of this paper shows that these observations are not a coincidence. More precisely, in Theorem 1 we establish that a level-1 network N is encoded by the triplet system R(N) (or equivalently by the tree system T(N), or equivalently by the cluster system $S(N) = S(\mathcal{T}(N)) := \bigcup_{T \in \mathcal{T}(N)} C(T)$, which arises in the context of the softwired interpretation of N [HR08] and contains C(N)) if and only if, when ignoring directions, N does not contain a cycle of length 4. Consequently, the number of non-isomorphic (see below) phylogenetic networks N which all induce the same tree system (or equivalently the same triplet system, or the same cluster system S(N)) grows exponentially in the number of cycles of N of length 4. It is of course highly tempting to speculate that a similar characterization might hold for higher values of k. However, as our examples show, establishing such a result will require an alternative approach, since our arguments cannot be extended to level-2 networks and thus to level-k networks with k ≥ 2.
This note is organized as follows. In the next section, we present the definition of a level-1 network plus surrounding terminology. In Section 3, we present the definitions of the cluster system C(N) and the tree system T(N) induced by a phylogenetic network N. This also completes the definition of the cluster system S(N) given in the introduction. Subsequent to this, we show that for any level-1 network N, the cluster systems S(N) and C(N) are weak hierarchies (Proposition 1), which are well known in cluster analysis. In addition, we show that this property is not enjoyed by level-2 networks and thus by level-k networks with k ≥ 2. In Section 4, we first present the definition of the triplet system R(N) induced by a phylogenetic network N. Subsequent to this, we turn our attention to the special case of encodings of simple level-1 networks. In Section 5, we present our main result (Theorem 1).
To ease the presentation of our results, in all figures the (unique) root of a network is the top vertex and all arcs are directed downwards, away from the root. Furthermore, for any directed graph G, we denote the vertex set of G by V(G) and the set of arcs of G by A(G).
2. Basic terminology and results concerning level-1 networks
In this section we present the definitions of a phylogenetic network and of a level-k network, k ≥ 0. In addition, we also provide the basic terminology surrounding these structures.
Suppose X is a finite set. A phylogenetic network N on X is a rooted directed acyclic graph (DAG) that satisfies the following additional properties. (i) The set L(N) of leaves of N, that is, vertices with indegree 1 and outdegree 0, is X. (ii) Exactly one vertex of N, called the root and denoted by ρ_N, has indegree 0 and outdegree 2. (iii) All vertices of N that are not contained in L(N) ∪ {ρ_N} are either split vertices, that is, have indegree 1 and outdegree 2, or reticulation vertices, that is, have indegree 2 and outdegree 1. The set of reticulation vertices of N is denoted by R(N). A phylogenetic network N for which R(N) is empty is called a (rooted) phylogenetic tree (on X). Two phylogenetic networks N and N′ which both have leaf set X are said to be isomorphic if there exists a bijection from V(N) to V(N′) which is the identity on X and induces a graph isomorphism between N and N′.
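Conditions (i)-(iii) can be checked mechanically. The sketch below is our own illustration (the function names are hypothetical) and uses a dictionary mapping each vertex to its list of children. Besides the degree conditions, it verifies that the graph is acyclic and that every vertex is reachable from the root, using Kahn's topological-sort algorithm started at the root.

```python
from collections import Counter


def is_phylogenetic_network(children, X):
    """Check properties (i)-(iii) of a phylogenetic network on X, plus acyclicity."""
    indeg = Counter(c for kids in children.values() for c in kids)
    if any(len(set(kids)) != len(kids) for kids in children.values()):
        return False  # parallel arcs are not allowed
    roots = [v for v in children if indeg[v] == 0]
    if len(roots) != 1 or len(children[roots[0]]) != 2:
        return False  # (ii): exactly one root, with outdegree 2
    root = roots[0]
    for v, kids in children.items():
        if v == root:
            continue
        if not kids:
            if indeg[v] != 1:
                return False  # (i): leaves have indegree 1
        elif (indeg[v], len(kids)) not in {(1, 2), (2, 1)}:
            return False  # (iii): split or reticulation vertex
    if {v for v, kids in children.items() if not kids} != set(X):
        return False  # (i): the leaf set is exactly X
    # Kahn's algorithm from the root: all vertices get processed iff N is a rooted DAG.
    remaining = Counter(indeg)
    queue, processed = [root], 0
    while queue:
        v = queue.pop()
        processed += 1
        for c in children[v]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    return processed == len(children)


def reticulations(children):
    """Return the set R(N) of reticulation vertices (indegree 2)."""
    indeg = Counter(c for kids in children.values() for c in kids)
    return {v for v in children if indeg[v] == 2}


# A small network on {a, b} with one reticulation vertex r.
EXAMPLE = {"rho": ["x", "r"], "x": ["a", "r"], "r": ["b"], "a": [], "b": []}
```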
To present the definition of a level-k network, we first need to introduce some terminology concerning rooted DAGs. Suppose G is a rooted connected DAG with at least 2 vertices. Then we denote the graph obtained from G by ignoring the directions on G by U(G). If H is a graph with at least 2 vertices, then we call H biconnected if H does not contain a vertex whose removal disconnects it. A biconnected component of H is a maximal subgraph of H that is biconnected. If G is a phylogenetic network and B is a rooted sub-DAG such that U(B) is a biconnected component of U(G), then we call B a blob.
Following [vIKK+08], we call a phylogenetic network N a level-k network for some non-negative integer k if each blob of N contains at most k reticulation vertices. Note that some authors define a level-1 network N to be a phylogenetic network without the above outdegree requirement on the reticulation vertices of N (see e.g. [JS04]). Also, either on its own or in addition to this, the requirement that each blob contains at most k reticulation vertices is sometimes replaced by the requirement that the cycles in U(N) are node disjoint (see e.g. [JS04,JS06]). Although these definitions are the same in spirit, the difference is that a cycle is generally understood to have at least three vertices, which implies that the network depicted in Fig. 2(a) would not be a level-1 network, whereas the definition presented in [vIKK+08] would render that network a level-1 network. Having said that, the network N′ depicted in Fig. 2(b) is a less parsimonious representation of the same biological information (expressed in terms of the systems T(N), R(N), C(N), and S(N)) as the level-1 network N in Fig. 2(a), in the sense that the edges in grey are redundant for displaying that information. To avoid these types of level-1 networks, which cannot be encoded by any of the 4 systems of interest in this note, we follow [vIKM09] and require that every blob in a level-1 network contains at least 4 vertices.
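Blobs and the level of a network can be computed from the biconnected components of U(N). The following Python sketch is our own illustration, not the procedure of [vIKK+08]: it finds the components with the standard Tarjan depth-first search, counts for each blob the reticulation vertices whose two incoming arcs lie in that blob, and reports blobs containing a reticulation vertex but fewer than 4 vertices. We read the size requirement above as applying to blobs that contain a reticulation vertex, since a single cut arc also forms a two-vertex blob. `FIG1_N` is our hypothetical reconstruction of a network matching the clusters listed for Fig. 1(a), and `TRIANGLE` is a made-up network whose only cycle has 3 vertices.

```python
from collections import defaultdict

FIG1_N = {"rho": ["e", "Y"], "Y": ["p", "q"], "p": ["a", "r"], "q": ["w", "r"],
          "w": ["c", "d"], "r": ["b"], "a": [], "b": [], "c": [], "d": [], "e": []}
TRIANGLE = {"rho": ["x", "r"], "x": ["a", "r"], "r": ["b"], "a": [], "b": []}


def blobs(children):
    """Vertex sets of the biconnected components of U(N) (Tarjan's algorithm)."""
    adj = defaultdict(set)
    for u, kids in children.items():
        for v in kids:
            adj[u].add(v)
            adj[v].add(u)
    disc, low, edge_stack, comps = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in disc:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: pop one component
                    comp = set()
                    while True:
                        x, y = edge_stack.pop()
                        comp.update((x, y))
                        if (x, y) == (u, v):
                            break
                    comps.append(comp)
            elif disc[v] < disc[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for s in children:
        if s not in disc:
            dfs(s, None)
    return comps


def level_report(children):
    """Return (level, blobs with a reticulation vertex but fewer than 4 vertices)."""
    parents = defaultdict(set)
    for u, kids in children.items():
        for v in kids:
            parents[v].add(u)
    level, small = 0, []
    for b in blobs(children):
        # reticulations of blob b: vertices with two parents, both inside b
        k = sum(1 for v in b if len(parents[v]) == 2 and parents[v] <= b)
        level = max(level, k)
        if k > 0 and len(b) < 4:
            small.append(b)
    return level, small
```

For `FIG1_N` the report is level 1 with no small blobs, whereas `TRIANGLE` is flagged because its cycle has only 3 vertices.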
For k = 1, 2, it was shown in [vIKK+08] (see also [JS06] for the case k = 1) that level-k networks can be built up by chaining together structurally very simple level-k networks called simple level-k networks. Defined for general non-negative integers k, these atomic building blocks are precisely those level-k networks that can be obtained from a level-k generator by applying a certain "leaf hanging" operation [vIKK+08] to its "sides". Such a generator is a biconnected directed acyclic multi-graph which has a single root and precisely k pseudo-reticulation vertices (i.e. vertices with indegree 2 and outdegree at most 1), and all of whose other vertices are split vertices, where the root and a split vertex are defined as in the case of a phylogenetic network. For the convenience of the reader, we present in Fig. 3 the unique level-1 generator and all 4 level-2 generators, which originally appeared in slightly different form in [vIKK+08]. Regarding larger values of k, it was recently shown in [Kel08] that there exist 65 level-3 generators. In addition, it was shown in [GBP09] that there are 1993 level-4 generators and that the number of level-k generators grows exponentially in k. A side of a generator G is an arc of G or one of its pseudo-reticulation vertices.
Figure 2. The level-1 network N depicted in (a) induces and thus represents the same triplet system R(N), cluster systems C(N) and S(N), and tree system T(N) as the level-1 network N′ presented in (b). However, N′ is a less parsimonious representation of these 4 systems.
Figure 3. The unique level-1 generator $G_1$, and the four level-2 generators $G_2^a$, $G_2^b$, $G_2^c$ and $G_2^d$.
From now on and unless stated otherwise, all phylogenetic networks have leaf
set X.
3. The Systems C(N), T(N), and S(N)
In this section, we introduce for a phylogenetic network N the associated systems C(N), T(N), and S(N) already mentioned in the introduction. In addition, we prove that in case N is a level-1 network the associated systems C(N) and S(N) are weak hierarchies. We conclude by presenting an example showing that level-k networks, k ≥ 2, do not enjoy this property in general. We start with some definitions.
Suppose N is a phylogenetic network. Then we say that a vertex a ∈ V(N) is below a vertex b ∈ V(N), denoted by a ≤_N b, if there exists a path P_ba (possibly of length 0) from b to a. In this case, we also say that b is above a. Every vertex v ∈ V(N) therefore induces a non-empty subset C(v) = C_N(v) of X which comprises all leaves of N below v (see e.g. [SS03]). We collect the subsets C(v) induced by the vertices v of N in this way in the set C(N), i.e. we put $C(N) = \bigcup_{v \in V(N)} \{C(v)\}$. For convenience, we refer to any collection $\mathcal{C}$ of non-empty subsets of X as a cluster system (on X) and to the elements of $\mathcal{C}$ as clusters of X. It should be noted that in case N is a binary phylogenetic tree, the cluster system C(N) is a hierarchy (on X), that is, for any two clusters C_1, C_2 ∈ C(N) we have $C_1 \cap C_2 \in \{\emptyset, C_1, C_2\}$. Hierarchies are sometimes also called laminar families, and it is well known that the set of clusters C(T) induced by a binary phylogenetic tree T uniquely determines that tree (see e.g. [SS03]).
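The hierarchy condition, and the way a hierarchy determines its tree, can be illustrated with a short Python sketch of our own (function names hypothetical). Clusters are frozensets. The tree is recovered by attaching each cluster to the smallest cluster strictly containing it; in a hierarchy the clusters containing a given cluster form a chain, so this parent is well defined.

```python
from itertools import combinations


def is_hierarchy(clusters):
    """True iff any two clusters are disjoint or nested."""
    return all((c1 & c2) in (frozenset(), c1, c2) for c1, c2 in combinations(clusters, 2))


def tree_from_hierarchy(clusters):
    """Map each non-maximal cluster to its parent: the smallest strictly larger cluster."""
    parent = {}
    for c in clusters:
        supersets = [d for d in clusters if c < d]
        if supersets:
            parent[c] = min(supersets, key=len)
    return parent


# Clusters of the binary phylogenetic tree ((a,b),c).
TREE = [frozenset(s) for s in ["abc", "ab", "a", "b", "c"]]
```

For `TREE`, the parent of {a} is {a, b} and the parent of {c} is the root cluster {a, b, c}, which reproduces the tree ((a,b),c).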
In the context of phylogenetic network construction, the concept of a weak hierarchy (on X) was introduced in [BD89]. These objects are defined as follows. Suppose $\mathcal{C}$ is a cluster system on X. Then $\mathcal{C}$ is called a weak hierarchy (on X) if
$$C_1 \cap C_2 \cap C_3 \in \{C_1 \cap C_2,\ C_2 \cap C_3,\ C_1 \cap C_3\} \qquad (1)$$
holds for any three elements $C_1, C_2, C_3 \in \mathcal{C}$. Note that the above property is sometimes also called the weak Helly property [SS03]. Also note that any hierarchy is in particular a weak hierarchy and that any subset of a weak hierarchy is again a weak hierarchy. Finally, note that weak hierarchies are well-known objects in classical hypergraph and abstract convexity theories [BD89] (see also the references therein and [BBO04]), and that they were originally introduced into cluster analysis as medinclus in [Bat88,Bat89].
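Condition (1) can be tested by brute force over all triples of clusters, which takes time cubic in the number of clusters. A minimal Python sketch of our own (function name hypothetical) follows. The first test system is the cluster system C(N) listed in the introduction for Fig. 1(a); the second consists of the three clusters {a, b, c}, {a, b, d}, {b, c, d} from the level-2 example of Fig. 4(a) discussed at the end of this section.

```python
from itertools import combinations


def is_weak_hierarchy(clusters):
    """Check the weak Helly property (1) for every triple of distinct clusters."""
    for c1, c2, c3 in combinations(list(clusters), 3):
        if (c1 & c2 & c3) not in (c1 & c2, c2 & c3, c1 & c3):
            return False
    return True


# C(N) for the level-1 network of Fig. 1(a), as listed in the introduction.
FIG1_CLUSTERS = [frozenset(s) for s in ["abcde", "abcd", "ab", "cd", "bcd", "a", "b", "c", "d", "e"]]
# Three clusters from the level-2 example of Fig. 4(a).
FIG4_TRIPLE = [frozenset("abc"), frozenset("abd"), frozenset("bcd")]
```

Triples containing a repeated cluster satisfy (1) trivially, so iterating over combinations of distinct clusters suffices.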
We will establish the main result of this section (Proposition 1) by showing that the cluster system S(N) associated to a level-1 network N is a weak hierarchy. To do this, we first need to complete the definition of S(N), which relies on the definition of the system T(N). We will do this next.
Suppose N is a phylogenetic network. Then we say that a phylogenetic tree T is displayed by N if the leaf set of T is X and T is a phylogenetic tree obtained from N via the following process: for each reticulation vertex of N, delete one incoming arc and suppress any resulting degree 2 vertices; in case the root ρ_N of N is rendered a vertex with outdegree 1 this way, identify ρ_N with its unique child. The set T(N) is the collection of all phylogenetic trees that are displayed by N. To every vertex v ∈ V(N), a cluster system S_N(v) can be associated by putting
$$S_N(v) = \{C_T(v) : T \in \mathcal{T}(N)\}.$$
Clearly, C_N(v) ∈ S_N(v) and $S(N) = \bigcup_{v \in V(N)} S_N(v)$.
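For small networks, T(N) and S(N) can be enumerated directly from this definition by trying every choice of retained incoming arc at each reticulation vertex, that is, 2^|R(N)| choices. The Python sketch below is our own illustration (names hypothetical). It represents each displayed tree by its cluster system, which suffices because a binary phylogenetic tree is determined by its clusters. Suppressing degree 2 vertices does not change the set of clusters, and vertices left without children have an empty cluster and are skipped. `FIG1_N` is our hypothetical reconstruction of a level-1 network matching the clusters listed for Fig. 1(a) in the introduction.

```python
from collections import defaultdict
from itertools import product

FIG1_N = {"rho": ["e", "Y"], "Y": ["p", "q"], "p": ["a", "r"], "q": ["w", "r"],
          "w": ["c", "d"], "r": ["b"], "a": [], "b": [], "c": [], "d": [], "e": []}


def displayed_cluster_systems(children):
    """Return T(N), each displayed tree given by its (frozen) cluster system."""
    parents = defaultdict(list)
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    retics = [v for v in children if len(parents[v]) == 2]
    root = next(v for v in children if not parents[v])
    systems = set()
    for choice in product((0, 1), repeat=len(retics)):
        # Keep the incoming arc from parents[r][k] and drop the other one.
        dropped = {(parents[r][1 - k], r) for r, k in zip(retics, choice)}
        kept = {u: [v for v in kids if (u, v) not in dropped] for u, kids in children.items()}
        memo = {}

        def below(v):
            if v not in memo:
                memo[v] = (frozenset([v]) if not children[v]
                           else frozenset().union(*(below(c) for c in kept[v])))
            return memo[v]

        reach, todo = set(), [root]
        while todo:  # vertices still reachable from the root
            v = todo.pop()
            if v not in reach:
                reach.add(v)
                todo.extend(kept[v])
        systems.add(frozenset(below(v) for v in reach if below(v)))
    return frozenset(systems)


def softwired_clusters(children):
    """S(N): the union of the cluster systems of all displayed trees."""
    return frozenset().union(*displayed_cluster_systems(children))
```

For `FIG1_N` this yields two displayed trees, and S(N) coincides with the cluster system C(N) listed in the introduction.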
To link clusters of X with level-1 networks on X, we say that a cluster C on X is tree-consistent with a level-1 network N if C ∈ S(N). More generally, we say that a cluster system $\mathcal{C}$ is tree-consistent with a level-1 network N if $\mathcal{C} \subseteq S(N)$ holds. Thus, for any level-1 network N the cluster system S(N) equals the set of all clusters of X that are tree-consistent with N. Finally, we say that a cluster system $\mathcal{C}$ is level-1-consistent if there exists a level-1 network N such that $\mathcal{C}$ is tree-consistent with N.
We next establish Proposition 1. Its proof relies on a characterization of a weak hierarchy $\mathcal{H}$ on X from [BD89, Lemma 1] in terms of a property of a certain $\mathcal{H}$-closure that can be canonically associated to $\mathcal{H}$. More precisely, suppose $\emptyset \neq Y \subseteq X$ and $\mathcal{H}$ is a cluster system on X. Then the $\mathcal{H}$-closure $\langle Y \rangle_{\mathcal{H}}$ of Y is the intersection
$$\langle Y \rangle_{\mathcal{H}} = \bigcap_{Y \subseteq C,\ C \in \mathcal{H}} C.$$
Now a cluster system $\mathcal{H}$ on X is a weak hierarchy if and only if for every non-empty subset A ⊆ X there exist elements a, a′ ∈ A such that $\langle A \rangle_{\mathcal{H}} = \langle \{a, a'\} \rangle_{\mathcal{H}}$. Note that this implies in particular that the number of elements in a weak hierarchy is at most $\binom{|X|+1}{2}$ [BD89]. With regard to this bound, it should be noted that it was recently shown in [KNTX08] that the size of a cluster system which is tree-consistent with a level-1 network N is linear in |X|. In view of Proposition 1, this bound improves on the previous bound for this special kind of weak hierarchy.
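The $\mathcal{H}$-closure and the pair characterization of [BD89, Lemma 1] can be checked by brute force on small examples. The Python sketch below is ours and purely illustrative (function names hypothetical). When no cluster of $\mathcal{H}$ contains Y, the intersection above runs over an empty family; the sketch then returns X, a convention we adopt here rather than one stated in the text. The check visits all 2^|X| − 1 non-empty subsets of X, so it is only practical for small X.

```python
from itertools import combinations, combinations_with_replacement


def closure(Y, H, X):
    """<Y>_H: intersection of all clusters in H containing Y (X if there is none)."""
    result = frozenset(X)
    for C in H:
        if Y <= C:
            result &= C
    return result


def has_pair_closures(H, X):
    """True iff every non-empty A of X has a, a' in A with <A>_H = <{a, a'}>_H."""
    elems = sorted(X)
    for r in range(1, len(elems) + 1):
        for A in combinations(elems, r):
            target = closure(frozenset(A), H, X)
            # a and a' may coincide, which matters when |A| = 1
            if not any(closure(frozenset(pair), H, X) == target
                       for pair in combinations_with_replacement(A, 2)):
                return False
    return True


X = frozenset("abcde")
FIG1_CLUSTERS = [frozenset(s) for s in ["abcde", "abcd", "ab", "cd", "bcd", "a", "b", "c", "d", "e"]]
```

On the weak hierarchy C(N) of Fig. 1(a) every subset has such a pair, while on the three clusters {a, b, c}, {a, b, d}, {b, c, d} of Fig. 4(a) the subset {a, c, d} has none.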
Proposition 1. A level-1-consistent cluster system is a weak hierarchy. In particular, the systems S(N) and C(N) associated to a level-1 network N are weak hierarchies.
Proof: Since every subset of a weak hierarchy is again a weak hierarchy, it suffices to show that for every level-1 network N the associated cluster system S(N) is a weak hierarchy. To see this, suppose N is a level-1 network on X = {x_1, ..., x_n}, n ≥ 1. Consider a graphical representation of N and, starting from the leftmost leaf of N in that representation, let x_1, ..., x_n denote the induced ordering of the leaves of N (note that this might involve relabelling some of the elements in X). Suppose ∅ ≠ A ⊆ X. Let i, j ∈ {1, ..., n} be such that x_j ∈ A and every leaf in X succeeding x_j in that ordering is not contained in A. Similarly, let x_i ∈ A be such that every leaf in X preceding x_i in that ordering is not contained in A. We claim that $\langle A \rangle_{S(N)} = \langle \{x_i, x_j\} \rangle_{S(N)}$. To see this, note that since N is a level-1 network, there exists a subtree T of N such that the leaf set of T is A. Note that T might contain vertices whose indegree and outdegree are both one. By deleting, for each reticulation vertex below the root of T, one of its incoming arcs and suppressing the resulting degree 2 vertex, T can be canonically extended to a subtree T′ of some tree T′′ ∈ T(N) such that {x_i, x_{i+1}, ..., x_j} ⊆ L(T′) and L(T′) is minimal with regard to set inclusion. Note that L(T′) ∈ S(N). But then, by construction, $\langle A \rangle_{S(N)} = L(T') = \langle \{x_i, x_j\} \rangle_{S(N)}$, which proves the claim.
We remark in passing that to any cluster system $\mathcal{C}$ on X a similarity measure $D_{\mathcal{C}} : X \times X \to \mathbb{R}$ can be associated by putting $D_{\mathcal{C}}(a, b) = |\{C \in \mathcal{C} : a, b \in C\}|$, a, b ∈ X. Proposition 1 combined with the main result from [BD89] implies that any tree-consistent cluster system $\mathcal{C}$ can be uniquely reconstructed from its associated similarity measure $D_{\mathcal{C}}$. Using the well-known Farris transform (see e.g. [SS03], and [DHM07] for a recent overview), a similarity measure can be canonically transformed into a distance measure on X, that is, a map from X × X into the non-negative reals that is symmetric, satisfies the triangle inequality, and vanishes on the main diagonal. The latter measures were recently investigated in [CJLY05] from an algorithmic point of view in the context of representing them in terms of an ultrametric level-1 network. These are generalizations of ultrametric phylogenetic trees in the sense that every path from the root of the network to any leaf has the same length.
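As a small illustration of $D_{\mathcal{C}}$, the Python sketch below (ours; function names hypothetical) counts the clusters containing both a and b. It also computes $D_{\mathcal{C}}(a,a) + D_{\mathcal{C}}(b,b) - 2D_{\mathcal{C}}(a,b)$, which equals the number of clusters containing exactly one of a and b and is therefore a pseudometric. We include it only as one simple similarity-to-distance conversion and do not claim that it coincides with the particular form of the Farris transform referred to above.

```python
def similarity(clusters, a, b):
    """D_C(a, b): the number of clusters containing both a and b."""
    return sum(1 for C in clusters if a in C and b in C)


def separation(clusters, a, b):
    """D_C(a,a) + D_C(b,b) - 2 D_C(a,b): clusters containing exactly one of a, b."""
    return similarity(clusters, a, a) + similarity(clusters, b, b) - 2 * similarity(clusters, a, b)


# C(N) for the level-1 network of Fig. 1(a), as listed in the introduction.
FIG1_CLUSTERS = [frozenset(s) for s in ["abcde", "abcd", "ab", "cd", "bcd", "a", "b", "c", "d", "e"]]
```

For example, a and b lie together in {a, b}, Y and X, so their similarity is 3, and they are separated by {a}, {b} and {b, c, d}.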
We conclude this section by remarking that, as the example of the level-2 network N presented in Fig. 4(a) shows, the result analogous to Proposition 1 does not hold for level-2 networks: we have {{a, b, c}, {a, b, d}, {b, c, d}} ⊆ S(N), but {a, b, c} ∩ {a, b, d} ∩ {b, c, d} = {b}, which differs from each of the pairwise intersections {a, b}, {b, c}, and {b, d}, so property (1) is violated. Furthermore, as the example of the level-2 network N′ depicted in Fig. 4(b) combined with the cluster system C(T) induced by the phylogenetic tree T depicted in Fig. 4(c) shows, a cluster system C(T) induced by a phylogenetic tree T can be contained in the cluster system S(N′) of a level-2 network N′ without N′ displaying T.
Figure 4. (a) A level-2 network N for which S(N) is not a weak hierarchy. (b) A phylogenetic network N′ that does not display the phylogenetic tree T depicted in (c), although C(T) is tree-consistent with N′.
4. Simple Level-1 Networks
In this section we turn our attention to simple level-1 networks. In particular, we establish a fundamental property of these networks with regard to encodings of level-1 networks. To do this, we require some more definitions. We start with the definition of the triplet system R(N) induced by a phylogenetic network N.
Suppose N is a phylogenetic network. If Y ⊆ X is a subset of X of size 3, then N induces a triplet t on X by taking t to be a minimal subtree of N with leaf set Y and suppressing the resulting degree two vertices of t. The set of triplets induced on X by N in this way is the triplet system R(N). Two properties of these triplet systems should be noted. First, every triplet in R(N) is consistent with N, where a triplet x|yz is called consistent with a phylogenetic network N if x, y, z ∈ X and there exist two vertices u, v ∈ V(N) and pairwise internally vertex-disjoint paths in N from u to y, u to z, v to u, and v to x. A triplet system R is called consistent with a phylogenetic network N if every triplet in R is consistent with N. For convenience, we will sometimes say that a phylogenetic network N is consistent with a triplet t (or a triplet system R) if t (or R) is consistent with N. In case R is consistent with a phylogenetic network N and R = R(N), we say that R reflects N or, alternatively, that R is reflected by N. For example, the triplet set R = {a|bc, c|ab} is reflected by the three simple level-1 networks $SL_1^i(T)$, i ∈ {1, 2, 3}, on {a, b, c} depicted in Fig. 5, which appeared in slightly different form in [JNS06].
Figure 5. The three non-isomorphic simple level-1 networks $SL_1^1(T)$, $SL_1^2(T)$ and $SL_1^3(T)$ on {a, b, c} that all reflect the triplet system R = {a|bc, c|ab}.
Second, the triplet system R(N) is always dense, where a triplet system R on X is called dense if for any three elements a, b, c ∈ X there exists a triplet t ∈ R such that L(t) = {a, b, c}. Although it may look unassuming, the concept of a dense triplet set has proven vital for level-k network reconstruction, k ≥ 1, from triplet systems. More precisely, the only known polynomial time algorithms for constructing level-1 and level-2 networks N consistent with such triplet systems construct N (if it exists) by essentially building it up recursively from simple level-1 and simple level-2 networks [JS06,vIKK+08]. If the assumption that R is dense is dropped, however, then it is NP-hard to decide whether there exists a level-k network, k = 1, 2, consistent with R [JS06,vIKK+08]. For larger values of k, a polynomial time algorithm for constructing a level-k network from a dense triplet set was recently presented in [TH09].
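Density is easy to test. In the Python sketch below (ours; names hypothetical), a triplet x|yz is stored as the tuple (x, y, z), and a triplet system is dense exactly when the leaf sets of its triplets cover every 3-element subset of X.

```python
from itertools import combinations


def is_dense(triplets, X):
    """True iff every 3-subset of X is the leaf set of some triplet (x, y, z) = x|yz."""
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(s) in covered for s in combinations(sorted(X), 3))


T = {("a", "b", "c"), ("c", "a", "b")}  # the triplet system {a|bc, c|ab}
```

The system {a|bc, c|ab} is dense on {a, b, c}, but not on {a, b, c, d}, where for instance {a, b, d} is not covered.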
The next result is rather technical² but plays a crucial role in the proof of our main result (Theorem 1), as it shows that although all three simple level-1 networks depicted in Fig. 5 reflect the same triplet set, this property is lost when an additional leaf is added to a non-cut arc of each of them. For a directed graph G, the cut arcs are the elements of A(G) whose removal disconnects G; all other arcs are non-cut arcs. To establish our result, we require some more definitions and notation.
Suppose N is a phylogenetic network and a, b ∈ V(N) such that a is below b. If c is a further vertex in V(N) such that a ≤_N b and c ≤_N b hold, then we call b a common ancestor of a and c. A lowest common ancestor lca_N(a, c) of a and c is a common ancestor of a and c such that no other vertex below lca_N(a, c) is a common ancestor of a and c. Note that in a level-0 or level-1 network N, the lowest common ancestor of any two distinct leaves of N is always unique, whereas this need not be the case for level-k networks with larger k.
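Lowest common ancestors can be computed directly from this definition by intersecting ancestor sets and keeping the minimal elements. The following Python sketch is our own illustration (names hypothetical); it returns the set of all lowest common ancestors, which by the remark above is a singleton for two distinct leaves of a level-0 or level-1 network. `FIG1_N` is our reconstruction of a level-1 network matching the clusters listed for Fig. 1(a), with hypothetical internal vertex names.

```python
from collections import defaultdict

FIG1_N = {"rho": ["e", "Y"], "Y": ["p", "q"], "p": ["a", "r"], "q": ["w", "r"],
          "w": ["c", "d"], "r": ["b"], "a": [], "b": [], "c": [], "d": [], "e": []}


def ancestors(children, x):
    """All vertices above x, including x itself (paths of length 0 count)."""
    parents = defaultdict(list)
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    seen, todo = set(), [x]
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo.extend(parents[v])
    return seen


def lowest_common_ancestors(children, a, c):
    """Common ancestors of a and c that have no other common ancestor below them."""
    common = ancestors(children, a) & ancestors(children, c)
    return {u for u in common
            if not any(w != u and u in ancestors(children, w) for w in common)}
```

In `FIG1_N`, the common ancestors of a and b are p, Y and rho, and only p has no other common ancestor below it.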
Now suppose N is one of the simple level-1 networks $SL_1^i(T)$, i ∈ {1, 2, 3}, on X = {a, b, c} depicted in Fig. 5. Let e = uv ∈ A(N) be a non-cut arc and suppose that d ∉ X. Then we denote by $N_e^d$ the level-1 network obtained from N by adding a new vertex w to V(N) and replacing e by the arcs uw, wv, and wd. If the knowledge of e is of no relevance to the presented argument, then we will write $N^d$ rather than $N_e^d$.
² An alternative proof of this result, based on a case analysis, may be found in [GBP08].
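The operation $N \mapsto N_e^d$, together with a test of whether an arc is a cut arc, can be sketched in Python as follows (our own illustration; the function names and the default name `w` for the new vertex are hypothetical). `SL` is one simple level-1 network on {a, b, c} reflecting {a|bc, c|ab} that we built ourselves from the level-1 generator, with the reticulation r above b; we do not claim that it corresponds to any particular panel of Fig. 5.

```python
from collections import defaultdict

# A simple level-1 network on {a, b, c} reflecting {a|bc, c|ab} (our own construction).
SL = {
    "rho": ["s1", "s2"],
    "s1": ["a", "r"],
    "s2": ["c", "r"],
    "r": ["b"],
    "a": [], "b": [], "c": [],
}


def is_cut_arc(children, arc):
    """True iff removing the arc (u, v) disconnects the underlying undirected graph."""
    u, v = arc
    adj = defaultdict(set)
    for x, kids in children.items():
        for y in kids:
            if (x, y) != (u, v):
                adj[x].add(y)
                adj[y].add(x)
    seen, todo = {u}, [u]
    while todo:
        x = todo.pop()
        for y in adj[x] - seen:
            seen.add(y)
            todo.append(y)
    return v not in seen


def hang_leaf(children, arc, d, w="w"):
    """Return N_e^d: subdivide e = (u, v) by a new vertex w and attach the new leaf d to w."""
    u, v = arc
    assert v in children[u] and d not in children and w not in children
    new = {x: list(kids) for x, kids in children.items()}
    new[u] = [w if y == v else y for y in new[u]]
    new[w] = [v, d]
    new[d] = []
    return new
```

In `SL` the arc (r, b) is a cut arc while (s1, r) is not, so `hang_leaf(SL, ("s1", "r"), "d")` is a valid instance of $N_e^d$.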
Lemma 1. Suppose X = {a, b, c, d} and T = {a|bc, c|ab}. Then, for any two distinct i, j ∈ {1, 2, 3},
$$R(SL_1^i(T)^d) \neq R(SL_1^j(T)^d).$$
Proof: Put $N_k := SL_1^k(T)$, $k \in \{1, 2, 3\}$, and assume that there exist distinct $i, j \in \{1, 2, 3\}$ and non-cut arcs $e_i \in A(N_i)$ and $e_j \in A(N_j)$ such that $R((N_i)^d_{e_i}) = R((N_j)^d_{e_j})$. By symmetry, it suffices to consider the cases $(i, j) \in \{(2, 1), (2, 3)\}$. For $k \in \{1, 2, 3\}$, let $u_k, v_k \in V(N_k)$ be such that $e_k = u_k v_k$. Also, for $k \in \{1, 2, 3\}$, let $w_k \notin V(N_k)$ denote the new vertex of $(N_k)^d_{e_k}$, so that $(N_k)^d_{e_k}$ is obtained from $N_k$ by replacing the arc $e_k$ by the arcs $u_k w_k$ and $w_k v_k$ and adding the arc $w_k d$; following the convention above, we simply write $N_k^d$ for $(N_k)^d_{e_k}$. Note that for all $k \in \{1, 2, 3\}$, both $N_k$ and $N_k^d$ have the same root and the same reticulation vertex, which we denote by $\rho_k$ and $r_k$, respectively. Furthermore, for all $x, y \in \{a, b, c\}$ we have $lca_{N_k}(x, y) = lca_{N_k^d}(x, y)$. We distinguish the cases that $u_2 = \rho_2$ and that $u_2 \neq \rho_2$.
Suppose first that $u_2 = \rho_2$ and put $l = lca_{N_2}(a, b)$. Then $e_2 \in \{\rho_2 r_2, \rho_2 l\}$.
We first establish that $j \neq 1$. Assume for contradiction that $j = 1$. For all $s, t \in V(N_2^d)$ such that $t$ is below $s$, denote a path from $s$ to $t$ in $N_2^d$ by $P_{st}$. Observe that $d|ac \in R(N_2^d)$ holds for both choices $e_2 \in \{\rho_2 r_2, \rho_2 l\}$. Indeed, since $c|ab \in R(N_2)$, the paths $P_{la}$ and $P_{lc}$ exist and do not have an internal vertex in common. Furthermore, $w_2 \neq l$ and either the arc $w_2 l$ or the arcs $\rho_2 l$ and $\rho_2 w_2$ exist. In both cases the paths $P_{w_2 l}$, $P_{\rho_2 l}$, and $P_{\rho_2 w_2}$, consisting of the arcs $w_2 l$, $\rho_2 l$, and $\rho_2 w_2$, respectively, do not have an internal vertex in common with either $P_{la}$ or $P_{lc}$. Thus, $d|ac \in R(N_2^d)$, as required. By assumption, $d|ac \in R(N_1^d)$ follows, which is impossible since $lca_{N_1}(a, c) = \rho_1$ and so $d|ac \notin R((N_1)^d_e)$ for all non-cut arcs $e \in A(N_1)$. Thus, $j \neq 1$, as required.
If $j = 3$ then $R(N_2^d) = R(N_3^d)$, and $v_2 = r_2$ or $v_2 = l$. If $v_2 = r_2$ then $b|cd \in R(N_2^d) = R(N_3^d)$ follows. But then $w_3$ cannot be a vertex on the path in $N_3^d$ from $\rho_3$ to $b$, or on the path from $\rho_3$ to $r_3$ which avoids $lca_{N_3}(a, c)$. Thus, $u_3 = lca_{N_3}(a, b)$ and so $b|da \in R(N_3^d) = R(N_2^d)$, which is impossible as $lca_{N_2^d}(a, d) = \rho_2$ and thus always lies above $l$. If $v_2 = l$ then $d|bc, c|bd \in R(N_2^d) = R(N_3^d)$ follows, which is again impossible: if $w_3$ lies on the path from $lca_{N_3}(a, c)$ to $r_3$ then $d|bc \notin R(N_3^d)$, and if not then $u_3 = \rho_3$ and so $c|bd \notin R(N_3^d)$. Thus, $j \neq 3$.
Now suppose that $u_2 \neq \rho_2$. Then $u_2 \in \{l, lca_{N_2}(b, c)\}$. Arguments similar to the previous ones imply that $a|cd, a|bd, c|ad, c|bd \in R(N_2^d)$ holds for all $u_2 \in \{l, lca_{N_2}(b, c)\}$. If $j = 1$ then $u_1 \neq \rho_1$, as otherwise $a|bd$ or $c|bd$ does not belong to $R(N_1^d) = R(N_2^d)$. Thus $v_1 = r_1$ and $u_1 \in \{lca_{N_1}(b, a), lca_{N_1}(b, c)\}$. If $u_1 = lca_{N_1}(b, a)$ then $a|cd \notin R(N_1^d)$, which is impossible. Swapping the roles of $a$ and $c$ in the previous argument shows that $u_1 = lca_{N_1}(b, c)$ cannot hold either. Thus, $j \neq 1$.
If $j = 3$ then, again since $a|cd, c|ad \in R(N_2^d) = R(N_3^d)$, it follows that $e_3$ must be an arc on the path $P$ from $lca_{N_3}(a, c)$ to $r_3$. Arguments similar to the ones used above imply that either $d|bc$ or $b|cd$ is contained in $R(N_2^d)$. But $b|cd \notin R(N_3^d) = R(N_2^d)$, and so $d|bc \in R(N_2^d)$ must hold. But this is impossible since then $c|ad, d|bc \in R(N_2^d)$, whereas there exists no non-cut arc $e$ on $P$ such that both triplets are simultaneously contained in $R((N_3)^d_e)$. Thus, $j \neq 3$.
5. Encodings of Level-k Networks
In this section, we characterize those level-1 networks N that are encoded by the triplet system R(N), or equivalently the tree system T(N), or equivalently the cluster system S(N) they induce. In addition, we present an example illustrating that our arguments cannot be extended to establish the corresponding result for level-2 networks, and therefore for level-k networks with k ≥ 3.
Bearing in mind that there exist triplet systems which can be reflected by more than one level-1 network, we denote the collection of all level-1 networks that reflect a triplet system R by L_1(R). Clearly, if R is reflected by a level-1 network N, then N ∈ L_1(R(N)) and so |L_1(R(N))| ≥ 1. Similarly, for a tree system $\mathcal{T}$ we denote the collection of all level-1 networks N for which $\mathcal{T} = \mathcal{T}(N)$ holds by $L_1(\mathcal{T})$, and for a cluster system $\mathcal{C}$ the collection of all level-1 networks N for which $\mathcal{C} = S(N)$ holds by $L_1(\mathcal{C})$. As in the case of triplet systems, there exist tree systems $\mathcal{T}$ and cluster systems $\mathcal{C}$ with $|L_1(\mathcal{T})| > 1$ and $|L_1(\mathcal{C})| > 1$, respectively.
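Clearly, any cluster $C \subseteq X$ induces a triplet system $\mathcal{R}(C)$ of triplets on $X$ defined by putting
$$\mathcal{R}(C) = \{c_1c_2|x : c_1, c_2 \in C \text{ distinct and } x \in X - C\}.$$
Thus, any non-empty cluster system $\mathcal{C}$ on $X$ induces a triplet system $\mathcal{R}(\mathcal{C})$ defined by putting $\mathcal{R}(\mathcal{C}) := \bigcup_{C \in \mathcal{C}} \mathcal{R}(C)$. The next result establishes a link between the triplet system induced by a level-1 network $N$ and the triplet system $\mathcal{R}(\mathcal{S}(N))$.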
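To make the definition of $\mathcal{R}(C)$ and $\mathcal{R}(\mathcal{C})$ concrete, here is a minimal Python sketch. It is our own illustration, not code from this note: by assumption, a triplet $c_1c_2|x$ is encoded as the pair `(frozenset({c1, c2}), x)` and a cluster as a set of leaf labels; the function names are hypothetical.

```python
from collections.abc import Iterable
from itertools import combinations

# Assumed encoding (not from the note): the triplet c1c2|x is (frozenset({c1, c2}), x).
Triplet = tuple[frozenset[str], str]


def cluster_triplets(cluster: frozenset[str], X: frozenset[str]) -> set[Triplet]:
    """R(C): all triplets c1c2|x with distinct c1, c2 in C and x in X - C."""
    return {
        (frozenset(pair), x)
        for pair in combinations(sorted(cluster), 2)
        for x in X - cluster
    }


def cluster_system_triplets(clusters: Iterable[Iterable[str]], X: Iterable[str]) -> set[Triplet]:
    """R of a cluster system: the union of R(C) over all clusters C."""
    leaves = frozenset(X)
    result: set[Triplet] = set()
    for c in clusters:
        result |= cluster_triplets(frozenset(c), leaves)
    return result


X = frozenset("abcd")
R = cluster_system_triplets([{"a", "b"}, {"a", "b", "c"}], X)
print(sorted(("".join(sorted(p)), x) for p, x in R))
# [('ab', 'c'), ('ab', 'd'), ('ac', 'd'), ('bc', 'd')]
```

Singleton clusters and $X$ itself contribute no triplets, since they leave no room for the pair $c_1, c_2$ or for the outgroup $x$, respectively.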
Lemma 2. Suppose $N$ is a level-1 network with at least 3 leaves. Then
$$\mathcal{R}(N) = \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) = \bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C).$$
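Proof: That $\bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) = \bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C)$ holds is trivial. It is also straightforward to see that $\bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) \subseteq \mathcal{R}(N)$. To see the converse set inclusion, suppose that $t \in \mathcal{R}(N)$, and let $x_1, x_2, x_3 \in X$ be such that $t = x_1x_2|x_3$. Then, with $\mathrm{lca}(x_1, x_2) := \mathrm{lca}_N(x_1, x_2)$, we have $x_3 \notin C_N(\mathrm{lca}(x_1, x_2))$, and $\mathrm{lca}(x_1, x_2)$ does not equal the root $\rho_N$ of $N$. For $i = 1, 2$, let $P_i$ denote a path from $\rho_N$ to $x_i$ via $\mathrm{lca}(x_1, x_2)$, where $P_1$ and $P_2$ are chosen to share the same subpath from $\rho_N$ to $\mathrm{lca}(x_1, x_2)$. Let $T$ denote the phylogenetic tree on $X$ obtained from $N$ by modifying every reticulation vertex $v$ of $N$ in the following way. If $v \notin V(P_1) \cup V(P_2)$, then arbitrarily delete one of the incoming arcs of $v$ and suppress the resulting degree 2 vertex; if this reduces the outdegree of the root $\rho_N$ of $N$ to one, then identify $\rho_N$ with its unique child. If $v \in V(P_i)$, $i \in \{1, 2\}$, then delete the incoming arc of $v$ that is not an arc of $P_i$ and suppress the resulting degree 2 vertex. Clearly, $T$ is displayed by $N$, and so $t \in \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T)$. Thus, $\mathcal{R}(N) \subseteq \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T)$ must hold, which implies the lemma.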
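For small examples, both right-hand sides of Lemma 2 can be computed by brute force. The Python sketch below is a hypothetical illustration under these assumptions: a network is a dict mapping each vertex to its list of children; a displayed tree is obtained, as in the proof above, by keeping exactly one incoming arc per reticulation (degree 2 vertices are left unsuppressed, which changes no cluster); $\mathcal{R}(T)$ of a tree is computed as the union of $\mathcal{R}(C)$ over its clusters; and $\mathcal{S}(N)$ is the union of the cluster systems of the displayed trees, as used in the proof of Theorem 1.

```python
from itertools import combinations, product

Network = dict[str, list[str]]          # assumed encoding: vertex -> list of children
Triplet = tuple[frozenset[str], str]    # c1c2|x as (frozenset({c1, c2}), x)


def parent_lists(net: Network) -> dict[str, list[str]]:
    par: dict[str, list[str]] = {}
    for u, kids in net.items():
        for v in kids:
            par.setdefault(v, []).append(u)
    return par


def displayed_trees(net: Network):
    """Yield one tree (as child lists) per choice of a kept incoming arc
    for every reticulation; suppression is skipped (clusters are unaffected)."""
    par = parent_lists(net)
    retics = sorted(v for v, ps in par.items() if len(ps) > 1)
    for choice in product(*(par[r] for r in retics)):
        keep = dict(zip(retics, choice))
        yield {u: [v for v in kids if keep.get(v, u) == u] for u, kids in net.items()}


def tree_clusters(tree: Network, root: str, X: frozenset[str]) -> set[frozenset[str]]:
    """Non-empty sets of leaves below the vertices of a tree."""
    out: set[frozenset[str]] = set()

    def below(v: str) -> frozenset[str]:
        acc = frozenset([v]) if v in X else frozenset()
        for k in tree.get(v, []):
            acc |= below(k)
        if acc:
            out.add(acc)
        return acc

    below(root)
    return out


def triplets_of(clusters: set[frozenset[str]], X: frozenset[str]) -> set[Triplet]:
    return {(frozenset(p), x) for c in clusters
            for p in combinations(sorted(c), 2) for x in X - c}


def lemma2_sides(net: Network, root: str) -> tuple[set[Triplet], set[Triplet]]:
    X = frozenset(v for v in parent_lists(net) if not net.get(v))
    cluster_sets = [tree_clusters(t, root, X) for t in displayed_trees(net)]
    via_trees = set().union(*(triplets_of(cs, X) for cs in cluster_sets))
    via_clusters = triplets_of(set().union(*cluster_sets), X)
    return via_trees, via_clusters


# A level-1 network on {a, b, c} whose only blob has the four vertices r, u, w, h.
N = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}
via_trees, via_clusters = lemma2_sides(N, "r")
print(via_trees == via_clusters)  # True
```

Enumerating all displayed trees takes time exponential in the number of reticulations, so this is only meant for checking small examples by hand.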
Note that, as the example of the level-2 network depicted in Fig. 6 shows, the relationship established in Lemma 2 between the triplet system of a level-1 network $N$ and the triplet system induced by the clusters in $\mathcal{S}(N)$ need not hold for level-2 networks.
Figure 6. A level-2 phylogenetic network $N$ with $c_1c_2|x_1 \in \mathcal{R}(N)$, but $\{c_1, c_2\} \notin \mathcal{S}(N)$.
To prove the main result of this note (Theorem 1), which we do next, we require some additional definitions and notation. Suppose $N$ is a phylogenetic network. Then we call a subset $\{x, y\} \subseteq X$ a cherry of $N$ if there exists a vertex $v \in V(N)$ such that $vx, vy \in A(N)$. Furthermore, if $N$ is a level-1 network and $x \in X$, then we denote by $N - x$ the level-1 network obtained from $N$ by removing $x$ (and its incident arc) and suppressing the resulting degree 2 vertex. In addition, we say that $N$ is a strict level-1 network if $N$ is not a phylogenetic tree. Finally, to a triplet system $\mathcal{R}$ and some $x \in \bigcup_{t \in \mathcal{R}} L(t)$, we associate the triplet set $\mathcal{R}_x := \{t \in \mathcal{R} : x \notin L(t)\}$.
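The operations $t \mapsto L(t)$ and $\mathcal{R} \mapsto \mathcal{R}_x$, and the notion of a cherry, are easy to state in code. The following Python sketch uses our own assumed encodings (a triplet $c_1c_2|x$ as `(frozenset({c1, c2}), x)`, a network as a dict of child lists) and hypothetical function names.

```python
# Assumed encodings (not from the note): a triplet c1c2|x is (frozenset({c1, c2}), x);
# a network maps each non-leaf vertex to its list of children.
Triplet = tuple[frozenset[str], str]
Network = dict[str, list[str]]


def leaf_set(t: Triplet) -> frozenset[str]:
    """L(t): the three leaves of the triplet t."""
    pair, x = t
    return pair | {x}


def restrict(R: set[Triplet], x: str) -> set[Triplet]:
    """R_x := {t in R : x not in L(t)}."""
    return {t for t in R if x not in leaf_set(t)}


def cherries(net: Network) -> set[frozenset[str]]:
    """All {x, y} such that some vertex v has arcs vx and vy to leaves x and y."""
    found: set[frozenset[str]] = set()
    for kids in net.values():
        leaf_kids = sorted(k for k in kids if not net.get(k))
        for i, x in enumerate(leaf_kids):
            for y in leaf_kids[i + 1:]:
                found.add(frozenset((x, y)))
    return found


tree = {"r": ["u", "d"], "u": ["v", "c"], "v": ["a", "b"]}
gall = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}
print(sorted(map(sorted, cherries(tree))), cherries(gall) == set())  # [['a', 'b']] True
```

The network with a single four-vertex blob has no cherry at all, which is the situation covered by the second case in the proof of Theorem 1.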
Armed with these definitions and notation, we are now ready to establish our main result.
Theorem 1. Suppose $N$ is a level-1 network with at least 3 leaves. Then the following statements are equivalent:
(i) $N$ contains a blob with four vertices.
(ii) $|\mathcal{L}_1(\mathcal{R}(N))| > 1$.
(iii) $|\mathcal{L}_1(\mathcal{S}(N))| > 1$.
(iv) $|\mathcal{L}_1(\mathcal{T}(N))| > 1$.
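Proof: (i) $\Rightarrow$ (iv): This is an immediate consequence of the fact that all simple level-1 networks depicted in Fig. 5 induce the same set of phylogenetic trees.
(iv) $\Rightarrow$ (ii): Suppose that $N$ is a level-1 network such that $|\mathcal{L}_1(\mathcal{T}(N))| > 1$. Then there exists a level-1 network $N'$ distinct from $N$ such that $\mathcal{T}(N) = \mathcal{T}(N')$. Combined with Lemma 2, $\mathcal{R}(N) = \bigcup_{T \in \mathcal{T}(N)} \mathcal{R}(T) = \bigcup_{T \in \mathcal{T}(N')} \mathcal{R}(T) = \mathcal{R}(N')$ follows, and so $N' \in \mathcal{L}_1(\mathcal{R}(N))$. Thus, $|\mathcal{L}_1(\mathcal{R}(N))| > 1$, as required.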
(ii) $\Rightarrow$ (i): We will show by induction on the number $n$ of leaves of $N$ that if every blob in $N$ contains at least 5 vertices then $|\mathcal{L}_1(\mathcal{R}(N))| = 1$. Suppose $N$ is a level-1 network with $n$ leaves such that every blob of $N$ contains at least 5 vertices. Note that we may assume that $N$ contains at least one blob since otherwise $N$ is a phylogenetic tree, and so $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ clearly holds. But then $n \geq 4$. If $n = 4$ then, using Lemma 1, it is straightforward to verify that $|\mathcal{L}_1(\mathcal{R}(N))| = 1$.
Suppose $n > 4$. Assume that, for every level-1 network $N_0$ with $n_0 < n$ leaves, $|\mathcal{L}_1(\mathcal{R}(N_0))| = 1$ holds whenever $N_0$ is a phylogenetic tree or every blob in $N_0$ contains at least 5 vertices. Suppose for contradiction that $|\mathcal{L}_1(\mathcal{R}(N))| \geq 2$. Choose some $N' \in \mathcal{L}_1(\mathcal{R}(N))$ distinct from $N$. Then $\mathcal{R} := \mathcal{R}(N) = \mathcal{R}(N')$. We distinguish the cases that $N$ contains a cherry and that it does not.
Suppose first that $N$ contains a cherry $\{x, y\}$. Without loss of generality, we may assume that this cherry is as far away from the root of $N$ as possible. Then, since $N$ is a strict level-1 network all of whose blobs contain at least 5 vertices, $N - x$ must enjoy the same property with regard to its blobs (if $N - x$ still has blobs). But then, by the induction hypothesis, $|\mathcal{L}_1(\mathcal{R}(N - x))| = 1$, and so $N - x$ is the unique level-1 network that reflects $\mathcal{R}(N - x) = \mathcal{R}_x$. Since, by the choice of $x$, for every leaf $z$ of $N$ distinct from $x$ and $y$ only the triplet $z|xy$ out of the 3 possible triplets on $\{x, y, z\}$ is contained in $\mathcal{R} = \mathcal{R}(N')$, it follows that $\{x, y\}$ must also be a cherry in $N'$. Moreover, $N' - x$ also reflects $\mathcal{R}_x$ and hence equals $N - x$. But then $N = N'$, which is impossible. Thus, $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold in this case.
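The property of the cherry $\{x, y\}$ used in this case, namely that for every other leaf $z$ the only triplet of $\mathcal{R}$ on $\{x, y, z\}$ is $z|xy$, can be checked directly on a triplet system. The Python sketch below (hypothetical function names; a triplet $c_1c_2|x$ is assumed to be encoded as `(frozenset({c1, c2}), x)`) lists all pairs with that property. It only illustrates the condition and is not claimed to recognise the cherries of arbitrary level-1 networks.

```python
from itertools import combinations

Triplet = tuple[frozenset[str], str]   # assumed encoding: c1c2|x as (frozenset({c1, c2}), x)


def triplets_on(R: set[Triplet], leaves: frozenset[str]) -> set[Triplet]:
    """All triplets of R whose leaf set is exactly `leaves`."""
    return {t for t in R if t[0] | {t[1]} == leaves}


def forced_pairs(R: set[Triplet], X: set[str]) -> set[frozenset[str]]:
    """Pairs {x, y} such that, for every z in X - {x, y}, R contains z|xy
    and no other triplet on {x, y, z}."""
    result: set[frozenset[str]] = set()
    for x, y in combinations(sorted(X), 2):
        pair = frozenset((x, y))
        if all(triplets_on(R, pair | {z}) == {(pair, z)} for z in set(X) - pair):
            result.add(pair)
    return result


# Triplets of the tree (((a,b),c),d) and of a network on {a, b, c} with one four-vertex blob.
R_tree = {(frozenset("ab"), "c"), (frozenset("ab"), "d"),
          (frozenset("ac"), "d"), (frozenset("bc"), "d")}
R_gall = {(frozenset("ab"), "c"), (frozenset("bc"), "a")}
print(forced_pairs(R_tree, set("abcd")) == {frozenset("ab")})  # True
print(forced_pairs(R_gall, set("abc")))                         # set()
```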
Now suppose that $N$ does not contain a cherry. Then there exists a blob $B$ in $N$ such that all cut-arcs that start with a vertex in $B$ must end in a leaf of $N$. For each such leaf $z$, which we will also call a leaf of $B$, we denote by $z'$ the vertex of $B$ such that $z'z$ is that cut-arc of $N$. Furthermore, denote by $p$ the leaf of $B$ such that $p'$ is the reticulation vertex in $B$. Let $y_1$ and $y_2$ be the vertices in $V(N) - V(B)$ such that $y_1'$ and $y_2'$ are the two parent vertices of $p'$ in $B$. Note that the root $\rho = \rho_B$ of $B$ could be $y_1'$ or $y_2'$ but not both, and that whenever $y_i' \neq \rho$, $i = 1, 2$, then $y_i$ is a leaf of $B$ (hence the abuse of notation). Without loss of generality, we may assume that the path $P_{\rho y_1'}$ from $\rho$ to $y_1'$ in $B$ is at least as long as the path $P_{\rho y_2'}$ from $\rho$ to $y_2'$ in $B$ (where we allow paths of length zero). Thus $y_1' \neq \rho$, and so $y_1$ must be a leaf of $B$. Since $P_{\rho y_1'}$ is at least as long as $P_{\rho y_2'}$ and, by assumption on $N$, $B$ contains at least 5 vertices, there must exist a leaf $y$ of $B$ distinct from $y_1$ such that $y' \in V(P_{\rho y_1'})$. Note that we may assume without loss of generality that $y'$ is the predecessor of $y_1'$ on that path. We distinguish the cases that $|V(B)| > 5$ and that $|V(B)| = 5$.
Suppose first that $|V(B)| > 5$. Since every blob in the level-1 network $N - y_1$ clearly has at least 5 vertices, we have $|\mathcal{L}_1(\mathcal{R}(N - y_1))| = 1$ by the induction hypothesis. But then $N - y_1$ is the unique level-1 network that reflects $\mathcal{R}(N - y_1) = \mathcal{R}_{y_1}$. Consequently, since $\mathcal{R} = \mathcal{R}(N')$, we have $N' - y_1 = N - y_1$. To see that $N'$ equals $N$, suppose $z$ is a leaf of $B$ distinct from $y_1, y, p$ (which must exist by assumption on $B$). Then either $t := z|yp, p|yz \in \mathcal{R}$ or $t, y|zp \in \mathcal{R}$ holds. We only discuss the case $t, p|yz \in \mathcal{R}$, since the case $t, y|zp \in \mathcal{R}$ is symmetric. Let $B'$ denote the blob in $N - y_1$ obtained from $B$ by deleting $y_1$ plus its incident arc and suppressing the resulting degree 2 vertex. Since $z$, $y$, and $p$ are leaves of $B'$ and the choice of $y_1$ implies that $y_1y|p, y|y_1p \in \mathcal{R}(N) = \mathcal{R}(N')$, it follows that there exists some blob $\tilde{B}$ in $N'$ such that $B' = \tilde{B} - y_1$. Moreover, the suppressed degree 2 vertex of $V(\tilde{B})$ is adjacent (in $\tilde{B}$) to $y'$ and $p'$, respectively, since otherwise $y_1|yp \in \mathcal{R}(N') = \mathcal{R}(N)$ would hold, which contradicts the choice of $y_1$. Thus $N = N'$, and so $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold in case $|V(B)| > 5$.
We conclude by analyzing the case $|V(B)| = 5$. Then either $\rho = y_2'$, and so $B$ has, in addition to the leaves $y_1, y, p$, precisely one more leaf $z$, or $\rho \neq y_2'$ and the leaves of $B$ are $y_1, y_2, y$, and $p$. We first consider the case $\rho \neq y_2'$. Consider the level-1 network $N - \{y_1, y_1'\}$ obtained from $N$ by removing $y_1$, its parent vertex $y_1'$, and their 3 incident arcs (and suppressing the resulting degree 2 vertices), thus effectively turning $B$ into a phylogenetic tree on the leaves $y, p, y_2$, i.e. the triplet $t := y|py_2$. Put $\mathcal{R}_t := \mathcal{R}_{y_1} \cup \{t\}$. Since $N - \{y_1, y_1'\}$ is either a phylogenetic tree or a strict level-1 network each of whose blobs contains at least 5 vertices, the induction hypothesis implies $|\mathcal{L}_1(\mathcal{R}(N - \{y_1, y_1'\}))| = 1$. Thus, $N - \{y_1, y_1'\}$ is the unique level-1 network that reflects $\mathcal{R}_t$. Note that the only way to turn $N - \{y_1, y_1'\}$ into a level-1 network that, in addition to reflecting $\mathcal{R}_t$, is also consistent with $t' := y_2|py \in \mathcal{R}$ is to replace $t$ by one of the level-1 networks $SL_1^j(\{t, t'\})$, $j \in Y := \{1, 2, 3\}$. Denote that element of $Y$ by $j_N$. Since $\mathcal{R}(N) = \mathcal{R}(N')$, it follows that the level-1 network obtained from $N'$ by removing $y_1$, its parent vertex, and their 3 incident arcs (and suppressing the resulting degree 2 vertices) must equal $N - \{y_1, y_1'\}$ with $t$ replaced by one of $SL_1^j(\{t, t'\})$, $j \in Y$. Denote that element of $Y$ by $j_{N'}$. Since $\{y_1|py_2, y_2|py_1, y_2|y_1y, p|y_1y, y|y_1p, y_2|py, t\} \subseteq \mathcal{R} = \mathcal{R}(N')$, it is easy to check that $j_N = j_{N'}$ must hold, and so $N$ and $N'$ must be equal, which is again impossible. Thus, $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold in case $\rho \neq y_2'$. Using arguments similar to the previous ones, it is straightforward to show that $N = N'$, and thus that $|\mathcal{L}_1(\mathcal{R}(N))| = 1$ must hold, in case $\rho = y_2'$.
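(iv) $\Rightarrow$ (iii): Suppose that $N$ is a level-1 network with $|\mathcal{L}_1(\mathcal{T}(N))| > 1$. Then there exists a level-1 network $N' \in \mathcal{L}_1(\mathcal{T}(N))$ distinct from $N$ with $\mathcal{T}(N) = \mathcal{T}(N')$. But then $\mathcal{S}(N) = \bigcup_{T \in \mathcal{T}(N)} \mathcal{C}(T) = \bigcup_{T \in \mathcal{T}(N')} \mathcal{C}(T) = \mathcal{S}(N')$, and so $N' \in \mathcal{L}_1(\mathcal{S}(N))$. Thus, $|\mathcal{L}_1(\mathcal{S}(N))| > 1$.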
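(iii) $\Rightarrow$ (ii): Suppose that $N$ is a level-1 network with $|\mathcal{L}_1(\mathcal{S}(N))| > 1$. Then there exists a level-1 network $N' \in \mathcal{L}_1(\mathcal{S}(N))$ distinct from $N$ such that $\mathcal{S}(N) = \mathcal{S}(N')$. But then Lemma 2 implies $\mathcal{R}(N) = \bigcup_{C \in \mathcal{S}(N)} \mathcal{R}(C) = \bigcup_{C \in \mathcal{S}(N')} \mathcal{R}(C) = \mathcal{R}(N')$, and so $N' \in \mathcal{L}_1(\mathcal{R}(N))$. Hence, $|\mathcal{L}_1(\mathcal{R}(N))| > 1$.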
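The direction (i) $\Rightarrow$ (ii) can be checked by brute force on three leaves. The Python sketch below uses our own hypothetical encoding and does not reproduce the drawings of Fig. 5: it builds three distinct simple level-1 networks on $\{a, b, c\}$, each consisting of a single blob with four vertices, and computes their triplet systems via Lemma 2, i.e. as the union of $\mathcal{R}(C)$ over the clusters $C$ of all displayed trees. All three reflect $\{ab|c, bc|a\}$, which agrees with the count $3^1 = 3$ in Corollary 1.

```python
from itertools import combinations, product


def network_triplets(net: dict[str, list[str]], root: str) -> set[tuple[frozenset[str], str]]:
    """Union of R(T) over the trees T displayed by `net` (cf. Lemma 2).

    `net` maps each vertex to its children (assumed encoding). Displayed trees
    keep one incoming arc per reticulation; every cluster C of such a tree
    contributes R(C), with c1c2|x encoded as (frozenset({c1, c2}), x).
    """
    par: dict[str, list[str]] = {}
    for u, kids in net.items():
        for v in kids:
            par.setdefault(v, []).append(u)
    X = frozenset(v for v in par if not net.get(v))
    retics = sorted(v for v, ps in par.items() if len(ps) > 1)
    triplets: set[tuple[frozenset[str], str]] = set()
    for choice in product(*(par[r] for r in retics)):
        keep = dict(zip(retics, choice))

        def below(v: str) -> frozenset[str]:
            acc = frozenset([v]) if v in X else frozenset()
            for k in net.get(v, []):
                if keep.get(k, v) == v:        # the arc v -> k survives
                    acc |= below(k)
            triplets.update((frozenset(p), x)
                            for p in combinations(sorted(acc), 2) for x in X - acc)
            return acc

        below(root)
    return triplets


# Three distinct networks on {a, b, c}, each a single blob with four vertices.
networks = [
    {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]},  # r is not a parent of h
    {"r": ["u", "h"], "u": ["a", "w"], "w": ["b", "h"], "h": ["c"]},  # r is a parent of h
    {"r": ["u", "h"], "u": ["c", "w"], "w": ["b", "h"], "h": ["a"]},
]
print(len({frozenset(network_triplets(n, "r")) for n in networks}))  # 1
```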
Theorem 1 immediately implies the following.
Corollary 1. Let $N$ be a level-1 network with at least 3 leaves. The number of non-isomorphic level-1 networks $N'$ that reflect $\mathcal{R}(N)$ (or, equivalently, for which $\mathcal{T}(N') = \mathcal{T}(N)$, or, equivalently, $\mathcal{S}(N') = \mathcal{S}(N)$ holds) is $3^b$, where $b$ is the number of blobs of $N$ of size four.
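Corollary 1 turns into a simple count once the blobs are known. In a binary level-1 network every blob contains exactly one reticulation $h$ and consists of $h$ together with the two directed paths from the root of the blob to the parents of $h$. The Python sketch below (hypothetical function name; networks assumed to be binary level-1 networks given as dicts of child lists, and not validated) recovers each blob by walking upwards from both parents of $h$ until the walks meet, and returns $3^b$.

```python
def corollary1_count(net: dict[str, list[str]]) -> int:
    """Return 3**b, where b is the number of blobs with exactly four vertices.

    Assumes `net` (vertex -> list of children) is a binary level-1 network,
    so each blob is the union of two directed paths from its root to its
    unique reticulation h; the input is not validated.
    """
    par: dict[str, list[str]] = {}
    for u, kids in net.items():
        for v in kids:
            par.setdefault(v, []).append(u)

    def upward(v: str) -> list[str]:
        # Follow unique parents; stop (inclusively) at the network root or at
        # a vertex with two parents, i.e. a reticulation of another blob.
        path = [v]
        while len(par.get(path[-1], [])) == 1:
            path.append(par[path[-1]][0])
        return path

    b = 0
    for h, ps in par.items():
        if len(ps) != 2:
            continue
        left = upward(ps[0])
        seen = set(left)
        right: list[str] = []
        for v in upward(ps[1]):
            right.append(v)
            if v in seen:                 # the two walks meet at the blob root
                break
        top = right[-1]
        blob = {h, *left[: left.index(top) + 1], *right}
        if len(blob) == 4:
            b += 1
    return 3 ** b


gall4 = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}
gall5 = {"r": ["u", "w"], "u": ["a", "v"], "v": ["b", "h"], "w": ["h", "c"], "h": ["d"]}
print(corollary1_count(gall4), corollary1_count(gall5))  # 3 1
```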
We remark that the strategy underlying the proof of Theorem 1 does not immediately extend to level-$k$ networks with $k \geq 2$. The main reasons for this are that, as already mentioned above, for $k \geq 2$ the number of distinct level-$k$ generators grows exponentially in $k$ [GBP09], and that the question of when two distinct simple level-2 networks reflect the same set of triplets is far less well understood. For example, consider the two level-2 networks depicted in Figure 7. Each of them is a simple level-2 network obtained by hanging leaves off the sides of the level-2 generators $G^2_a$ and $G^2_b$ depicted in Figure 3. As can be quickly verified, both networks reflect the same triplet set. However, subdividing, in both networks, the arc one of whose end vertices forms an arc with $x_1$ and the other of which forms an arc with $x_2$, and attaching additional leaves, results in two distinct level-2 networks that still reflect the same triplet system. Regarding the accurate reconstruction of level-$k$ networks from, e.g., triplet data, this result highlights a serious limitation of level-2 networks (and probably of level-$k$ networks in general), as two such networks with very different structures might reflect the same triplet set.
Figure 7. Both simple level-2 networks reflect the triplet set $\{a|x_1b, b|x_1a, x_1|ab, a|x_2b, b|x_2a, x_2|ab, x_1|x_2a, a|x_1x_2, x_1|x_2b, b|x_1x_2\}$.
We conclude by remarking that phylogenetic trees on $X$ can also be viewed as trees together with a bijective labelling map between $X$ and the leaf set of such trees. Taking this point of view, phylogenetic trees were generalized in [MH06] to MUL-trees by allowing two or more leaves of such a tree to have the same label. For example, the tree obtained from the phylogenetic tree depicted in Figure 1(c) by replacing the leaf labelled $a$ by the cherry labelled $\{a, b\}$ is such a tree. In fact, this is the MUL-tree induced by the level-1 network $N$ depicted in Figure 1(a), that is, the tree that shows all paths from the root of $N$ to the leaves of $N$. For a level-1 network $N$ it is easily seen that the MUL-tree $M(N)$ induced by $N$ in this way is in fact an encoding of $N$, in the sense that $N$ is the unique level-1 network that can give rise to $M(N)$.
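The MUL-tree that shows all paths from the root of $N$ to its leaves can be produced by unfolding $N$: every vertex below a reticulation is copied once for each path reaching it. The following recursive Python sketch (hypothetical function name; networks assumed to be dicts of child lists) outputs this MUL-tree in Newick format, suppressing vertices with a single child. Since every root-to-leaf path yields a leaf, the output can be exponentially larger than the network, for instance when reticulations are nested.

```python
def mul_tree_newick(net: dict[str, list[str]], root: str) -> str:
    """Newick string of the MUL-tree obtained by unfolding `net` from `root`."""

    def unfold(v: str) -> str:
        kids = net.get(v, [])
        if not kids:
            return v                    # leaf: keep its label (labels may repeat)
        if len(kids) == 1:
            return unfold(kids[0])      # suppress vertices with a single child
        return "(" + ",".join(unfold(k) for k in kids) + ")"

    return unfold(root) + ";"


gall = {"r": ["u", "w"], "u": ["a", "h"], "w": ["h", "c"], "h": ["b"]}
print(mul_tree_newick(gall, "r"))  # ((a,b),(b,c));
```

In this example the leaf $b$ below the reticulation $h$ appears twice, once for each of the two paths from the root to $b$.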
References
[AVP08] Miguel Arenas, Gabriel Valiente, and David Posada. Characterization of reticulate networks based on the coalescent. Molecular Biology and Evolution, 25:2517–2520, 2008.
[Bat88] A. Batbedat. Les isomorphismes hte et hts, après la bijection de Benzecri-Johnson. Metron, 46:47–59, 1988.
[Bat89] A. Batbedat. Les dissimilarités Médas ou Arbas. Statistique et analyse des données, 14:1–18, 1989.
[BBO04] Jean-Pierre Barthélemy, Francois Brucker, and Christophe Osswald. Combinatorial optimization and hierarchical classifications. 4OR: A Quarterly Journal of Operations Research, 2(3):179–219, 2004.
[BD89] Hans-Jürgen Bandelt and Andreas W. M. Dress. Weak hierarchies associated with similarity measures: an additive clustering technique. Bulletin of Mathematical Biology, 51:113–166, 1989.
[BFSR95] Hans-Jürgen Bandelt, Peter Forster, Bryan C. Sykes, and Martin B. Richards. Mitochondrial portraits of human population using median networks. Genetics, 141:743–753, 1995.
[BM04] David Bryant and Vincent Moulton. NeighborNet: An agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution, 21(2):255–265, 2004.
[CJLY05] Ho-Leung Chan, Jesper Jansson, Tak-Wah Lam, and Siu-Ming Yiu. Reconstructing an ultrametric galled phylogenetic network from a distance matrix. In Proceedings of the 30th International Symposium on Mathematical Foundations of Computer Science (MFCS'05), volume 3618 of Lecture Notes in Computer Science, pages 224–235. Springer Verlag, 2005.
[CLRV08] Gabriel Cardona, Mercè Llabrés, Francesc Rosselló, and Gabriel Valiente. A distance metric for a class of tree-sibling phylogenetic networks. Bioinformatics, 24(13):1481–1488, 2008.
[DHM07] Andreas W. M. Dress, Katharina T. Huber, and Vincent Moulton. Some uses of the Farris transform in mathematics and phylogenetics - a review. Annals of Combinatorics, 11(1):1–37, 2007.
[GBP08] Philippe Gambette, Vincent Berry, and Christophe Paul. An obstruction approach to reconstruct phylogenies and level-k networks from triplets, 2008. Manuscript.
[GBP09] Philippe Gambette, Vincent Berry, and Christophe Paul. The structure of level-k phylogenetic networks. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM'09), 2009.
[GEL03] Dan Gusfield, Satish Eddhu, and Charles Langley. Efficient reconstruction of phylogenetic networks with constrained recombination. In Proceedings of the 2003 IEEE Computational Systems Bioinformatics Conference (CSB2003), pages 363–374, 2003.
[GH07] Stefan Grünewald and Katharina T. Huber. Identifying and defining trees. In Olivier Gascuel and Mike Steel, editors, Reconstructing Evolution, New Mathematical and Computational Advances, pages 217–246. Oxford University Press, 2007.
[HB06] Daniel H. Huson and David Bryant. Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2):254–267, 2006.
[HHML04] Barbara R. Holland, Katharina T. Huber, Vincent Moulton, and Peter J. Lockhart. Using consensus networks to visualize contradictory evidence for species phylogeny. Molecular Biology and Evolution, 21(7):1459–1461, 2004.
[HR08] Daniel H. Huson and Regula Rupp. Summarizing multiple gene trees using gene networks. In Proceedings of the eighth Workshop on Algorithms in Bioinformatics (WABI'08), volume 5251 of Lecture Notes in Computer Science, pages 296–305. Springer Verlag, 2008.
[JNS06] Jesper Jansson, Nguyen Bao Nguyen, and Wing-Kin Sung. Algorithms for combining rooted triplets into a galled phylogenetic network. SIAM Journal on Computing, 35(5):1098–1121, 2006.
[JS04] Jesper Jansson and Wing-Kin Sung. Inferring a level-1 phylogenetic network from a dense set of rooted triplets. In Proceedings of the tenth Annual International Computing and Combinatorics Conference (COCOON'04), volume 3106 of Lecture Notes in Computer Science, pages 462–471. Springer Verlag, 2004.
[JS06] Jesper Jansson and Wing-Kin Sung. Inferring a level-1 phylogenetic network from a dense set of rooted triplets. Theoretical Computer Science, 363(1):60–68, 2006.
[Kel08] Steven Kelk. http://homepages.cwi.nl/kelk/lev3gen/, 2008.
[KNTX08] Iyad A. Kanj, Luay Nakhleh, Cuong Than, and Ge Xia. Seeing the trees and their branches in the network is hard. Theoretical Computer Science, 401:153–164, 2008.
[MH06] Vincent Moulton and Katharina T. Huber. Phylogenetic networks from multi-labeled trees. Journal of Mathematical Biology, 52(5):613–632, 2006.
[Sem07] Charles Semple. Hybridization networks. In Olivier Gascuel and Mike Steel, editors, Reconstructing Evolution, New Mathematical and Computational Advances, pages 277–314. Oxford University Press, 2007.
[SH05] Yun S. Song and Jotun Hein. Constructing minimal ancestral recombination graphs. Journal of Computational Biology, 12(2):147–169, 2005.
[SS03] Charles Semple and Mike Steel. Phylogenetics. Oxford University Press, 2003.
[TH09] Thu-Hien To and Michel Habib. Level-k phylogenetic network can be constructed from a dense triplet set in polynomial time. In Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM'09), 2009.
[vIKK+08] Leo van Iersel, Judith Keijsper, Steven Kelk, Leen Stougie, Ferry Hagen, and Teun Boekhout. Constructing level-2 phylogenetic networks from triplets. In Proceedings of the twelfth Annual International Conference on Research in Computational Molecular Biology (RECOMB'08), volume 4955 of Lecture Notes in Computer Science, pages 450–462. Springer Verlag, 2008.
[vIKM09] Leo van Iersel, Steven Kelk, and Matthias Mnich. Uniqueness, intractability and exact algorithms: reflections on level-k phylogenetic networks. Journal of Bioinformatics and Computational Biology, 2009. In press.
[Wil06] Stephen J. Willson. Unique solvability of certain hybrid networks from their distances. Annals of Combinatorics, 10(1):165–178, 2006.
[Wil09] Steven Willson. Regular networks are determined by their trees, 2009.
[WZZ01] Lusheng Wang, Kaizhong Zhang, and Louxin Zhang. Perfect phylogenetic networks with recombination. In Proceedings of the 16th ACM Symposium on Applied Computing (SAC'01), pages 46–50, 2001.