Algorithmic and complexity results for decompositions of biological networks into monotone subsystems.
ABSTRACT A useful approach to the mathematical analysis of large-scale biological networks is based upon their decompositions into monotone dynamical systems. This paper deals with two computational problems associated to finding decompositions which are optimal in an appropriate sense. In graph-theoretic language, the problems can be recast in terms of maximal sign-consistent subgraphs. The theoretical results include polynomial-time approximation algorithms as well as constant-ratio inapproximability results. One of the algorithms, which has a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming relaxation approach of Goemans-Williamson (23). The algorithm was implemented and tested on a Drosophila segmentation network and an Epidermal Growth Factor Receptor pathway model, and it was found to perform close to optimally.
UNCORRECTED PROOF
0303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved.
doi:10.1016/j.biosystems.2006.08.001
BIO 2594 1–18
BioSystems xxx (2006) xxx–xxx
Algorithmic and complexity results for decompositions of
biological networks into monotone subsystems
Bhaskar DasGupta a,1,∗, German Andres Enciso b,2, Eduardo Sontag c,3, Yi Zhang a,1
a Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, United States
b Mathematical Biosciences Institute, 250 Mathematics Building, 231 W 18th Avenue, Columbus, OH 43210, United States
c Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, United States
Received 23 January 2006; received in revised form 3 August 2006; accepted 3 August 2006
Abstract
A useful approach to the mathematical analysis of large-scale biological networks is based upon their decompositions into monotone dynamical systems. This paper deals with two computational problems associated to finding decompositions which are optimal in an appropriate sense. In graph-theoretic language, the problems can be recast in terms of maximal sign-consistent subgraphs. The theoretical results include polynomial-time approximation algorithms as well as constant-ratio inapproximability results. One of the algorithms, which has a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming relaxation approach of Goemans–Williamson [Goemans, M., Williamson, D., 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (6), 1115–1145]. The algorithm was implemented and tested on a Drosophila segmentation network and an Epidermal Growth Factor Receptor pathway model, and it was found to perform close to optimally.
© 2006 Elsevier Ireland Ltd. All rights reserved.
∗ Corresponding author. Tel.: +1 3123551319; fax: +1 3124130024.
E-mail addresses: dasgupta@cs.uic.edu (B. DasGupta), yzhang3@cs.uic.edu (Y. Zhang), genciso@mbi.osu.edu (G.A. Enciso), sontag@math.rutgers.edu (E. Sontag).
1 Partly supported by NSF grants CCR-0296041, CCR-0206795, CCR-0208749 and IIS-0346973.
2 Work done while the author was with the Mathematics Department of Rutgers University and partly supported by NSF grant CCR-0206789.
3 Partly supported by NSF grants EIA 0205116 and DMS-0504557.
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networks into monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001

1. Introduction

In living cells, networks of proteins, RNA, DNA, metabolites, and other species process environmental signals, control internal events such as gene expression, and produce appropriate cellular responses. The field of systems (molecular) biology is largely concerned with the study of such networks, viewed as dynamical systems. One approach to their mathematical analysis relies upon viewing them as made up of subsystems whose behavior is simpler and easier to understand. Coupled with appropriate interconnection rules, the hope is that emergent properties of the complete system can be deduced from the understanding of these subsystems. Diagrammatically, we picture this as in Fig. 1, which shows a full system as composed of four subsystems.

Fig. 1. A system composed of four subsystems.

A particularly appealing class of candidates for “simpler behaved” subsystems are monotone systems, as in Hirsch (1985, 1983) and Smith (1995). Monotone systems are a class of dynamical systems for which pathological behavior (“chaos”) is ruled out. Even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems. For instance, in monotone systems, bounded trajectories generically converge to steady states, and there are no stable oscillatory behaviors. More precisely, as explained below, one must extend the notion of monotone system so as to incorporate input and output channels, as introduced and initially developed in Angeli and Sontag (2003); inputs and outputs are required so that interconnections like those shown in Fig. 1 can be defined.
Monotonicity is closely related, as explained later, to positive and negative feedback loops in systems. The topic of analyzing the behaviors of such feedback loops is a long-standing one in biology in the context of regulation, metabolism, and development; a classical reference in that regard is the work of Monod and Jacob (1961). See also, for example, Angeli et al. (2004), Angeli and Sontag (2004), Cinquin and Demongeot (2002), Lewis et al. (1977), Meinhardt (1978), Plathe et al. (1995), Remy et al. (2003), Snoussi (1998) and Thomas (1978).

An interconnection of monotone subsystems, that is to say, an entire system made up of monotone components, may or may not be monotone: “positive feedback” (in a sense that can be made precise) preserves monotonicity, while “negative feedback” destroys it. Thus, oscillators such as circadian rhythm generators require negative feedback loops in order for periodic orbits to arise, and hence are not themselves monotone systems, although they can be decomposed into monotone subsystems (cf. Angeli and Sontag, 2004). A rich theory is beginning to arise, characterizing the behavior of non-monotone interconnections. For example, Angeli and Sontag (2003) shows how to preserve convergence to equilibria; see also the follow-up papers (Angeli et al., 2004; Enciso et al., 2005; Enciso and Sontag, 2006; Gedeon and Sontag, 2005; De Leenheer et al., 2005). Even for monotone interconnections, the decomposition approach is very useful, as it permits locating and characterizing the stability of steady states based upon input/output behaviors of components, as described in Angeli and Sontag (2004); see also the follow-up papers (Angeli et al., 2004; Enciso and Sontag, 2005; De Leenheer and Malisoff, 2006).

Moreover, a key point brought up in Sontag (2004, 2005) is that new techniques for monotone systems in many situations allow one to characterize the behavior of an entire system, based upon the “qualitative” knowledge represented by general network topology and the inhibitory or activating character of interconnections, combined with only a relatively small amount of quantitative data. The latter data may consist of steady-state responses of components (dose-response curves and so forth), and there is no need to know the precise form of dynamics or parameters such as kinetic constants in order to obtain global stability conclusions.

In Section 2 of this paper, we briefly discuss monotonicity of systems described by ordinary differential equations (the study of monotonicity can be extended to partial differential equations, delay-differential equations, and even more arbitrary dynamical systems, see e.g. Enciso and Sontag, 2006 in the context of monotone systems with inputs and outputs). We explain there how the study of monotone systems, and more generally of decompositions into monotone systems, relates to a sign-consistency property for the graph which describes how each state variable influences each other variable in a given system.

Fig. 2. A consistent and an inconsistent graph.
Generally, a graph whose edges are labeled by “+” or “−” signs (sometimes one writes +1, −1 instead of +, −, or uses respectively activating “→” or inhibiting “⊣” arrows as shown in Fig. 2) is said to be sign-consistent if all paths between any two nodes have the same net sign, or equivalently, all closed loops have positive parity, i.e. an even number, possibly 0, of negative edges. (For technical reasons, one ignores the direction of arrows, looking only at undirected graphs; see more details in Section 2.) Thus, the first graph in Fig. 2 is consistent, but the second one, which differs in just one edge from the first one, is not (two paths with different parity are possible from node 1 to node 4, a direct odd one as well as an even one traversing nodes 2 and 3). Self-loops, which in biochemical systems often represent degradation terms, are ignored in this definition. (We discuss this point further below.)

When applying decomposition theorems such as those described in Angeli et al. (2004), Angeli et al. (2004), Angeli and Sontag (2003, 2004), Enciso et al. (2005), Enciso and Sontag (2005), Enciso and Sontag (2006), Gedeon and Sontag (2005), De Leenheer et al. (2005), De Leenheer and Malisoff (2006) and Sontag (2004, 2005), it tends to be the case that the fewer the interconnections among components, the easier it is to obtain useful conclusions. One may view a decomposition into interconnections of monotone subsystems as the “pulling out” of “inconsistent” connections among monotone components, the original system being a “negative feedback” loop around an otherwise consistent system, as represented in Fig. 3. In this interpretation, the number of interconnections among monotone components corresponds to the number of variables being fed back. In addition, and independently from the theory developed in the above references, one might speculate that nature tends to favor systems that are decomposable into small monotone interconnections (or equivalently, have a small number of inconsistent paths). There are two reasons for this.

Fig. 3. Pulling-out inconsistent connections.

From a dynamical systems perspective, negative feedback loops, although required for homeostasis and for periodic behavior, have potentially destabilizing effects, especially if there are signal propagation delays; thus, minimizing their number is desirable.

Another advantage of consistency is as follows (Sontag, in preparation). Suppose that the nodes in the graphs shown in Fig. 2 represent concentrations of chemical species in a cell, such as receptors in a certain activated state or transcription factors. Assume now that a perturbation instantaneously increases the value of the concentration of node 1. For the graph on the left, the instantaneous effect on the other nodes is predictable: nodes 2 and 6 will increase, while nodes 3, 4, and 5 will decrease. This unambiguous global effect holds true regardless of the actual algebraic forms of reactions, values of parameters such as kinetic constants, etc. In contrast, consider the graph shown on the right. Now the net effect of an increase in node 1 is ambiguous. It is impossible to know if node 4 will be repressed (because of the direct edge from 1 to 4) or activated (because of the indirect path). There is no way to resolve this ambiguity unless equations and precise parameter values are assigned to the arrows. Since cells of the same type differ in precise parameter values, due to varying concentrations of ATP, enzymes, and other chemicals, two cells of the same type may react in different ways to the same “stimulus” (increase in concentration of chemical 1). While such epigenetic diversity is sometimes desirable, it makes behavior less predictable. From an evolutionary viewpoint, a “change in wiring” due to a mutation will have an ambiguous effect in this inconsistent network.
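The predictability argument above can be illustrated with a minimal sketch. It propagates the sign of a perturbation along undirected signed edges and reports an ambiguity as soon as two paths of different net sign reach the same node. The function name `propagate_signs` and the two six-node edge lists are hypothetical: the lists are chosen only to reproduce the behavior described in the text (nodes 2 and 6 increase, nodes 3, 4 and 5 decrease) and are not the actual edges of Fig. 2.

```python
from collections import deque

def propagate_signs(nodes, signed_edges, source):
    """Predict the direction of change of every node connected to `source`.

    signed_edges: iterable of (u, v, sign) with sign in {+1, -1}.
    Returns {node: +1 or -1} for an increase of `source`, or None when two
    undirected paths of different net sign reach the same node.
    """
    adj = {v: [] for v in nodes}
    for u, v, sign in signed_edges:
        if u == v:
            continue  # self-loops are ignored, as in the definition
        adj[u].append((v, sign))
        adj[v].append((u, sign))  # edge directions are ignored

    effect = {source: +1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, sign in adj[u]:
            expected = effect[u] * sign
            if v not in effect:
                effect[v] = expected
                queue.append(v)
            elif effect[v] != expected:
                return None  # inconsistency: the net effect on v is ambiguous
    return effect


# Hypothetical six-node graphs (assumed for illustration, not taken from Fig. 2).
nodes = range(1, 7)
consistent = [(1, 2, +1), (2, 3, -1), (3, 4, +1), (1, 4, -1), (4, 5, +1), (1, 6, +1)]
# Same graph with the sign of the edge between nodes 2 and 3 flipped.
inconsistent = [(1, 2, +1), (2, 3, +1), (3, 4, +1), (1, 4, -1), (4, 5, +1), (1, 6, +1)]

print(propagate_signs(nodes, consistent, 1))
print(propagate_signs(nodes, inconsistent, 1))
```

The sketch only inspects the connected component of the source node, and it treats the graph as undirected, in line with the convention stated earlier for sign-consistency.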
Of course, one should not expect large networks to be globally consistent. However, if the number of inconsistencies in a biological interaction graph is small, it may well be the case that the network is in fact consistent in a practical sense. For example, a gene regulatory network represents all potential effects among genes. These effects are mediated by proteins which themselves may need to be “activated” in order to perform their function, and this activation may, in turn, depend on certain extracellular ligands being present. Thus, depending on the particular combination of external signals present, different subgraphs of the original graph describe the system under those conditions, and these graphs may be individually consistent. For example, for the system in Fig. 2, the edge from 1 to 2 may not be present under environmental conditions A, while the edge from 2 to 3 may not be present under conditions B. Thus, under either condition, A or B, the graph would be consistent, even though the entire network is not. See Sontag (in preparation) for more discussion of these issues. In summary, consistency in biological networks may be desirable, and therefore one might conjecture that true biological networks tend to maximize it. Evidence that this is indeed the case is provided by Ma’ayan et al. (in preparation), where the authors compare certain biological networks and appropriately randomized versions of them and show that the original networks are closer to being consistent, when consistency is measured using a simple heuristic. In the last section of this paper, we apply our algorithms to perform a similar analysis, and once again derive the conclusion that nature seems to favor consistency.
Thus, we are led to the subject of this paper, namely computing the smallest number of edges that have to be removed so that there remains a consistent graph. For example, for the particular graph shown in Fig. 4 the answer is that one edge (the diagonal positive one) suffices (in this case, the solution is unique: no single other edge would suffice; in other problems, there may be more than one optimal solution).

Fig. 4. Dropping the diagonal edge gives consistency.

There has been other work dealing with efficient knock-out strategies in biochemical reaction networks, also formulated, as in this paper, as edge deletion problems. As an example, we mention the recent paper (Klamt, 2006), which dealt with the question of identifying a minimal set of reactions whose removal would block the operation of a prespecified reaction. The problem that we consider is completely different, however.

In this paper, we will study the computational complexity of the question of how many edges must be removed in order to obtain consistency, and we provide a relaxation-based polynomial-time approximation algorithm guaranteed to solve the problem to about 87.9% of the optimum solution, which is based on the semidefinite programming relaxation approach of Goemans and Williamson (1995). (A variant of the problem is discussed as well.) We also observe that it is not possible to have a polynomial-time algorithm with performance too close to the optimal. While our emphasis is on theory, one of the algorithms was implemented, and we show results of its application to a Drosophila segmentation network and to an Epidermal Growth Factor Receptor pathway model. It turns out that, in these applications, the algorithm often returns a solution much closer to optimal than the worst-case guarantee of 87.9%, and indeed often an optimal one.

The remainder of this paper is organized as follows. Section 2 briefly discusses monotonicity. The discussion is self-contained for the purposes of this paper, and references are given to the dynamical systems results that motivate the problem studied here. The connection to consistency is also explained there. Section 3 discusses the associated graph-theoretic problems and notions of approximability used in the paper, leading to the statement of our main theoretical results in Section 4, which are proved in Section 5. Section 6 contains the mentioned examples of application of the algorithm. Finally, in Section 6.3 we consider a yeast gene regulatory network and various randomized versions of it, concluding that the original network is far closer to consistent than may be expected from chance alone. Several technical proofs are separately provided in Appendix A.
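To make the edge-deletion problem stated in this introduction concrete, the following is a brute-force sketch, not the semidefinite-programming algorithm of the paper. It tries all deletion sets of increasing size and is therefore exponential in the number of edges, so it is only meant for very small graphs. The names `is_consistent` and `min_edge_deletions` and the five-edge graph `square` are assumptions made for illustration; the graph is a square with one positive diagonal, built so that only the diagonal can be dropped, and is not claimed to be the graph of Fig. 4.

```python
from itertools import combinations

def is_consistent(nodes, signed_edges):
    """True if every closed undirected chain has an even number of negative edges."""
    adj = {v: [] for v in nodes}
    for u, v, sign in signed_edges:
        if u != v:  # self-loops are ignored
            adj[u].append((v, sign))
            adj[v].append((u, sign))
    label = {}
    for start in adj:  # handle each connected component separately
        if start in label:
            continue
        label[start] = 1
        stack = [start]
        while stack:
            u = stack.pop()
            for v, sign in adj[u]:
                wanted = label[u] * sign
                if v not in label:
                    label[v] = wanted
                    stack.append(v)
                elif label[v] != wanted:
                    return False
    return True

def min_edge_deletions(nodes, signed_edges):
    """Return (k, solutions): the smallest number k of edges whose removal
    leaves a consistent graph, and every deletion set of that size."""
    edges = list(signed_edges)
    for k in range(len(edges) + 1):
        solutions = []
        for removed in combinations(range(len(edges)), k):
            kept = [e for idx, e in enumerate(edges) if idx not in removed]
            if is_consistent(nodes, kept):
                solutions.append([edges[idx] for idx in removed])
        if solutions:
            return k, solutions

# Hypothetical square with a positive diagonal between nodes 1 and 3.
square = [(1, 2, +1), (2, 3, -1), (3, 4, +1), (4, 1, -1), (1, 3, +1)]
k, solutions = min_edge_deletions([1, 2, 3, 4], square)
print(k, solutions)
```

In this example both triangles containing the diagonal have odd parity while the outer square has even parity, so the diagonal is the unique single edge whose removal restores consistency.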
2. Monotone systems and consistency
We will illustrate the motivation for the problem studied here using systems of ordinary differential equations

\[ \dot{x} = F(x) \tag{1} \]

(the dot indicates time derivative, and x = x(t) is a vector), although the discussion applies as well to more general types of dynamical systems such as delay-differential systems or certain systems of reaction-diffusion partial differential equations. In applications to biological networks, the component x_i(t) of the vector x = x(t) indicates the concentration of the ith species in the model at time t.

We will restrict attention to models in which the direct effect that one given variable in the model has over another is unambiguous, in the sense that it is always inhibitory or always promoting. Thus, if protein A binds to the promoter region of gene B, we assume that it does so either to prevent the transcription of the gene or to facilitate it, no matter what the respective concentrations are. Mathematically, we require that for every i, j = 1, ..., n, i ≠ j, the partial derivative ∂F_i/∂x_j be either ≥ 0 at all states or ≤ 0 at all states.

Let us briefly discuss this non-ambiguity assumption. First of all, we remark that this assumption does not prevent protein A from having an indirect influence, through other molecules, perhaps dimers of A itself, that can ultimately lead to the opposite effect on gene B from that of a direct connection. Indeed, this is the whole point of studying graph consistency. Second, in biomolecular networks, ambiguous signs in Jacobians often represent heterogeneous mechanisms. For example, take the case where protein A enhances the transcription rate of gene B only if it is present at low concentrations, but represses B if its concentration is larger than some threshold. A careful study of the chemical mechanism often reveals the existence of an intermediate form (perhaps a homodimer) that is responsible for this ambiguous effect. (Mathematically, an example is a rate of transcription k_1 a − k_2 a^2, where a denotes the concentration of A.) Introducing a new species into the model (mathematically, an additional state variable representing this intermediate form) reduces one to the problem in which Jacobian entries are unambiguous. (In our example, we would write the rate as k_1 a − k_2 c, where c is the concentration of the dimer. In addition, there would be a new equation such as dc/dt = k_3 a^2 − k_4 c representing formation of the dimer and its degradation.) Finally, we note that small-scale negative loops are abundant in nature. Self-loops or “auto repression” are an extreme example of these, and appear as a consequence of degradation and other effects. Regarding such self-loops, observe that the requirement of a fixed sign for Jacobian entries is not imposed on diagonal elements. In fact, these elements play no role in the graph to be introduced next, nor in monotonicity: the properties of monotone systems are not affected by them. More generally, it is often the case that small loops represent fast dynamics which may be collapsed into self-loops via time-scale decomposition (singular perturbations or, specifically for enzymes, “quasi-steady state approximations”) and hence may be viewed as diagonal terms which may be safely ignored. This is a modeling question, to be settled before the algorithms studied here are applied.

Given any partial order ≤ defined on R^n, a system (1) is said to be monotone with respect to ≤ if x_0 ≤ y_0 implies x(t) ≤ y(t) for every t ≥ 0. Here x(t), y(t) are the solutions of (1) with initial conditions x_0, y_0, respectively. Of course, whether a system is monotone or not depends on the partial order being considered, but one says simply that a system is monotone if the order is clear from the context. Monotonicity with respect to nontrivial orders rules out chaotic attractors and even stable periodic orbits (see Hirsch (1985, 1983) and Smith (1995)), and is, as discussed in the introduction, a useful property for components when analyzing larger systems in terms of subsystems.

A useful way to define partial orders in R^n, and the only one to be further considered in this paper, is as follows. Given a tuple s = (s_1, ..., s_n), where s_i ∈ {1, −1} for every i, we say that x ≤_s y if s_i x_i ≤ s_i y_i for every i. For instance, the “cooperative order” is the orthant order ≤_s generated by s = (1, ..., 1). This is the order ≤ defined by x ≤ y if and only if x_i ≤ y_i for all i = 1, ..., n. It is not difficult to verify whether a system is monotone with respect to an orthant order; the following lemma, known as “Kamke’s condition,” is not hard to prove, see Smith (1995) for details (also Angeli and Sontag, 2003 in the more general context of monotone systems with input and output channels).

Lemma 1. Consider an orthant order ≤_s generated by s = (s_1, ..., s_n). A system (1) is monotone with respect to ≤_s if and only if

\[ s_i s_j \frac{\partial F_j}{\partial x_i} \ge 0, \qquad i, j = 1, \ldots, n, \; i \ne j. \tag{2} \]
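As a small illustration of how condition (2) can be checked, the sketch below works with the sign pattern of the Jacobian rather than with F itself. This relies on the standing assumption of this section that each off-diagonal partial derivative has a fixed sign at all states. The names `satisfies_kamke` and `orthant_orders` and the two example sign patterns are hypothetical, and the exhaustive search over all 2^n sign vectors is only practical for small n.

```python
from itertools import product

def satisfies_kamke(jac_sign, s):
    """Check condition (2) for the orthant order generated by s.

    jac_sign[j][i] is the fixed sign (+1, -1 or 0) of dF_j/dx_i, using
    0-based indices; diagonal entries are never inspected.
    """
    n = len(s)
    return all(
        s[i] * s[j] * jac_sign[j][i] >= 0
        for i in range(n) for j in range(n) if i != j
    )

def orthant_orders(jac_sign):
    """List every s in {1, -1}^n for which condition (2) holds."""
    n = len(jac_sign)
    return [s for s in product((1, -1), repeat=n) if satisfies_kamke(jac_sign, s)]

# Two species that repress each other (assumed example).
mutual_inhibition = [[0, -1],
                     [-1, 0]]
# A three-species loop: x1 promotes x2, x2 promotes x3, x3 represses x1.
negative_loop = [[0, 0, -1],
                 [1, 0, 0],
                 [0, 1, 0]]

print(orthant_orders(mutual_inhibition))
print(orthant_orders(negative_loop))
```

The mutual-inhibition pattern is not cooperative but is monotone with respect to the two orders in which the species carry opposite signs, whereas the negative loop admits no orthant order at all.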
To provide intuition, let us sketch the sufficiency part of the proof for the special case of the cooperative order. Suppose by contradiction that the system is not monotone, and that therefore there is a pair of initial conditions x_0 ≤ y_0 whose solutions x(t), y(t) cease to satisfy x(t) ≤ y(t) at some point. This implies that at a certain critical moment in time t, there is some coordinate i so that x_i(t^−) < y_i(t^−) but x_i(t^+) > y_i(t^+). (This argument is not entirely accurate, but it gives the flavor of the proof.) Thus x_i(t) = y_i(t) for some i and the derivative with respect to time of x_i is larger than that of y_i at time t, meaning that F_i(x) > F_i(y), where x = x(t) and y = y(t). However, this cannot happen if F_i is increasing in all the variables x_j except possibly x_i, so that x ≤ y, x_i = y_i implies F_i(x) ≤ F_i(y). An equivalent way to phrase this condition is to ask that ∂F_i/∂x_j ≥ 0 at all states for every i, j, i ≠ j, which is the Kamke condition for the special case of the cooperative order. The name of the order arises because, in a system that is monotone with respect to that order, each species promotes, or “cooperates” with, each other.
A rephrasing of this characterization of monotonicity with respect to orthant orders can be given by looking at the signed digraph G associated to (1). We define the vertex set V(G) and the edge set E(G) of G as follows. Let V(G) = {1, ..., n}, and given distinct vertices i, j, let (i, j) ∈ E(G) and f_E(i, j) = 1 if ∂F_j/∂x_i ≥ 0 at all states and the strict inequality holds at least at one state. Similarly let (i, j) ∈ E(G) and f_E(i, j) = −1 if ∂F_j/∂x_i ≤ 0 at all states and the strict inequality holds at least at one state. Finally, let (i, j) ∉ E(G) if ∂F_j/∂x_i ≡ 0. Recall that we are assuming that one of the three cases must hold.
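The construction of the signed digraph can be sketched as follows, under the assumption that the Jacobian of F is available at a finite list of sample states. Sampling cannot prove that a sign holds at all states, so this is only a heuristic stand-in for the definition above. The function name `signed_digraph` and the sample Jacobians, which come from a hypothetical two-variable system with F_1 = −x_1 + 1/(1 + x_2) and F_2 = x_1 − x_2, are illustrative assumptions.

```python
def signed_digraph(jacobians):
    """Build f_E as a dict {(i, j): +1 or -1} with 1-based vertices.

    jacobians: list of square matrices, where J[j][i] is dF_j/dx_i
    (0-based) evaluated at one sample state.  An edge (i, j) records that
    variable i influences variable j.  Raises ValueError if a sampled
    entry changes sign, which violates the non-ambiguity assumption.
    """
    n = len(jacobians[0])
    f_E = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal entries play no role in the graph
            values = [J[j][i] for J in jacobians]
            if all(v == 0 for v in values):
                continue  # no edge from i to j
            if all(v >= 0 for v in values):
                f_E[(i + 1, j + 1)] = 1
            elif all(v <= 0 for v in values):
                f_E[(i + 1, j + 1)] = -1
            else:
                raise ValueError(f"ambiguous sign for edge ({i + 1}, {j + 1})")
    return f_E

# Jacobians of the hypothetical system at x_2 = 0 and x_2 = 1:
# dF_1/dx_2 = -1/(1 + x_2)^2, dF_2/dx_1 = 1.
J_a = [[-1.0, -1.0], [1.0, -1.0]]
J_b = [[-1.0, -0.25], [1.0, -1.0]]
print(signed_digraph([J_a, J_b]))
```

Here variable 1 activates variable 2 while variable 2 represses variable 1, so the resulting graph has one positive and one negative edge.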
Now we can define an orthant order using any function f_V : V(G) → {−1, 1}, by letting x ≤_{f_V} y if and only if f_V(i) x_i ≤ f_V(i) y_i for all i. Given f_V, we define the consistency function g : E(G) → {1, −1} (with 1 read as “true” and −1 as “false”) by g(i, j) = f_V(i) f_V(j) f_E(i, j). Then, the following analog of Lemma 1 holds.
Lemma 2. Consider a system (1) and an orthant order ≤_{f_V}. Then (1) is monotone with respect to ≤_{f_V} if and only if g(i, j) ≡ 1 on E(G).

Proof. Let s_i = f_V(i), i = 1, ..., n. Note that s_i s_j ∂F_j/∂x_i = 0 if (i, j) ∉ E(G). For (i, j) ∈ E(G), it holds that s_i s_j ∂F_j/∂x_i ≥ 0 if and only if s_i s_j f_E(i, j) = 1, that is, if and only if g(i, j) = 1. The result follows from Lemma 1. □

For the next lemma, let the parity of a chain in G be the product of the signs (+1, −1) of its individual edges. We will consider in the next result closed undirected chains, that is, sequences x_{i_1}, ..., x_{i_r} such that x_{i_1} = x_{i_r}, and such that for every λ = 1, ..., r − 1 either (x_{i_λ}, x_{i_{λ+1}}) ∈ E(G) or (x_{i_{λ+1}}, x_{i_λ}) ∈ E(G).

The following lemma (see DeAngelis et al., 1986 as well as Smith, 1988, page 101) is analogous to the fact from vector calculus that path integrals of a vector field are independent of the particular path of integration if and only if there exists a potential function. Since the result is key to the formulation of the problem being considered, we provide a simple and self-contained proof in Appendix A.

Lemma 3. Consider a dynamical system (1) with associated directed graph G. Then (1) is monotone with respect to some orthant order if and only if all closed undirected chains of G have parity 1.

2.1. Systems with inputs and outputs

As we discussed in the introduction, a useful approach to the analysis of biological networks consists of decomposing a given system into an interconnection of monotone subsystems. The formulation of the notion of interconnection requires subsystems to be endowed with “input and output channels” through which information is to be exchanged. In order to address this we consider controlled dynamical systems (Sontag, 1990), which are systems with an additional parameter u ∈ R^m and which have the form

\[ \dot{x} = g(x, u). \tag{3} \]

The values of u over time are specified by means of a function t → u(t) ∈ R^m, t ≥ 0, called an input or control. Thus each input defines a time-dependent dynamical system in the usual sense. To system (3) there is associated a feedback function h : R^n → R^m, which is usually used to create the closed loop system ẋ = g(x, h(x)). Finally, if R^n, R^m are ordered by orthant orders ≤_{f_V}, ≤_q respectively, we say that the system is monotone if it satisfies (2) for every u (with g(·, u) in place of F and s_i = f_V(i)), and also

\[ q_k f_V(j) \frac{\partial g_j}{\partial u_k} \ge 0 \quad \text{for every } k, j \tag{4} \]

(see also Angeli and Sontag, 2003). As an example, let us consider the following biological model of testosterone dynamics (Enciso and Sontag, 2004; Murray, 2002):
˙ x1=
˙ x3= c2x2− b3x3.
Drawing the digraph of this system, it is easy to see that
it is not monotone with respect to any orthant order,
as follows by application of Lemma 3. On the other
hand, replacing x3in the first equation by u, we obtain
a system that is monotone with respect to the orders
≤(1,1,1),≤(−1)for state and input respectively. Defining
h(x) = x3, the closed loop system of this controlled
system is none other than (5). The paper (Enciso and
Sontag, 2004) shows how, using this decomposition
together with the “small gain theorem” from monotone
input/output theory (Angeli and Sontag, 2003) leads
one to a proof that the system does not have oscillatory
behavior, even under arbitrary delays in the feedback
loop, contrary to the assertion made in Murray and
Mathematical Biology (2002).
We can carry out this procedure on an arbitrary sys-
tem (1) with a directed graph G, as follows: given a
set E of edges in G, enumerate the edges in ECas
(i1,j1),...,(im,jm). For every k = 1,...,m, replace
all appearances of xikin the function Fjkby the vari-
able uk, to form the function g(x,u). Define h(x) =
(xi1,...,xim).Itiseasytoseethatthiscontrolledsystem
(3) has closed loop (1).
Note that the controlled system (3) generated by the
setEasabovehas,asassociateddigraph,thesub-digraph
of G generated by E. This is because for every k, one has
∂gjk(x,u)/∂xik≡ 0, i.e., the edge from ikto jkhas been
“erased”.
Denote byˆG the underlying undirected graph of a
directed graph G obtained by ignoring the directions of
theedges.GivenasetE ⊆ V(G)ofverticesina(directed
or undirected) graph G, denote by G(E) the undirected
subgraph of G generated by E. The edges of bothˆG and
G(E) are labeled with ±1 using the labels in the edges
of G, whenever appropriate. Let E be called consistent if
ˆG(E) has no closed chains with parity −1. Note that this
isequivalenttotheexistenceoffVsuchthatg ≡ 1onE,
by Lemma 4 applied to the open loop system (3). If E is
consistent, then the associated system (3) itself can also
be shown to be monotone: to verify condition (4), sim-
ply define each qkso that (4) is satisfied for k,jk. Since
∂gjk/∂uk= ∂Fjk/∂xik?≡ 0, this choice is in fact unam-
biguous. Conversely, if (3) is monotone with respect to
the orthant orders ≤fV,≤q, then in particular it is mono-
tone for every fixed constant u, so that E is consistent by
Lemma 3. We thus have the following result.
A
K + x3
− b1x1,
˙ x2= c1x1− b2x2,
449
(5)
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
Lemma 4. Let E be a set of edges of the digraph G. Then E is consistent if and only if the corresponding controlled system (3) is monotone with respect to some orthant orders.
Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networks into monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001
3. Statement of problem
A natural problem is therefore the following. Given a dynamical system (1) that admits a digraph G, use the procedure above to decompose it as the closed loop of a monotone controlled system (3), while minimizing the number $|E^C|$ of inputs. Equivalently, find $f_V$ such that $P(E_+) = |E_+|$ is maximized and $P(E_-) = |E_-| = |E_+^C|$ minimized. This produces the following problem formulation.
Problem 1 (Undirected labeling problem (ULP)). An instance of this problem is (G,h), where G = (V,E) is an undirected graph and $h : E \to \{0,1\}$. A valid solution is a vertex labeling function $f : V \to \{0,1\}$. Define an edge {u,v} ∈ E to be consistent iff $h(u,v) \equiv (f(u) + f(v)) \pmod 2$. The objective is then to find a valid solution maximizing |F|, where F is the set of consistent edges.
That ULP is a correct formulation for our problem is confirmed by the following easy equivalence.
Proposition 1. Consider an instance (G,h) of ULP with an optimal solution having x consistent edges given by a vertex labeling function f. Let D be a set of edges of smallest cardinality that have to be removed such that for the remaining graph, that is the graph $G' = (V, E \setminus D)$ with the same vertex set V but an edge set $E \setminus D$, there exists a vertex labeling function $f' : V \to \{0,1\}$ that makes every edge consistent. Then, x = |E| − |D|.
Proof. Since f produces a solution of ULP with x consistent edges, exactly |E| − x edges are inconsistent, thus |D| ≤ |E| − x, that is, x ≤ |E| − |D|. Conversely, since there is a solution with |E| − |D| consistent edges, x ≥ |E| − |D|. □
A special case of ULP, namely when h(e) = 1 for all e ∈ E, is the MAX-CUT problem (defined in Section 3.1). Moreover, ULP can be posed as a special type of “constraint satisfaction problem” as follows. We have |E| linear equations over GF(2), one equation per edge and each equation involving exactly two variables, over |V| Boolean variables. The goal is to assign values to the variables to satisfy the maximum number of equations. For algorithms and lower-bound results for general cases of these types of problems, such as when the equations are over GF(p) for an arbitrary prime p > 2, when there are an arbitrary number of variables per equation or when the goal is to minimize the number of unsatisfied equations, see references such as Amaldi and Kann (1996), Berman and Karpinski (2001), Creignou et al. (2001) and Hastad and Venkatesh (2002) and the references therein.
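To make the ULP objective concrete, here is a small exhaustive-search sketch in Python. It is meant only as an illustration for tiny instances, not as one of the algorithms of this paper; the names `consistent_edges` and `solve_ulp_brute_force` and the encoding of an edge as `(u, v, h)` are assumptions made for the example.

```python
from itertools import product

def consistent_edges(edges, f):
    """Edges (u, v, h) with h(u,v) = f(u) + f(v) (mod 2)."""
    return [(u, v, h) for u, v, h in edges if (f[u] + f[v]) % 2 == h]

def solve_ulp_brute_force(n, edges):
    """Try all 2^n vertex labelings and keep one with the most consistent
    edges. Exponential in n, so only suitable for very small graphs."""
    best_f, best_count = None, -1
    for f in product((0, 1), repeat=n):
        count = len(consistent_edges(edges, f))
        if count > best_count:
            best_f, best_count = f, count
    return best_f, best_count

# A triangle with h = 1 on every edge is a MAX-CUT instance: at most two
# of its three edges can be cut.
triangle = [(0, 1, 1), (1, 2, 1), (2, 0, 1)]
f, x = solve_ulp_brute_force(3, triangle)
print(x)   # 2
```

In the triangle, removing a single edge leaves a path whose edges can all be made consistent, so |D| = 1 and x = |E| − |D| = 2, in agreement with Proposition 1.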
Another interpretation (Sontag, in preparation) of ULP is in statistical mechanics terms. Let us label edges by “±1” instead of {0,1}, denoting by $w_{uv} = (-1)^{h(u,v)}$ the edge parities, now called “interaction energies.” Similarly, let us consider ±1-valued vertex labeling functions, now called (magnetic) “spin configurations,” $\sigma : V \to \{-1,+1\}$, $\sigma(v) = (-1)^{f(v)}$. An edge {u,v} is consistent provided that $w_{uv}\sigma_u\sigma_v = 1$. A graph with ±1 weights is called an Ising spin-glass model in statistical physics. A “non-frustrated” spin-glass model is one for which there is a spin configuration for which every edge is consistent (Barahona, 1982; Cipra, 2000; De Simone et al., 1995; Istrail, 2000). This is the same as a consistent graph in our sense. Moreover, a spin configuration that maximizes the number of consistent edges is one for which the “free energy” (with no exterior magnetic field):
$$-\sum_{\{u,v\}\in E} w_{uv}\,\sigma_u\sigma_v$$
is minimized, a “ground state”. (When h(e) = 1 or equivalently $w_e = -1$ for all edges, one has what is called the “anti-ferromagnetic case”.) Thus, our problem amounts to finding ground states.
Given orthant orders $\le_{f_V}$ and $\le_q$ for $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, we say that a feedback function h is positive if $x \le_{f_V} y$ implies $h(x) \le_q h(y)$, and that it is negative if $x \le_{f_V} y$ implies $h(x) \ge_q h(y)$. It can be shown that the closed loop of a monotone system with a positive feedback function is actually itself monotone, so that no system can be produced in this way that was not monotone already. But if h is a negative feedback function, then several results become available which use the methods of monotone systems for systems that are not monotone, see Angeli and Sontag (2003), Enciso and Sontag (2004) and Enciso and Sontag (2006). For the following result, let (C,⊆) be the class of consistent subsets of E(G), ordered under inclusion.
Proposition 2. Let E be a consistent set. Then E is maximal in (C,⊆) if and only if h is a negative feedback function for every $f_V$ such that g ≡ 1 on E.
Proof. Suppose that E is maximal, and let $f_V$ be such that g ≡ 1 on E. Given any edge $(i_k,j_k) \in E^C$, it holds that $g(i_k,j_k) = -1$. Otherwise one could extend E by adding $(i_k,j_k)$, thus violating maximality. That is, $f_V(i_k)f_V(j_k)f_E(i_k,j_k) = -1$. By monotonicity, it holds that $q_k f_V(j_k)\,\partial g_{j_k}/\partial u_k \ge 0$, and since $\partial g_{j_k}/\partial u_k = \partial F_{j_k}/\partial x_{i_k}$, it follows necessarily that
$$q_k f_V(j_k) f_E(i_k,j_k) = 1.$$
Therefore it must hold that $q_k = -f_V(i_k)$ for each k, which implies that h is a negative feedback function.
Conversely, if $f_V$ is such that g ≡ 1 on E and h is a negative feedback function, then $q_k = -f_V(i_k)$. By the same argument as above, $q_k f_V(j_k) f_E(i_k,j_k) = 1$ for all k by monotonicity. Therefore g ≡ −1 on $E^C$. Repeating this for all admissible $f_V$, maximality follows. □
There is a second, slightly more sophisticated way of writing a system (1) as the feedback loop of a system (3) using an arbitrary set of edges E. Given any such E, define $S(E^C) = \{i \mid \text{there is some } j \text{ such that } (i,j) \in E^C\}$. Now enumerate $S(E^C)$ as $\{i_1,\ldots,i_m\}$, and for each k label the set $\{j \mid (i_k,j) \in E^C\}$ as $j_{k1}, j_{k2}, \ldots$. Then for each k, l, one can replace each appearance of $x_{i_k}$ in $F_{j_{kl}}$ by $u_k$, to form the function g(x,u). Then one lets $h(x) = (x_{i_1},\ldots,x_{i_m})$ as above. The closed loop of this system (3) is also (1) as before but with the advantage that there are $|S(E^C)|$ inputs, and of course $|S(E^C)| \le |E^C|$.
If E is a consistent and maximal set, then one can make (3) into a monotone system as follows. By letting $f_V$ be such that g ≡ 1 on E, we define the order $\le_{f_V}$ on $\mathbb{R}^n$. For every $i_k, j_{kl}$ such that $(i_k,j_{kl}) \in E^C$, it must hold that $f_V(i_k)f_V(j_{kl})f_E(i_k,j_{kl}) = -1$. Otherwise $E \cup \{(i_k,j_{kl})\}$ would be consistent, thus violating maximality. By choosing $q_k = -f_V(i_k)$, Eq. (4) is therefore satisfied. See the proof of Proposition 2. Conversely, if the system generated by E using this second algorithm is monotone with respect to orthant orders, and if h is a negative feedback function, then it is easy to verify that E must be both consistent and maximal.
Thus the problem of finding E consistent and such that $P(E_-) = |S(E_-)| = |S(E^C)|$ is smallest, when restricted to those sets that are maximal and consistent (this does not change the minimum $|S(E^C)|$), is equivalent to the following problem: decompose (1) into the negative feedback loop of an orthant monotone control system, using the second algorithm above, and using as few inputs as possible. This produces the following problem formulation.
Problem 2 (Directed labeling problem (DLP)). An instance of this problem is (G,h) where G = (V,E) is a directed graph and $h : E \to \{0,1\}$. A valid solution is a vertex labeling function $f : V \to \{0,1\}$. Define an edge (u,v) ∈ E to be consistent iff $h(u,v) \equiv (f(u) + f(v)) \pmod 2$. The objective is then to find a valid solution minimizing |g(E − F)|, where $g(C) = \{u \in V \mid \exists y \in V, (u,y) \in C\}$ for any C ⊆ E and F is the set of consistent edges.
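The difference between the two objectives is easy to see on a small instance. The Python sketch below is an illustration under assumptions made here (the names `inconsistent_tails` and `solve_dlp_brute_force`, and the example graph, are hypothetical); it evaluates the DLP cost |g(E − F)| as the number of distinct tail vertices of inconsistent edges and minimizes it by exhaustive search.

```python
from itertools import product

def inconsistent_tails(edges, f):
    """g(E - F): tails u of directed edges (u, v, h) that are inconsistent."""
    return {u for u, v, h in edges if (f[u] + f[v]) % 2 != h}

def solve_dlp_brute_force(n, edges):
    """Exhaustive search over labelings; only meant for tiny instances."""
    best_f, best_cost = None, None
    for f in product((0, 1), repeat=n):
        cost = len(inconsistent_tails(edges, f))
        if best_cost is None or cost < best_cost:
            best_f, best_cost = f, cost
    return best_f, best_cost

# Two odd-parity triangles sharing vertex 0. Any labeling leaves at least
# two inconsistent edges, but both can be chosen to leave vertex 0.
two_triangles = [(0, 1, 0), (1, 2, 0), (0, 2, 1),
                 (0, 3, 0), (3, 4, 0), (0, 4, 1)]
f, cost = solve_dlp_brute_force(5, two_triangles)
print(cost)   # 1
```

Here the ULP view counts two inconsistent edges while the DLP view counts a single input, which mirrors the inequality between the number of tail vertices and the number of removed edges noted above.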
3.1. Summary of key concepts and results in approximation algorithms
For any γ ≥ 1 (resp. γ ≤ 1), a γ-approximate solution (or simply a γ-approximation) of a minimization (resp., maximization) problem is a solution with an objective value no larger than γ times (resp., no smaller than γ times) the value of the optimum, and an algorithm achieving such a solution is said to have an approximation ratio of γ.
Papadimitriou and Yannakakis (1991) defined the class of MAX-SNP optimization problems and a special approximation-preserving reduction, the so-called L-reduction, that can be used to show MAX-SNP-hardness of an optimization problem. The version of the L-reduction that we provide below is a slightly modified but equivalent version that appeared in Berman and Schnitger (1992).
Definition 1. (Berman and Schnitger (1992), Papadimitriou and Yannakakis (1991)) Given two optimization problems Π and Π′, we say that Π L-reduces to Π′ if there are three polynomial-time procedures $T_1$, $T_2$, $T_3$ and two constants a and b > 0 such that the following two conditions are satisfied: (1) For any instance I of Π, algorithm $T_1$ produces an instance I′ = f(I) of Π′ such that the optima of I and I′, denoted by OPT(I) and OPT(I′) respectively, satisfy OPT(I′) ≤ a · OPT(I). (2) For any solution of I′ with cost c′, algorithm $T_2$ produces another solution with a cost c″ no worse than c′, and algorithm $T_3$ produces a solution of I of Π with cost c (possibly from the solution produced by $T_2$) satisfying |c − OPT(I)| ≤ b · |c″ − OPT(I′)|.
An optimization problem is MAX-SNP-hard if any problem in MAX-SNP L-reduces to that problem. The importance of proving MAX-SNP-hardness results comes from a result proved by Arora et al. (1998) which shows that, assuming P ≠ NP, for every MAX-SNP-hard minimization (resp., maximization) problem there exists a constant ε > 0 such that no polynomial time algorithm can achieve an approximation ratio better than 1 + ε (resp., better than 1 − ε).
A special case of the ULP problem, namely when h(e) = 1 for all e ∈ E, is the well-known MAX-CUT problem. An instance of this problem is an undirected graph G = (V,E). A valid solution is a set S ⊆ V. The objective is to find a valid solution that maximizes the number of edges {u,v} ∈ E such that |{u,v} ∩ S| = 1. The MAX-CUT problem is known to be MAX-SNP-hard. For further details on these topics, the reader is referred to the excellent book by Vazirani (2001).
Some terminology. The following notation will be used for the remainder of the paper. Given a set S of vertices in a directed graph G, define $E_{\mathrm{out}}(S) = \{(u,v) \in E(G) \mid u \in S\}$ as the set of out-bound edges of vertices in S. $\mathrm{OPT}_P(I)$ denotes the size of an optimal solution for a problem P with instance I. Recall that the length of a circuit c is normally defined as the number of edges in the circuit. Given a weight function $w : E \to \mathbb{R}$, the length of c with respect to w is defined as $\sum_{e \in c} w(e)$.
4. Theoretical results
Our theoretical results are summarized as follows.
Theorem 1.
(a) For some constant ε > 0, it is not possible to approximate in polynomial time the ULP and the DLP problems to within an approximation ratio of 1 − ε and 1 + ε, respectively, unless P = NP.
(b) For ULP, we provide a polynomial time α-approximation algorithm where α ≈ 0.87856 is the approximation factor for the MAX-CUT problem obtained in Goemans and Williamson (1995) via semidefinite programming.
(c) For DLP, if $d^{\max}_{\mathrm{in}}$ denotes the maximum in-degree of any vertex in the graph, then we give a polynomial-time approximation algorithm with an approximation ratio of at most $d^{\max}_{\mathrm{in}} \cdot O(\log|V|)$.
Our computational results are illustrated in Section 6 by an implementation of the algorithms applied to a 13-node Drosophila segmentation network, as well as to a 200+node recently published network of the Epidermal Growth Factor Receptor pathway.
Remark 1. It should be noted that ULP becomes tractable if the network is biased significantly towards excitatory connections. Obviously, if all the edges of the given graph G = (V,E) are labeled 0, then it is possible to label the vertices such that all the edges are consistent. Moreover, given any graph G, it is easy to check in $O((|V| + |E|)^3)$ time whether some labeling makes all the edges consistent, by solving a set of linear equations via Gaussian elimination. Now, suppose that at most L of the edges of G are labeled 1. Then, obviously at most L inconsistent edges exist in any optimal solution. Thus a straightforward way to solve the problem is to consider all possible subsets of edges in which at most L edges are dropped and to check, for each such subset, whether all the remaining edges can be made consistent. The total time taken is $O(|V|^{2L} \cdot (|V| + |E|)^3)$, which is a polynomial in |V| + |E| if L is a constant.
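The procedure of Remark 1 can be sketched as follows. This Python fragment is an illustration written for this text, with hypothetical names (`all_consistent_gf2`, `solve_ulp_few_negative`) and a bitmask encoding of the equations $x_u + x_v = h(u,v)$ over GF(2) chosen here for brevity; it is not the implementation used in Section 6.

```python
from itertools import combinations

def all_consistent_gf2(n, edges):
    """Gaussian elimination over GF(2): True iff the system
    x_u + x_v = h (mod 2), one equation per edge (u, v, h), is solvable.
    Bit i of a row is the coefficient of x_i; bit n is the right-hand side."""
    pivots = {}                               # pivot column -> stored row
    for u, v, h in edges:
        row = (h << n) if u == v else ((1 << u) | (1 << v) | (h << n))
        for col in range(n):
            if not (row >> col) & 1:
                continue
            if col in pivots:
                row ^= pivots[col]            # eliminate x_col from the row
            else:
                pivots[col] = row             # new pivot for column col
                break
        else:
            # every coefficient was eliminated, so the equation reads 0 = rhs
            if (row >> n) & 1:
                return False
    return True

def solve_ulp_few_negative(n, edges, L):
    """Maximum number of consistent edges, assuming at most L edges have
    h = 1: drop 0, 1, ..., L edges and test the rest for full consistency."""
    m = len(edges)
    for dropped in range(L + 1):
        for subset in combinations(range(m), dropped):
            skip = set(subset)
            kept = [e for i, e in enumerate(edges) if i not in skip]
            if all_consistent_gf2(n, kept):
                return m - dropped
    return None  # not reached if the assumption on L holds

two_triangles = [(0, 1, 0), (1, 2, 0), (0, 2, 1),
                 (0, 3, 0), (3, 4, 0), (0, 4, 1)]
print(solve_ulp_few_negative(5, two_triangles, L=2))   # 4
```

Because subsets are tried in order of increasing size, the first success uses the fewest dropped edges, which by Proposition 1 gives the optimum.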
5. Proof of Theorem 1
This section provides the proof of Theorem 1, broken up into a series of technical parts.
5.1. Proof of Theorem 1(a)
Based on the discussion in Section 3.1, it suffices to show that both these problems are MAX-SNP-hard. ULP is MAX-SNP-hard since its special case, the MAX-CUT problem, is MAX-SNP-hard. To prove MAX-SNP-hardness of DLP, we need the definitions of the following two problems.
Problem 3 (Node deletion problem with bipartite property (NDBP)). An instance of this problem is an undirected graph G = (V,E). A valid solution is a vertex set S ⊆ V, such that G(V − S) is a bipartite graph. The objective is to find a valid solution minimizing |S|.
Problem 4 (Variance of node deletion problem (VNDP)). An instance of this problem is (G,h) where G = (V,E) is a directed graph and $h : E \to \{0,1\}$. A valid solution is a vertex set S ⊆ V with the following property: if $G_S = (V_S,E_S)$ is the graph with $V_S = V$ and $E_S = E - E_{\mathrm{out}}(S)$, then $\hat{G}_S$ is free of odd length circuit with respect to weight function h. The objective is to find a valid solution minimizing |S|.
First, we note that DLP is equivalent to VNDP. If one identifies the solution set S in VNDP with the solution set g(E − F) in DLP, then the set of consistent edges F in DLP corresponds to the $E_S$ in VNDP since every edge (u,v) ∈ F satisfying $h(u,v) \equiv (f(u) + f(v)) \pmod 2$ is equivalent to stating that $\hat{G}_S$ is free of odd length circuit with respect to weight function h.
Thus, to prove the MAX-SNP-hardness of DLP it suffices to prove that of VNDP. NDBP is known to be MAX-SNP-hard (Lund and Yannakakis, 1993). We provide an L-reduction from NDBP to VNDP. For an instance of NDBP with graph G = (V,E), construct an instance (G′,h) of VNDP as follows (note that G′ is a digraph):
$$V' = V(G') = V \cup \{A_{u,v}, B_{u,v} \mid \{u,v\} \in E\},$$
$$E' = E(G') = \{(u,A_{u,v}), (A_{u,v},B_{u,v}), (v,B_{u,v}) \mid \{u,v\} \in E\},$$
and h(e) = 1 for all e ∈ E′. Now, the following holds:
(1) If S is a solution to NDBP, it is also a solution to the generated instance of VNDP. The reason is as follows. Notice that every odd length (resp., even length) circuit C in G corresponds to an odd length (resp., even length) circuit C′ in $\hat{G}'$ with respect to the weight function h. Since G(V − S) is a bipartite graph, it is free of odd length circuits. So for each odd length cycle C of G, there exists u ∈ S such that the deletion of all out-bound edges of u in G′ breaks its corresponding odd length cycle C′.
(2) If S′ is a solution to VNDP, then we can construct a solution S of NDBP in the following manner: for each x ∈ S′: if $x = A_{u,v}$, add u to S; if $x = B_{u,v}$, add v to S; if x = u or x = v, add x to S.
It is now easy to see that since the graph $\hat{G}'_{S'}$ is free of odd length circuit with respect to h, G(V − S) has no odd length circuit either.
Hence, we have $\mathrm{OPT}_{\mathrm{VNDP}}(G',h) \le \mathrm{OPT}_{\mathrm{NDBP}}(G)$. Moreover, given a solution S′ of VNDP, we are able to generate a solution S of NDBP such that
$$\bigl||S| - \mathrm{OPT}_{\mathrm{NDBP}}(G)\bigr| \le \bigl||S'| - \mathrm{OPT}_{\mathrm{VNDP}}(G',h)\bigr|.$$
Thus, our reduction satisfies Definition 1 of an L-reduction with a = b = 1.
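The gadget used in this reduction is simple enough to write out. The Python sketch below is an illustration only; the function names and the tuple encoding of the new vertices $A_{u,v}$, $B_{u,v}$ are assumptions made here. It builds (G′,h) from an undirected graph and checks feasibility of a vertex set for VNDP by testing that no odd-weight undirected circuit survives.

```python
def ndbp_to_vndp(vertices, undirected_edges):
    """Replace each undirected edge {u, v} by the directed edges
    u -> A_uv, A_uv -> B_uv, v -> B_uv, all with weight h = 1."""
    new_vertices = list(vertices)
    directed_edges = []
    for u, v in undirected_edges:
        a, b = ("A", u, v), ("B", u, v)
        new_vertices += [a, b]
        directed_edges += [(u, a, 1), (a, b, 1), (v, b, 1)]
    return new_vertices, directed_edges

def has_no_odd_circuit(vertices, edges):
    """True iff the underlying undirected graph has no circuit whose
    total weight (sum of h) is odd; checked by propagating 0/1 labels."""
    adj = {x: [] for x in vertices}
    for u, v, h in edges:
        adj[u].append((v, h))
        adj[v].append((u, h))
    label = {}
    for root in vertices:
        if root in label:
            continue
        label[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v, h in adj[u]:
                want = (label[u] + h) % 2
                if v not in label:
                    label[v] = want
                    stack.append(v)
                elif label[v] != want:
                    return False
    return True

def is_feasible_vndp(vertices, directed_edges, S):
    """S is feasible if deleting the out-bound edges of S removes
    every odd circuit."""
    kept = [(u, v, h) for u, v, h in directed_edges if u not in S]
    return has_no_odd_circuit(vertices, kept)

# A triangle is not bipartite; deleting vertex 0 makes it bipartite.
V2, E2 = ndbp_to_vndp([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
print(is_feasible_vndp(V2, E2, set()))   # False
print(is_feasible_vndp(V2, E2, {0}))     # True
```

Each original edge becomes an undirected path of three unit-weight edges, so circuit parities are preserved, which is the property used in item (1) above.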
5.2. Proof of Theorem 1(b)
Our algorithm for ULP uses the semidefinite programming (SDP) technique used by Goemans and Williamson (1995); hence we use notations and terminologies similar to that used in the paper (readers not very familiar with this technique are also referred to the excellent explanation of this technique in the book by Vazirani (2001)). For each vertex v ∈ V, we have a real vector $x_v \in \mathbb{R}^{|V|}$ with $\|x_v\|_2 = 1$. Then, we can generate from ULP the following vector program (where · denotes the vector inner product):
Solve the following vector program via SDP methods:
$$\text{maximize}\quad \frac{1}{2}\sum_{h(u,v)=1}(1 - x_u \cdot x_v) + \frac{1}{2}\sum_{h(u,v)=0}(1 + x_u \cdot x_v)$$
$$\text{subject to}\quad x_v \cdot x_v = 1 \ \text{ and } \ x_v \in \mathbb{R}^{|V|} \quad \text{for each } v \in V.$$
Select a uniformly random vector r in the |V|-dimensional unit sphere and set
$$f(v) = \begin{cases} 0 & \text{if } r \cdot x_v \ge 0,\\ 1 & \text{otherwise.} \end{cases}$$
The proof of the claimed approximation performance of the above vector program is obtained by adapting the proof in Section 26.5 of Vazirani (2001) for the MAX-2SAT problem to deal with the fact that, in our problem, $a_{ij} = b_{ij} = 1/2$ as opposed to a different set of values in Vazirani (2001). Since there are some subtleties in adapting that proof for readers unfamiliar with this approach, we provide a sketch of the proof in Appendix A. The procedure can be derandomized via methods of conditional probabilities (e.g., see Mahajan and Ramesh (1995)).
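The rounding step of this procedure is short enough to sketch in plain Python. The fragment below does not solve the semidefinite program; it assumes that unit vectors $x_v$ have already been obtained from some SDP solver and only illustrates the objective and the random hyperplane rounding. The function names and the hand-made vectors in the example are assumptions made for this illustration.

```python
import random

def vector_objective(edges, x):
    """Value of the vector program for unit vectors x[v]; edges are (u, v, h)."""
    total = 0.0
    for u, v, h in edges:
        dot = sum(a * b for a, b in zip(x[u], x[v]))
        total += 0.5 * (1 - dot) if h == 1 else 0.5 * (1 + dot)
    return total

def random_hyperplane_labeling(x, rng):
    """f(v) = 0 if r . x_v >= 0 and 1 otherwise. The vector r has independent
    Gaussian coordinates, so its direction is uniform on the sphere; its
    length does not affect the signs, hence no normalization is done."""
    dim = len(next(iter(x.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {v: 0 if sum(a * b for a, b in zip(r, xv)) >= 0 else 1
            for v, xv in x.items()}

def count_consistent(edges, f):
    return sum(1 for u, v, h in edges if (f[u] + f[v]) % 2 == h)

# A 4-cycle with h = 1 everywhere: antipodal vectors on alternate vertices
# reach the maximum value 4 of the vector program (written by hand here).
cycle = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
x = {0: (1.0, 0.0), 1: (-1.0, 0.0), 2: (1.0, 0.0), 3: (-1.0, 0.0)}
f = random_hyperplane_labeling(x, random.Random(0))
print(vector_objective(cycle, x), count_consistent(cycle, f))
```

On this instance every hyperplane that does not contain the vectors separates the antipodal pairs, so the rounded labeling makes all four edges consistent; in general the rounded value is only guaranteed in expectation.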
5.3. Proof of Theorem 1(c)
For an instance (G,h) of DLP, construct an instance (G′ = (V′,E′),h′) as follows:
$$V' = V \cup \{C_{u,v} \mid (u,v) \in E \ \&\ h(u,v) = 0\},$$
$$E' = \{e \mid e \in E \ \&\ h(e) = 1\} \cup \{(u,C_{u,v}), (C_{u,v},v) \mid (u,v) \in E \ \&\ h(u,v) = 0\},$$
and
$$h'(e) = 1 \quad \text{for all } e \in E'.$$
Note that every odd (resp., even) length circuit in G with respect to weight function h corresponds to an odd (resp., even) length circuit in G′ with respect to weight function h′, and vice versa. Let F be a set of consistent edges in (G,h) with a vertex labeling function f. Now, observe the following:
(1) F′ is a set of consistent edges in (G′,h′) with a vertex labeling function f′ with f′(x) = f(x) for x ∈ V′ ∩ V and $f'(C_{u,v}) = 1 - f(u) = 1 - f(v)$ for an edge (u,v) ∈ F with h(u,v) = 0; thus, an edge (u,v) in F corresponds to an edge (u,v) in F′ if h(u,v) = 1 and corresponds to a pair of edges $(u,C_{u,v}), (C_{u,v},v)$ in F′ if h(u,v) = 0.
(2) If (u,v) ∈ E − F is an inconsistent edge in (G,h) with h(u,v) = 0, then the edge $(C_{u,v},v)$ in G′ can always be made consistent by choosing $f'(C_{u,v}) = 1 - f(v)$.
Thus, if F″ is the set of consistent edges obtained from F following rules (1) and (2) above, then |g(E′ − F″)| = |g(E − F)| and thus $\mathrm{OPT}_{\mathrm{DLP}}(G',h') = \mathrm{OPT}_{\mathrm{DLP}}(G,h)$.
Consider the NDBP problem on $\hat{G}'$. Any solution to DLP on (G′,h′) with vertex labeling function f′ and set of consistent edges F′ cannot contain an odd cycle of consistent edges and thus provides a solution to NDBP on $\hat{G}'$ of size |g(E′ − F′)|. Thus, $\mathrm{OPT}_{\mathrm{NDBP}}(\hat{G}') \le \mathrm{OPT}_{\mathrm{DLP}}(G',h') = \mathrm{OPT}_{\mathrm{DLP}}(G,h)$. $\mathrm{OPT}_{\mathrm{NDBP}}(\hat{G}')$ can be approximated in polynomial time to within an approximation ratio of O(log|V′|) (Lund and Yannakakis, 1993), i.e., we can find a solution $S_{\mathrm{NDBP}}(\hat{G}')$ in polynomial time such that
$$|S_{\mathrm{NDBP}}(\hat{G}')| \le O(\log|V'|) \cdot \mathrm{OPT}_{\mathrm{NDBP}}(\hat{G}') \le O(\log|V|) \cdot \mathrm{OPT}_{\mathrm{DLP}}(G,h).$$
Now,
$$S_{\mathrm{DLP}}(G,h) = S_{\mathrm{NDBP}}(\hat{G}') \cup \{u \mid \exists v \in S_{\mathrm{NDBP}}(\hat{G}'), (u,v) \in E\},$$
is obviously a solution to DLP on (G,h). Recall that $d^{\max}_{\mathrm{in}}$ denotes the maximum in-degree of any vertex in G. Thus,
$$|S_{\mathrm{DLP}}(G,h)| \le d^{\max}_{\mathrm{in}} \cdot |S_{\mathrm{NDBP}}(\hat{G}')| \le d^{\max}_{\mathrm{in}} \cdot O(\log|V|) \cdot \mathrm{OPT}_{\mathrm{DLP}}(G,h).$$
Fig. 5. The network associated to the Drosophila segment polarity, as proposed in von Dassow et al. (2000). Courtesy of N. Ingolia and PLoS. The three edges that have been crossed have been chosen in order to let the remaining edges form an orthant monotone system.
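The subdivision step at the start of the proof of Theorem 1(c) can be checked on a small instance. The following Python sketch is illustrative only; the function names, the integer numbering of the new vertices $C_{u,v}$, and the example graph are assumptions made here. It builds (G′,h′) and compares the DLP optima of the two instances by exhaustive search.

```python
from itertools import product

def subdivide_zero_edges(n, edges):
    """Replace every edge (u, v) with h = 0 by u -> c -> v through a fresh
    vertex c, and give weight 1 to all edges of the new instance."""
    new_edges, next_id = [], n
    for u, v, h in edges:
        if h == 1:
            new_edges.append((u, v, 1))
        else:
            c = next_id            # plays the role of C_{u,v}
            next_id += 1
            new_edges += [(u, c, 1), (c, v, 1)]
    return next_id, new_edges

def dlp_optimum(n, edges):
    """Minimum number of distinct tails of inconsistent edges (brute force)."""
    best = None
    for f in product((0, 1), repeat=n):
        cost = len({u for u, v, h in edges if (f[u] + f[v]) % 2 != h})
        if best is None or cost < best:
            best = cost
    return best

# Two odd-parity triangles sharing vertex 0 (a made-up test instance).
edges = [(0, 1, 0), (1, 2, 0), (0, 2, 1),
         (0, 3, 0), (3, 4, 0), (0, 4, 1)]
n2, edges2 = subdivide_zero_edges(5, edges)
print(dlp_optimum(5, edges), dlp_optimum(n2, edges2))   # 1 1
```

An edge of weight 0 becomes a path of total weight 2, so the parity of every circuit is unchanged; this is why the two optima coincide on the example.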
6. Examples of applications of the ULP algorithm
We have implemented the SDP-based algorithm for calculating approximate solutions of the undirected labeling problem using Matlab, and we illustrate this algorithm with two applications to biological systems. The first application concerns the relatively small-scale 13-variable digraph of a model of the Drosophila segment polarity network. A second application involves a digraph with 300+ variables associated to the human Epidermal Growth Factor Receptor (EGFR) signaling network. This model was published recently and built using information from 242 published papers. Finally, we provide an example involving a yeast gene regulatory network.
6.1. Drosophila segment polarity
An important part of the development of the early
Drosophila (fruit fly) embryo is the differentiation of
cells into several stripes (or segments), each of which
eventually gives rise to an identifiable part of the body
such as the head, the wings, the abdomen, etc. Each seg-
ment then differentiates into a posterior and an anterior
part, in which case the segment is said to be polarized.
(This differentiation process continues up to the point
when all identifiable tissues of the fruit fly have devel-
oped.) Differentiation at this level starts with differing
concentrations of certain key proteins in the cells; these
proteins form striped patterns by reacting with each other
and by diffusion through the cell membranes.
A model for the network that is responsible for seg-
ment polarity (von Dassow et al., 2000) is illustrated
in Fig. 5. As explained above, this model is best stud-
ied when multiple cells are present interacting with each
other. But it is interesting at the one-cell level in its own
right—and difficult enough to study that analytic tools
seem mostly unavailable. The arrows with a blunt end
are interpreted as having a negative sign in our notation.
Furthermore, the concentrations of the membrane-bound and inter-cell traveling compounds PTC, PH, HH and WG (membrane) on all cells have been identified in the one-cell model (so that, say, HH → PH is now in the digraph). Finally, PTC acts on the reaction CI → CN itself by promoting it without being itself affected, which in our notation means $\mathrm{PTC} \xrightarrow{+} \mathrm{CN}$ and $\mathrm{PTC} \xrightarrow{-} \mathrm{CI}$.
The implementation. The Matlab implementation of the algorithm on this digraph with 13 nodes and 20 edges produced several partitions with as many as 17 consistent edges. One of these possible partitions simply consists of placing the three nodes ci, CI and CN in one set and all other nodes in the other set, whereby the only inconsistent edges are $\mathrm{CI} \xrightarrow{+} \mathrm{wg}$, $\mathrm{CI} \xrightarrow{+} \mathrm{ptc}$, and $\mathrm{PTC} \xrightarrow{+} \mathrm{CN}$. But note that it is desirable for the resulting open loop system to have as simple remaining loops as possible after eliminating all inconsistent edges. In this case, the remaining directed loops
[Display not recoverable: chains of signed arrows among EN, ci, CI, CN, en, wg and WG (membrane).]
can still cause difficulties.
A second partition which generated 17 consistent edges is that in which EN, hh, CN, and the membrane compounds PTC, PH, HH are on one set, and the remaining compounds on the other. The edges cut are $\mathrm{ptc} \xrightarrow{+} \mathrm{PTC}$, $\mathrm{CI} \xrightarrow{+} \mathrm{CN}$ and $\mathrm{en} \xrightarrow{+} \mathrm{EN}$, each of which eliminates one or several positive loops. By writing the remaining consistent digraph in the form of a cascade, it is easy to see that the only loop whatsoever remaining is wg ↔ WG; this makes the analysis proposed in Enciso and Sontag (2006) easier.
In this relatively low dimensional case we can prove that in fact OPT = 17, as the results below will show.
Lemma 5. Any partition of the nodes in the digraph in Fig. 5 generates at most 17 consistent edges.
Proof. From Lemma 3, a simple way to prove this statement is by showing that there are three disjoint cycles with odd weighted length in the network associated to Fig. 5 (disjoint in the sense that no edge is part of more than one of the cycles). Such three disjoint cycles exist in this case, and they are CI-CN-wg, CI-ptc-PTC, CN-en-EN-hh-HH-PH-PTC. □
It is surprising that a realistic biological system with as many as 13 variables and 20 edges can be transformed into a monotone system after the deletion of only 3 edges. It is conceivable that this restricts the possible dynamics of the system. This is especially the case given that the open loop digraph has almost no closed oriented paths (except for WG ↔ wg), which is evidence that the dynamics of the control system under constant inputs may be especially simple, e.g. such that all solutions converge towards a unique equilibrium.
6.1.1. Multiple copies
It was mentioned above that the purpose of this
network is to create striped patterns of protein con-
centrations along multiple cells. In this sense, it is
most meaningful to consider a coupled collection
of networks as it is given originally in Figs. 6 and 5.
Considerarowofkcells,eachofwhichhasindependent
concentration variables for each of the compounds, and
let the cell-to-cell interactions be as in Fig. 5 with cyclic
boundary conditions (that is, the kth cell is coupled
with the first in the natural way). We show that the
results can be extended in a very similar manner as
before.
Given a partition fVof the one-cell network consid-
ered above, letˆfVbe the partition of the k-cell network
defined byˆfV(eni) := fV(en) for every i, etc. ThusˆfV
consists of k copies of the partition fVin a natural way.
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
Lemma 6. Let $f_V$ be a partition of the nodes of the 1-cell network with n consistent edges. Then with respect to the partition $\hat{f}_V$, there are exactly kn consistent edges for the k-cell coupled model.

Proof. Consider the network consisting of k isolated copies of the network, that is, k groups of nodes each of which is connected exactly as in the one-cell case. Under the partition $\hat{f}_V$, this network has exactly kn consistent edges. To arrive at the coupled network, it is sufficient to replace all edges of the form $(\mathrm{HH}_i, \mathrm{PH}_i)$ by $(\mathrm{HH}_{i+1}, \mathrm{PH}_i)$ and $(\mathrm{WG}_i, \mathrm{en}_i)$ by $(\mathrm{WG}_{i+1}, \mathrm{en}_i)$, i = 1,...,k (where we identify k + 1 with 1). Since by definition $\hat{f}_V(\mathrm{HH}_{i+1}) = \hat{f}_V(\mathrm{HH}_i)$ and $\hat{f}_V(\mathrm{WG}_{i+1}) = \hat{f}_V(\mathrm{WG}_i)$, the consistency of these edges does not change, and the number of consistent edges therefore remains constant. □

In particular, OPT ≥ 17k for the coupled system. The following result will establish an upper bound for OPT.

Lemma 7. Any partition of the nodes in the digraph in the k-cell coupled network generates at most 17k consistent edges.

Proof. Consider the signed graph in Fig. 7, which is a sub-digraph of the network associated to Fig. 5. Since the inter-cell edges (WGmem,en) and (HH,PH) are not in this graph, it follows that there are k identical copies of it in the k-cell model. If it is shown that at least three edges need to be cut in each of these k sub-digraphs, the result follows immediately.

Consider the negative cycle ci-CI-wg-CN-en-EN, which must contain at least one inconsistent edge for any given partition. The remaining edges of the subgraph form a tetrahedron with four negative parity triangles, which cannot all be cut by eliminating any single edge. It follows that no two edges can eliminate all negative parity cycles in this signed graph, and that therefore 20k − 3k = 17k is an upper bound for the number of consistent edges in the k-cell network.

Fig. 6. A diagram of the Drosophila embryo during early development. Each hexagon represents a cell containing a copy of the network in Fig. 5, and neighboring cells interact to form a collective behavior. In this example, an initial striped pattern of the genes en and wg induces the production of the gene hh, but only in those cells that are producing en. This will further strengthen the pattern of stripes and help differentiate the various tissues. Courtesy of N. Ingolia and PLoS (Ingolia, 2004).

Fig. 7. A sub-digraph of the network in Fig. 5, using the notation defined in the previous sections. Note that this sub-digraph does not include either of the two edges (WGmem,en) and (HH,PH), which connect the networks of different cells in Fig. 5; this will be important in the proof of Lemma 7.
1014
1015
1016
1017
1018
1019
Corollary 1. For the k-cell linearly coupled network described in Fig. 5, it holds that OPT = 17k.

Proof. Follows from the previous two results. □
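The counting step behind the upper bound can be checked by exhaustive search on a small stand-in graph. The sketch below does not reproduce the sub-digraph of Fig. 7; it assumes a toy graph made of a tetrahedron whose four triangles all have negative parity (here, all six edges negative) and a separate six-node cycle with a single negative edge. The function name `min_inconsistent` is hypothetical, and edge direction is ignored because consistency depends only on the two endpoint labels.

```python
from itertools import product


def min_inconsistent(nodes, edges):
    """Smallest number of inconsistent edges over all 0/1 labelings (brute force)."""
    best = len(edges)
    for labels in product((0, 1), repeat=len(nodes)):
        f = dict(zip(nodes, labels))
        bad = sum(1 for u, v, s in edges if (f[u] == f[v]) != (s > 0))
        best = min(best, bad)
    return best


# Toy tetrahedron: six negative edges, so each of the four triangles has negative parity.
tetra_nodes = ["a", "b", "c", "d"]
tetra_edges = [
    (u, v, -1)
    for i, u in enumerate(tetra_nodes)
    for v in tetra_nodes[i + 1:]
]

# Toy negative cycle on six nodes: one negative edge, five positive edges.
cycle_nodes = ["p", "q", "r", "s", "t", "w"]
cycle_edges = [
    (cycle_nodes[i], cycle_nodes[(i + 1) % 6], -1 if i == 0 else +1)
    for i in range(6)
]

cuts_tetra = min_inconsistent(tetra_nodes, tetra_edges)
cuts_cycle = min_inconsistent(cycle_nodes, cycle_edges)
cuts_total = min_inconsistent(tetra_nodes + cycle_nodes, tetra_edges + cycle_edges)
```

In this toy case the tetrahedron needs two cuts and the cycle one, for three in total, which mirrors the "at least three edges per cell" argument. The search is exponential in the number of nodes and is only meant for graphs of this size.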
6.2. EGFR signaling
The protein called epidermal growth factor is frequently stored in epithelial tissues such as skin, and it is released when rapid cell division is needed (for instance, it is mechanically triggered after an injury). Its function is to bind to a receptor on the membrane of the cells, aptly called the epidermal growth factor receptor (EGFR). The EGFR, on the inner side of the membrane, has the appearance of a scaffold with dozens of docks to bind with numerous agents, and it starts a reaction of vast proportions at the cell level that ultimately induces cell division.
In their May 2005 paper (Oda et al., 2005), Oda et al. integrate the information that has become available about this process from multiple sources, and they define a network with 330 known molecules under 211 chemical reactions. The network itself is available from supplementary material in SBML format (Systems Biology Markup Language, http://www.sbml.org), and will most likely be subject to continuous updates.

The implementation. Each reaction in the network classifies the molecules as reactants, products, and/or modifiers (enzymes). This information was imported into Matlab using the Systems Biology Toolbox. The digraph G that is used for this analysis has many more edges than the digraph displayed in Oda et al. (2005). The reason for this is as follows: if molecules A and B are both reactants in the same reaction, then the presence of A will have an indirect inhibiting effect on the concentration of B, since it will accelerate the reaction which consumes B (assuming B is not also a product). Therefore a negative edge must also appear from A to B, and vice versa. Similarly, modifiers have an inhibiting effect on reactants.
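As a rough illustration of how such a signed digraph could be assembled from a reaction list, the following Python sketch applies the sign rules of this subsection to a hypothetical three-reaction network. The dictionary layout, the species names and the function `build_sign_matrix` are assumptions made for the example; they are not the authors' Matlab code or the SBML Toolbox API. Diagonal entries are skipped, and inhibitory modifiers (which the paper corrects by hand) are not modeled.

```python
import math


def build_sign_matrix(species, reactions):
    """Return sign[(i, j)] in {+1, -1, 0, nan} for all ordered pairs i != j.

    +1: some reaction has j as a product and i as a reactant or modifier.
    -1: some reaction has j as a reactant and i as a reactant or modifier.
    nan: both of the above hold (undefined sign).
     0: i and j never interact in this way.
    """
    pos, neg = set(), set()
    for rxn in reactions:
        sources = set(rxn["reactants"]) | set(rxn.get("modifiers", ()))
        for i in sources:
            pos.update((i, j) for j in rxn["products"] if i != j)
            neg.update((i, j) for j in rxn["reactants"] if i != j)

    sign = {}
    for i in species:
        for j in species:
            if i == j:
                continue  # diagonal entries are ignored in this sketch
            if (i, j) in pos and (i, j) in neg:
                sign[(i, j)] = math.nan
            elif (i, j) in pos:
                sign[(i, j)] = 1
            elif (i, j) in neg:
                sign[(i, j)] = -1
            else:
                sign[(i, j)] = 0
    return sign


# Hypothetical reactions: A + B -> C (enzyme E), C -> A, A + C -> D.
species = ["A", "B", "C", "D", "E"]
reactions = [
    {"reactants": ["A", "B"], "products": ["C"], "modifiers": ["E"]},
    {"reactants": ["C"], "products": ["A"]},
    {"reactants": ["A", "C"], "products": ["D"]},
]
sign = build_sign_matrix(species, reactions)
```

Here `sign[("A", "B")]` is −1 (two reactants of the same reaction), `sign[("E", "C")]` is +1 (modifier to product), and `sign[("A", "C")]` is NaN because A produces C in one reaction and competes with it in another.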
We thus define G by letting sign(i,j) = 1 if there exists a reaction in which j is a product and i is either a reactant or a modifier. We let sign(i,j) = −1 if there exists a reaction in which j is a reactant, and i is also either a reactant or a modifier. Similarly sign(i,j) = 0 if the nodes i,j are not simultaneously involved in any given reaction, and sign(i,j) is undefined (NaN) if the first two conditions above are both satisfied.

In a few of the reactions of this network there is a modifier or a reactant involved which has an inhibitory effect in the reaction. The effect of this compound on the remaining participants of the reaction is the opposite from that described above. Determining which compounds were inhibitors in the reaction was difficult given the nature of this dataset. Therefore the digraph was corrected by hand in this implementation by looking at the annotations given for each reaction.

An undefined edge can be thought of as an edge that is both positive and negative, and it can be dealt with, given an arbitrary partition, by deleting exactly one of the two signed edges so that the remaining edge is consistent. Thus, in practice, one can consider undefined edges as edges with sign 0, and simply add the number of undefined edges to the number of inconsistent edges at the end of each procedure, in order to form the total number of inputs. This is the approach followed here; there are exactly seven such entries in the digraph G.

The results. After running the algorithm several hundred times for this problem, and choosing that partition which produced the highest number of consistent edges, the induced consistent set contained 636 out of 855 edges (ignoring the edges on the diagonal and the 7 undefined edges). See supplementary material for the relevant Matlab functions that carry out this algorithm. A procedure analogous to that carried out for system (5) allows one to decompose the system as the feedback loop of a controlled monotone system using 855 − 636 = 219 inputs. Counting the 7 undefined edges as well, the total is 226 inputs. Since the induced consistent set is maximal by definition, Proposition 2 guarantees that the function h is a negative feedback.

In contrast to the previous application, many of the reactions involve several reactants and products in a single reaction. This induces a denser set of negative and positive edges: even though there are 211 reactions, there are 855 (directed) edges in the 330 × 330 graph G. It is very likely that this substantially decreases OPT for this system.

The approximation ratio of the SDP algorithm is guaranteed to be at least 0.87 for some r, which gives the estimate OPT ≲ 636/0.87 ≈ 731 (valid to the extent that r has sampled the right areas of the 330-dimensional sphere, but reasonably accurate in practice).

One procedure that can be carried out to lower the number of inputs is a hybrid algorithm involving out-hubs, that is, nodes with an abnormally high out-degree. Recall from the description of the DLP algorithm that all the out-edges of a node $x_i$ can be potentially cut at the expense of only one input u, by replacing all the appearances of $x_i$ in $f_j(x)$, $j \neq i$, by u. We considered the k nodes with the highest out-degrees, and eliminated all the out-edges associated to these hubs from the reaction digraph to form the graph $G_1$. Then we ran the ULP algorithm on $G_1$ to find a partition $f_V$ of the nodes and a set of m edges that can be cut to eliminate all remaining negative closed chains. Finally, we put back on the digraph those edges that were taken in the first step, and which are consistent with respect to the partition $f_V$. The result is a decomposition of the system as the negative feedback loop of a controlled monotone system, using at most k + m inputs.
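The hybrid procedure can be sketched on a toy digraph as follows. This is an illustration under stated assumptions, not the implementation used for the EGFR network: the ULP step is replaced by an exhaustive search (`best_partition`, feasible only for a handful of nodes), and the five-node graph and all function names are hypothetical.

```python
from itertools import product


def is_consistent(edge, f):
    """Positive edges need equal labels, negative edges need different labels."""
    u, v, s = edge
    return (f[u] == f[v]) == (s > 0)


def best_partition(nodes, edges):
    """Exhaustive stand-in for the ULP step: labeling with fewest inconsistent edges."""
    best_f, best_bad = None, None
    for labels in product((0, 1), repeat=len(nodes)):
        f = dict(zip(nodes, labels))
        bad = [e for e in edges if not is_consistent(e, f)]
        if best_bad is None or len(bad) < len(best_bad):
            best_f, best_bad = f, bad
    return best_f, best_bad


def hybrid(nodes, edges, k):
    """Remove the out-edges of the k highest out-degree nodes, partition the rest,
    then restore the removed edges that are consistent with that partition."""
    out_deg = {v: 0 for v in nodes}
    for u, _, _ in edges:
        out_deg[u] += 1
    hubs = set(sorted(nodes, key=lambda v: -out_deg[v])[:k])
    g1 = [e for e in edges if e[0] not in hubs]
    f, cut = best_partition(nodes, g1)
    restored = [e for e in edges if e[0] in hubs and is_consistent(e, f)]
    return hubs, f, cut, restored


# Hypothetical digraph: a positive 4-cycle plus a hub H with mixed-sign out-edges.
nodes = ["a", "b", "c", "d", "H"]
edges = [
    ("a", "b", +1), ("b", "c", +1), ("c", "d", +1), ("d", "a", +1),
    ("a", "H", +1),
    ("H", "a", +1), ("H", "b", -1), ("H", "c", -1), ("H", "d", +1),
]

_, baseline_cut = best_partition(nodes, edges)   # plain partition, no hubs
k = 1
hubs, f, cut, restored = hybrid(nodes, edges, k)
inputs_upper_bound = k + len(cut)                # at most k + m inputs
```

On this toy graph the plain partition needs two cut edges, whereas treating `H` as a hub leaves nothing to cut in the reduced graph, so a single input suffices. Two of the four hub edges are restored as consistent.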
An implementation of this algorithm with k = 60 yielded a total maximum number of inputs k + m = 136. This is a significant improvement over the 226 inputs in the original algorithm. Clearly, it would be worthwhile to investigate further the problem of designing efficient algorithms for the DLP problem to generate improved hybrid algorithmic approaches. The approximation ratios in Theorem 1(c) are not very satisfactory since $d^{\max}_{\mathrm{in}}$ and $\log|V|$ could be large factors; hence future research work may be carried out in designing better approximation algorithms.
We conclude with another, more tentative way to drastically reduce the number of inputs necessary to write this system as the negative closed loop of a controlled monotone system. The idea is to make suitable changes of variables in the original system using the mass conservation laws. Such changes of variables are discussed in many places, for example in Volpert et al. (2000), Angeli and Sontag (2003). In terms of the associated digraph, the result of the change of variables is often the elimination of one of the closed chains. The simplest target for a suitable change of variables is a set of three nodes that form part of the same chemical reaction, for instance two reactants and one product, or one reactant, one product and one modifier. It is easy to see that such nodes are connected in the associated digraph by an odd length triangle of three edges.
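The claim about triangles can be checked directly from the sign rules of this subsection. The short Python sketch below builds the edges of a single reaction and multiplies the three signs around the triangle; a product of −1 indicates negative parity. The helper names `reaction_edges` and `triangle_parity` are hypothetical, and the sketch assumes no species is both a reactant and a product of the same reaction.

```python
def reaction_edges(reactants, products, modifiers=()):
    """Signed edges of one reaction: reactants and modifiers act positively
    on products and negatively on (other) reactants."""
    sign = {}
    for i in list(reactants) + list(modifiers):
        for j in products:
            sign[(i, j)] = +1
        for j in reactants:
            if i != j:
                sign[(i, j)] = -1
    return sign


def triangle_parity(sign, x, y, z):
    """Product of the edge signs around the triangle x-y-z, ignoring direction."""
    def s(u, v):
        # Signs are +1 or -1, so `or` falls through only when (u, v) is absent.
        return sign.get((u, v)) or sign.get((v, u))
    return s(x, y) * s(y, z) * s(z, x)


# Two reactants and one product: A + B -> C.
two_reactants = reaction_edges(["A", "B"], ["C"])
parity_1 = triangle_parity(two_reactants, "A", "B", "C")

# One reactant, one product and one modifier: A -> C, catalyzed by M.
with_modifier = reaction_edges(["A"], ["C"], modifiers=["M"])
parity_2 = triangle_parity(with_modifier, "A", "C", "M")
```

Both parities come out as −1: each triangle contains exactly one negative edge (reactant to reactant, or modifier to reactant) and two positive edges into the product.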
In order to estimate the number of inputs that can potentially be eliminated by suitable changes of variables, we counted pairwise disjoint, odd length triangles in the digraph of the EGFR network. Using a greedy algorithm to find and tag disjoint negative feedback triangles, we found a maximal number of them in the subgraph associated to each of the 211 chemical reactions. Special care was taken so that any two triangles from different reactions were themselves disjoint. After carrying out this procedure we found 196 such triangles in the EGFR network. This is a surprisingly high number, considering that each of these triangles must have been opened in the ULP algorithm implementation above and that therefore each triangle must contain 1 of the 226 edges cut. To the extent to which most of these triangles can be eliminated by suitable changes of variables, this can yield a much lower number of edges to cut, and it could provide a way to thus stress the underlying structure of the system.

6.3. A yeast regulatory network

As a final example, we ran our algorithm on the yeast Saccharomyces cerevisiae gene regulatory network from Milo et al. (2002), downloaded from Anon (2006). This network has 690 nodes and 1082 edges, of which 221 are negative and 861 are positive (we labeled the one "neutral" edge as positive; the conclusions would not change if we labeled it negative instead, or deleted this one edge).

Our algorithm (with 200 randomizations) provides an answer of 43 inconsistent edges, for the best partition found. In other words, it shows that deleting a mere 4% of edges makes the network consistent.

Also interesting is the following fact. The original graph has 11 components: a large one of size 664, one of size 5, three of size 3, and six of size 2. All of these components remain connected after edge deletion. The edges deleted all belong to the largest component, and they are incident on a total of 65 nodes in this component.

To better appreciate if this small number of deletions might arise by chance, we also ran our algorithm on random graphs having 690 nodes and 1082 edges (chosen uniformly), of which 221 edges (chosen uniformly) are negative. We found that, for such random graphs, about 12.6% (136.6 ± 5) of edges have to be removed in order to achieve consistency. Thus, the number of deletions needed in the biological network is roughly 15 standard deviations away from the mean for random graphs.

It would appear that both the topology (i.e., the underlying graph) and the actual sign assignments contribute to this near-consistency of the yeast network. To justify this remark, we performed the following numerical experiment. We randomly changed the signs of 50 positive and 50 negative edges, thus obtaining a network that has the same number of positive and negative edges, and the same underlying graph, as the original yeast network, but with 100 edges, picked randomly, having different signs. Now, one needs 8.2% (88.3 ± 7.1) deletions, an amount in between that obtained for the original yeast network and the one obtained for random graphs. Changing more signs, 100 positives and 100 negatives, leads to a less consistent network, with 115.4 ± 4.0 required deletions, or about 10.7% of the original edges, although still not as many as for a random network.
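The two randomization schemes used in this comparison can be sketched as follows. The code only generates the null-model graphs; it does not reproduce the consistency algorithm or the reported deletion counts. The function names, the fixed seed, and the choice to exclude self-loops and repeated edges are assumptions made for the example.

```python
import random


def random_signed_graph(n_nodes, n_edges, n_neg, rng):
    """Directed graph with n_edges distinct edges chosen uniformly
    (no self-loops), of which n_neg, chosen uniformly, are negative."""
    pairs = set()
    while len(pairs) < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            pairs.add((u, v))
    ordered = sorted(pairs)
    negative = set(rng.sample(range(n_edges), n_neg))
    return [(u, v, -1 if i in negative else +1) for i, (u, v) in enumerate(ordered)]


def swap_signs(edges, m, rng):
    """Flip the signs of m random positive and m random negative edges,
    keeping the underlying graph and the positive/negative counts fixed."""
    pos = [i for i, (_, _, s) in enumerate(edges) if s > 0]
    neg = [i for i, (_, _, s) in enumerate(edges) if s < 0]
    flip = set(rng.sample(pos, m)) | set(rng.sample(neg, m))
    return [(u, v, -s if i in flip else s) for i, (u, v, s) in enumerate(edges)]


rng = random.Random(0)                      # fixed seed, for reproducibility only
g = random_signed_graph(690, 1082, 221, rng)
g_swapped = swap_signs(g, 50, rng)

n_negative = sum(1 for _, _, s in g_swapped if s < 0)
n_changed = sum(1 for a, b in zip(g, g_swapped) if a[2] != b[2])
```

After the swap, the graph still has 221 negative edges and the same underlying edges, while exactly 100 signs differ from the starting graph. Either graph could then be passed to a consistency heuristic to estimate the number of required deletions.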
Appendix A. More details on SDP algorithm
In this appendix, we provide details regarding the proof of the SDP algorithm for Theorem 1(b) described in Section 5.2. The proof method is similar to that used in better-known problems. For simplicity, we do not describe the derandomization methods and provide a proof for the expected approximation ratio only. We use the following notation for convenience:
• The vertex set V of the graph for ULP is simply {1,2,...,|V|};
• $f_{\mathrm{OPT}}$ is an optimal vertex labeling for ULP with $F_{\mathrm{OPT}}$ being the set of consistent edges;
• $\mathrm{SDP}_{\mathrm{OPT}}$ is the maximum value of the objective of the vector program

\[
\begin{aligned}
\text{maximize}\quad & \frac{1}{2}\sum_{h(u,v)=1}\left(1 - x_u\cdot x_v\right) + \frac{1}{2}\sum_{h(u,v)=0}\left(1 + x_u\cdot x_v\right)\\
\text{subject to}\quad & x_v\cdot x_v = 1 \quad \text{for each } v\in V,\\
& x_v\in\mathbb{R}^{|V|} \quad \text{for each } v\in V.
\end{aligned}
\]
It is easy to see that $\mathrm{SDP}_{\mathrm{OPT}} \geq |F_{\mathrm{OPT}}|$ as follows. For every $v \in V$, if $f_{\mathrm{OPT}}(v) = 0$ then set

\[
x_v = (1, \underbrace{0, 0, \ldots, 0}_{|V|-1}),
\]

whereas if $f_{\mathrm{OPT}}(v) = 1$ then set

\[
x_v = (-1, \underbrace{0, 0, \ldots, 0}_{|V|-1});
\]

this provides a solution for the vector program with an objective value of precisely $|F_{\mathrm{OPT}}|$. Thus, it suffices if we prove our claim on the approximation ratio relative to $\mathrm{SDP}_{\mathrm{OPT}}$.
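The embedding argument can be verified numerically on a small instance. The sketch below assumes, as the objective suggests, that an edge (u,v) is consistent under a labeling f exactly when h(u,v) = (f(u) + f(v)) mod 2. The triangle instance and the function names are hypothetical, and any labeling can be used in place of an optimal one, since the identity holds for every 0/1 labeling.

```python
def objective(h, x):
    """Vector-program objective for edge labels h[(u, v)] in {0, 1}
    and unit vectors x[v] given as tuples."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    total = 0.0
    for (u, v), label in h.items():
        d = dot(x[u], x[v])
        total += 0.5 * (1 - d) if label == 1 else 0.5 * (1 + d)
    return total


def embed(f, dim):
    """Map label 0 to (1, 0, ..., 0) and label 1 to (-1, 0, ..., 0)."""
    return {
        v: ((1.0 if label == 0 else -1.0),) + (0.0,) * (dim - 1)
        for v, label in f.items()
    }


def consistent_count(h, f):
    """Number of edges with h(u, v) == (f(u) + f(v)) mod 2."""
    return sum(1 for (u, v), label in h.items() if (f[u] + f[v]) % 2 == label)


# Toy instance: a triangle whose three edges all carry label 1.
h = {(1, 2): 1, (2, 3): 1, (1, 3): 1}
f = {1: 0, 2: 1, 3: 0}

value = objective(h, embed(f, dim=len(f)))
count = consistent_count(h, f)
```

For this labeling two of the three edges are consistent, and the objective evaluated at the embedded vectors is 2.0 as well: each consistent edge contributes 1 and each inconsistent edge contributes 0.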
Next, note that the vector program can indeed be solved by an SDP approach. Let $Y \in \mathbb{R}^{|V|\times|V|}$ be an unknown real matrix with $y_{i,j}$ denoting the (i,j)th element of Y. It is not difficult to see (via Cholesky decomposition for real symmetric matrices) that the above vector program is equivalent to the following semidefinite