# Algorithmic and complexity results for decompositions of biological networks into monotone subsystems.

**ABSTRACT** A useful approach to the mathematical analysis of large-scale biological networks is based upon their decompositions into monotone dynamical systems. This paper deals with two computational problems associated to finding decompositions which are optimal in an appropriate sense. In graph-theoretic language, the problems can be recast in terms of maximal sign-consistent subgraphs. The theoretical results include polynomial-time approximation algorithms as well as constant-ratio inapproximability results. One of the algorithms, which has a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming relaxation approach of Goemans–Williamson (23). The algorithm was implemented and tested on a Drosophila segmentation network and an Epidermal Growth Factor Receptor pathway model, and it was found to perform close to optimally.


**ABSTRACT:** This paper (parts I and II) provides an expository introduction to monotone and near-monotone dynamical systems associated to biochemical networks, those whose graphs are consistent or near-consistent. Many conclusions can be drawn from signed network structure, associated to purely stoichiometric information and ignoring fluxes. In particular, monotone systems respond in a predictable fashion to perturbations and have robust and ordered dynamical characteristics, making them reliable components of larger networks. Interconnections of monotone systems may be fruitfully analyzed using tools from control theory, by viewing larger systems as interconnections of monotone subsystems. This allows one to obtain precise bifurcation diagrams without appeal to explicit knowledge of fluxes or of kinetic constants and other parameters, using merely "input/output characteristics" (steady-state responses or DC gains). The procedure may be viewed as a "model reduction" approach in which monotone subsystems are viewed as essentially one-dimensional objects. The possibility of performing a decomposition into a small number of monotone components is closely tied to the question of how "near" a system is to being monotone. We argue that systems that are "near monotone" may be biologically more desirable than systems that are far from being monotone. Indeed, there are indications that biological networks may be much closer to being monotone than random networks that have the same numbers of vertices and of positive and negative edges. 01/2007.

**ABSTRACT:** Monotone subsystems have appealing properties as components of larger networks, since they exhibit robust dynamical stability and predictability of responses to perturbations. This suggests that natural biological systems may have evolved to be, if not monotone, at least close to monotone in the sense of being decomposable into a "small" number of monotone components. In addition, recent research has shown that much insight can be attained from decomposing networks into monotone subsystems and the analysis of the resulting interconnections using tools from control theory. This paper provides an expository introduction to monotone systems and their interconnections, describing the basic concepts and some of the main mathematical results in a largely informal fashion. Systems and Synthetic Biology 05/2007; 1(2):59-87.

**ABSTRACT:** In this paper we propose three different graph-theoretical decompositions of large-scale biological networks, all three aiming at highlighting specific dynamical properties of the system. The first consists in finding a maximal directed acyclic subgraph in the network, which dynamically corresponds to searching for the maximal open-loop subsystem of the given system. The other two decompositions deal with the strong monotonicity property, and aim at decomposing the system into strongly monotone components with different structural characteristics: a single large strongly connected monotone subsystem in one case, and a set of smaller disjoint monotone subsystems in the other. For all three decompositions we provide original heuristic algorithms. 09/2010.


0303-2647/$ – see front matter © 2006 Elsevier Ireland Ltd. All rights reserved.

doi:10.1016/j.biosystems.2006.08.001


BioSystems xxx (2006) xxx–xxx

Algorithmic and complexity results for decompositions of biological networks into monotone subsystems

Bhaskar DasGupta (a,1,∗), German Andres Enciso (b,2), Eduardo Sontag (c,3), Yi Zhang (a,1)

(a) Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, United States

(b) Mathematical Biosciences Institute, 250 Mathematics Building, 231 W 18th Avenue, Columbus, OH 43210, United States

(c) Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, United States

Received 23 January 2006; received in revised form 3 August 2006; accepted 3 August 2006

Abstract

A useful approach to the mathematical analysis of large-scale biological networks is based upon their decompositions into monotone dynamical systems. This paper deals with two computational problems associated to finding decompositions which are optimal in an appropriate sense. In graph-theoretic language, the problems can be recast in terms of maximal sign-consistent subgraphs. The theoretical results include polynomial-time approximation algorithms as well as constant-ratio inapproximability results. One of the algorithms, which has a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming relaxation approach of Goemans–Williamson [Goemans, M., Williamson, D., 1995. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM 42 (6), 1115–1145]. The algorithm was implemented and tested on a Drosophila segmentation network and an Epidermal Growth Factor Receptor pathway model, and it was found to perform close to optimally.

© 2006 Elsevier Ireland Ltd. All rights reserved.

(∗) Corresponding author. Tel.: +1 3123551319; fax: +1 3124130024. E-mail addresses: dasgupta@cs.uic.edu (B. DasGupta), yzhang3@cs.uic.edu (Y. Zhang), genciso@mbi.osu.edu (G.A. Enciso), sontag@math.rutgers.edu (E. Sontag).

(1) Partly supported by NSF grants CCR-0296041, CCR-0206795, CCR-0208749 and IIS-0346973.

(2) Work done while the author was with the Mathematics Department of Rutgers University and partly supported by NSF grant CCR-0206789.

(3) Partly supported by NSF grants EIA 0205116 and DMS-0504557.

1. Introduction

In living cells, networks of proteins, RNA, DNA, metabolites, and other species process environmental signals, control internal events such as gene expression, and produce appropriate cellular responses. The field of systems (molecular) biology is largely concerned with the study of such networks, viewed as dynamical systems. One approach to their mathematical analysis relies upon viewing them as made up of subsystems whose behavior is simpler and easier to understand. Coupled with appropriate interconnection rules, the hope is that emergent properties of the complete system can be deduced from the understanding of these subsystems. Diagrammatically, we picture this as in Fig. 1, which shows a full system as composed of four subsystems.

A particularly appealing class of candidates for “simpler behaved” subsystems are monotone systems, as in Hirsch (1985, 1983) and Smith (1995). Monotone systems are a class of dynamical systems for which pathological behavior (“chaos”) is ruled out. Even though they may have arbitrarily large dimensionality, monotone systems behave in many ways like one-dimensional systems. For instance, in monotone systems, bounded trajectories generically converge to steady states, and there are no stable oscillatory behaviors. More precisely, see below, one must extend the notion of monotone system so as to incorporate input and output channels, as introduced and initially developed in Angeli and Sontag (2003); inputs and outputs are required so that interconnections like those shown in Fig. 1 can be defined.

Fig. 1. A system composed of four subsystems.

Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networks into monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001

Monotonicity is closely related, as explained later, to positive and negative feedback loops in systems. The topic of analyzing the behaviors of such feedback loops is a long-standing one in biology in the context of regulation, metabolism, and development; a classical reference in that regard is the work of Monod and Jacob (1961). See also, for example, Angeli et al. (2004), Angeli and Sontag (2004), Cinquin and Demongeot (2002), Lewis et al. (1977), Meinhardt (1978), Plathe et al. (1995), Remy et al. (2003), Snoussi (1998) and Thomas (1978).

An interconnection of monotone subsystems, that is to say, an entire system made up of monotone components, may or may not be monotone: “positive feedback” (in a sense that can be made precise) preserves monotonicity, while “negative feedback” destroys it. Thus, oscillators such as circadian rhythm generators require negative feedback loops in order for periodic orbits to arise, and hence are not themselves monotone systems, although they can be decomposed into monotone subsystems (cf. Angeli and Sontag, 2004). A rich theory is beginning to arise, characterizing the behavior of non-monotone interconnections. For example, Angeli and Sontag (2003) shows how to preserve convergence to equilibria; see also the follow-up papers (Angeli et al., 2004; Enciso et al., 2005; Enciso and Sontag, 2006; Gedeon and Sontag, 2005; De Leenheer et al., 2005). Even for monotone interconnections, the decomposition approach is very useful, as it permits locating and characterizing the stability of steady states based upon input/output behaviors of components, as described in Angeli and Sontag (2004); see also the follow-up papers (Angeli et al., 2004; Enciso and Sontag, 2005; De Leenheer and Malisoff, 2006).

Fig. 2. A consistent and an inconsistent graph.

Moreover, a key point brought up in Sontag (2004, 2005) is that new techniques for monotone systems in many situations allow one to characterize the behavior of an entire system, based upon the “qualitative” knowledge represented by general network topology and the inhibitory or activating character of interconnections, combined with only a relatively small amount of quantitative data. The latter data may consist of steady-state responses of components (dose-response curves and so forth), and there is no need to know the precise form of dynamics or parameters such as kinetic constants in order to obtain global stability conclusions.

In Section 2 of this paper, we briefly discuss monotonicity of systems described by ordinary differential equations (the study of monotonicity can be extended to partial differential equations, delay-differential equations, and even more arbitrary dynamical systems, see e.g. Enciso and Sontag, 2006 in the context of monotone systems with inputs and outputs). We explain there how the study of monotone systems, and more generally of decompositions into monotone systems, relates to a sign-consistency property for the graph which describes how each state variable influences each other variable in a given system.

Generally, a graph whose edges are labeled by “+” or “−” signs (sometimes one writes +1, −1 instead of +, −, or uses respectively activating “→” or inhibiting “⊣” arrows as shown in Fig. 2) is said to be sign-consistent if all paths between any two nodes have the same net sign, or equivalently, all closed loops have positive parity, i.e. an even number, possibly 0, of negative edges. (For technical reasons, one ignores the direction of arrows, looking only at undirected graphs; see more details in Section 2.) Thus, the first graph in Fig. 2 is consistent, but the second one, which differs in just one edge from the first one, is not (two paths with different parity are possible from node 1 to node 4, a direct odd one as well as an even one traversing nodes 2 and 3). Self-loops, which in biochemical systems often represent degradation terms, are ignored in this definition. (We discuss this point further below.)

When applying decomposition theorems such as those described in Angeli et al. (2004), Angeli et al. (2004), Angeli and Sontag (2003, 2004), Enciso et al. (2005), Enciso and Sontag (2005), Enciso and Sontag (2006), Gedeon and Sontag (2005), De Leenheer et al. (2005), De Leenheer and Malisoff (2006) and Sontag (2004, 2005), it tends to be the case that the fewer the number of interconnections among components, the easier it is to obtain useful conclusions. One may view a decomposition into interconnections of monotone subsystems as the “pulling out” of “inconsistent” connections among monotone components, the original system being a “negative feedback” loop around an otherwise consistent system, as represented in Fig. 3. In this interpretation, the number of interconnections among monotone components corresponds to the number of variables being fed back. In addition, and independently from the theory developed in the above references, one might speculate that nature tends to favor systems that are decomposable into small monotone interconnections (or equivalently, have a small number of inconsistent paths). There are two reasons for this.

Fig. 3. Pulling-out inconsistent connections.

From a dynamical systems perspective, negative feedback loops, although required for homeostasis and for periodic behavior, have potentially destabilizing effects, especially if there are signal propagation delays; thus, minimizing their number is desirable.

Another advantage of consistency is as follows (Sontag, in preparation). Suppose that the nodes in the graphs shown in Fig. 2 represent concentrations of chemical species in a cell, such as receptors in a certain activated state or transcription factors. Assume now that a perturbation instantaneously increases the value of the concentration of node 1. For the graph on the left, the instantaneous effect on the other nodes is predictable: nodes 2 and 6 will increase, while nodes 3, 4, and 5 will decrease. This unambiguous global effect holds true regardless of the actual algebraic forms of reactions, values of parameters such as kinetic constants, etc. In contrast, consider the graph shown on the right. Now the net effect of an increase in node 1 is ambiguous. It is impossible to know if node 4 will be repressed (because of the direct edge from 1 to 4) or activated (because of the indirect path). There is no way to resolve this ambiguity unless equations and precise parameter values are assigned to the arrows. Since cells of the same type differ in precise parameter values, due to varying concentrations of ATP, enzymes, and other chemicals, two cells of the same type may react in different ways to the same “stimulus” (increase in concentration of chemical 1). While such epigenetic diversity is sometimes desirable, it makes behavior less predictable. From an evolutionary viewpoint, a “change in wiring” due to a mutation will have an ambiguous effect in this inconsistent network.
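The ambiguity described above can be made concrete by enumerating the net sign of every simple undirected path between two nodes. The following is a minimal sketch, not the authors' code: the two 4-node graphs are hypothetical stand-ins (the edge lists of Fig. 2 are not given in the text), and the function name `path_signs` is an assumption made for illustration.

```python
def path_signs(edges, source, target):
    """Return the set of net signs over all simple undirected paths.

    edges maps an unordered node pair (u, v) to +1 (activating) or -1 (inhibiting).
    Edge direction is ignored, as in the sign-consistency definition.
    """
    adjacency = {}
    for (u, v), sign in edges.items():
        adjacency.setdefault(u, []).append((v, sign))
        adjacency.setdefault(v, []).append((u, sign))

    found = set()

    def walk(node, sign_so_far, visited):
        if node == target:
            found.add(sign_so_far)
            return
        for neighbor, sign in adjacency.get(node, []):
            if neighbor not in visited:
                walk(neighbor, sign_so_far * sign, visited | {neighbor})

    walk(source, 1, {source})
    return found


# Hypothetical graphs (NOT the graphs of Fig. 2); they differ in one edge sign.
consistent_graph = {(1, 2): +1, (2, 3): -1, (3, 4): +1, (1, 4): -1}
inconsistent_graph = {(1, 2): +1, (2, 3): -1, (3, 4): -1, (1, 4): -1}

unambiguous = path_signs(consistent_graph, 1, 4)    # every path has sign -1
ambiguous = path_signs(inconsistent_graph, 1, 4)    # paths of both signs exist
```

When the returned set has a single element, the qualitative effect of perturbing the source on the target is determined by the graph alone; two elements signal the kind of ambiguity discussed in the text. Path enumeration is exponential in general and is used here only for illustration on tiny graphs.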

Of course, one should not expect large networks to be globally consistent. However, if the number of inconsistencies in a biological interaction graph is small, it may well be the case that the network is in fact consistent in a practical sense. For example, a gene regulatory network represents all potential effects among genes. These effects are mediated by proteins which themselves may need to be “activated” in order to perform their function, and this activation may, in turn, depend on certain extracellular ligands being present. Thus, depending on the particular combination of external signals present, different subgraphs of the original graph describe the system under those conditions, and these graphs may be individually consistent. For example, for the system in Fig. 2, the edge from 1 to 2 may not be present under environmental conditions A, while the edge from 2 to 3 may not be present under conditions B. Thus, under either condition, A or B, the graph would be consistent, even though the entire network is not. See Sontag (in preparation) for more discussion of these issues. In summary, consistency in biological networks may be desirable, and therefore one might conjecture that true biological networks tend to maximize it. Evidence that this is indeed the case is provided by Ma’ayan et al. (in preparation), where the authors compare certain biological networks and appropriately randomized versions of them and show that the original networks are closer to being consistent, when consistency is measured using a simple heuristic. In the last section of this paper, we apply our algorithms to perform a similar analysis, and once again derive the conclusion that nature seems to favor consistency.

Thus, we are led to the subject of this paper, namely computing the smallest number of edges that have to be removed so that there remains a consistent graph. For example, for the particular graph shown in Fig. 4 the answer is that one edge (the diagonal positive one) suffices (in this case, the solution is unique: no single other edge would suffice; in other problems, there may be more than one optimizing solution).

Fig. 4. Dropping the diagonal edge gives consistency.

There has been other work dealing with efficient knock-out strategies in biochemical reaction networks, also formulated, as in this paper, as edge deletion problems. As an example, we mention the recent paper (Klamt, 2006), which dealt with the question of identifying a minimal set of reactions whose removal would block the operation of a prespecified reaction. The problem that we consider is completely different, however.

In this paper, we will study the computational complexity of the question of how many edges must be removed in order to obtain consistency, and we provide a relaxation-based polynomial-time approximation algorithm guaranteed to solve the problem to about 87.9% of the optimum solution, which is based on the semidefinite programming relaxation approach of Goemans–Williamson (Goemans and Williamson, 1995). (A variant of the problem is discussed as well.) We also observe that it is not possible to have a polynomial-time algorithm with performance too close to the optimal. While our emphasis is on theory, one of the algorithms was implemented, and we show results of its application to a Drosophila segmentation network and to an Epidermal Growth Factor Receptor pathway model. It turns out that, when applying the algorithm, often the solution is much closer to optimal than the worst-case guarantee of 87.9%, and indeed often gives an optimal solution.

The remainder of this paper is organized as follows. Section 2 briefly discusses monotonicity. The discussion is self-contained for the purposes of this paper, and references are given to the dynamical systems results that motivate the problem studied here. The connection to consistency is also explained there. Section 3 discusses the associated graph-theoretic problems and notions of approximability used in the paper, leading to the statement of our main theoretical results in Section 4, which are proved in Section 5. Section 6 contains the mentioned examples of application of the algorithm. Finally, in Section 6.3 we consider a yeast gene regulatory network and various randomized versions of it, concluding that the original network is far closer to consistent than may be expected from chance alone. Several technical proofs are separately provided in Appendix A.
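To fix ideas about the edge-deletion problem stated in the introduction, the sketch below solves it by exhaustive search on a tiny graph. It is not the semidefinite-programming algorithm of the paper: it is an exponential-time baseline useful only for checking small cases. The function names and the 4-node example (a square with one positive diagonal, chosen to mimic the situation described for Fig. 4) are assumptions made for illustration.

```python
from itertools import combinations


def is_consistent(n, edges):
    """Check sign-consistency of an undirected signed graph on nodes 0..n-1.

    edges is a list of (u, v, sign) with sign in {+1, -1}. The graph is
    consistent iff nodes can be labeled +1/-1 so that label[u] * label[v] == sign
    for every edge. Self-loops are ignored, as in the text.
    """
    adjacency = [[] for _ in range(n)]
    for u, v, sign in edges:
        if u != v:
            adjacency[u].append((v, sign))
            adjacency[v].append((u, sign))
    label = [0] * n
    for start in range(n):
        if label[start]:
            continue
        label[start] = 1
        stack = [start]
        while stack:
            u = stack.pop()
            for v, sign in adjacency[u]:
                if label[v] == 0:
                    label[v] = label[u] * sign
                    stack.append(v)
                elif label[v] != label[u] * sign:
                    return False
    return True


def min_edge_deletions(n, edges):
    """Return (k, removed) with k the fewest edges whose removal gives consistency."""
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            kept = [e for idx, e in enumerate(edges) if idx not in removed]
            if is_consistent(n, kept):
                return k, [edges[idx] for idx in removed]


# Hypothetical example: a square 0-1-2-3 plus a positive diagonal 0-2.
square_with_diagonal = [
    (0, 1, +1), (1, 2, -1), (2, 3, +1), (3, 0, -1),  # square, even parity
    (0, 2, +1),                                      # diagonal, makes both triangles odd
]
count, removed_edges = min_edge_deletions(4, square_with_diagonal)
```

In this example the only single edge whose removal restores consistency is the diagonal, matching the uniqueness remark made for Fig. 4. The search visits every subset of edges in the worst case, which is why approximation algorithms are of interest for realistic networks.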


2. Monotone systems and consistency

We will illustrate the motivation for the problem studied here using systems of ordinary differential equations

$$\dot{x} = F(x) \tag{1}$$

(the dot indicates time derivative, and $x = x(t)$ is a vector), although the discussion applies as well to more general types of dynamical systems such as delay-differential systems or certain systems of reaction-diffusion partial differential equations. In applications to biological networks, the component $x_i(t)$ of the vector $x = x(t)$ indicates the concentration of the $i$th species in the model at time $t$.

We will restrict attention to models in which the direct effect that one given variable in the model has over another is unambiguous, in the sense that it is always inhibitory or always promoting. Thus, if protein A binds to the promoter region of gene B, we assume that it does so either to prevent the transcription of the gene or to facilitate it, no matter what the respective concentrations are. Mathematically, we require that for every $i,j = 1,\ldots,n$, $i \neq j$, the partial derivative $\partial F_i/\partial x_j$ be either $\geq 0$ at all states or $\leq 0$ at all states.

Let us briefly discuss this non-ambiguity assumption. First of all, we remark that this assumption does not prevent protein A from having an indirect influence, through other molecules, perhaps dimers of A itself, that can ultimately lead to the opposite effect on gene B from that of a direct connection. Indeed, this is the whole point of studying graph consistency. Second, in biomolecular networks, ambiguous signs in Jacobians often represent heterogeneous mechanisms. For example, take the case where protein A enhances the transcription rate of gene B only if it is present at low concentrations, but represses B if its concentration is larger than some threshold. A careful study of the chemical mechanism often reveals the existence of an intermediate form (perhaps a homodimer) that is responsible for this ambiguous effect. (Mathematically, an example is a rate of transcription $k_1 a - k_2 a^2$, where $a$ denotes the concentration of A.) Introducing a new species into the model (mathematically, an additional state variable representing this intermediate form) reduces one to the problem in which Jacobian entries are unambiguous. (In our example, we would write the rate as $k_1 a - k_2 c$, where $c$ is the concentration of the dimer. In addition, there would be a new equation such as $dc/dt = k_3 a^2 - k_4 c$ representing formation of the dimer and its degradation.) Finally, we note that small-scale negative loops are abundant in nature. Self-loops or “auto repression” are an extreme example of these, and appear as a consequence of degradation and other effects. Regarding such self-loops, observe that the requirement of a fixed sign for Jacobian entries is not imposed on diagonal elements. In fact, these elements play no role in the graph to be introduced next, nor in monotonicity: the properties of monotone systems are not affected by them. More generally, it is often the case that small loops represent fast dynamics which may be collapsed into self-loops via time-scale decomposition (singular perturbations or, specifically for enzymes, “quasi-steady state approximations”) and hence may be viewed as diagonal terms which may be safely ignored. This is a modeling question, to be settled before the algorithms studied here are to be applied.

Given any partial order $\leq$ defined on $\mathbb{R}^n$, a system (1) is said to be monotone with respect to $\leq$ if $x_0 \leq y_0$ implies $x(t) \leq y(t)$ for every $t \geq 0$. Here $x(t)$, $y(t)$ are the solutions of (1) with initial conditions $x_0$, $y_0$, respectively. Of course, whether a system is monotone or not depends on the partial order being considered, but one says simply that a system is monotone if the order is clear from the context. Monotonicity with respect to nontrivial orders rules out chaotic attractors and even stable periodic orbits; see Hirsch (1985, 1983), Smith (1995), and is, as discussed in the introduction, a useful property for components when analyzing larger systems in terms of subsystems.

A useful way to define partial orders in $\mathbb{R}^n$, and the only one to be further considered in this paper, is as follows. Given a tuple $s = (s_1,\ldots,s_n)$, where $s_i \in \{1,-1\}$ for every $i$, we say that $x \leq_s y$ if $s_i x_i \leq s_i y_i$ for every $i$. For instance, the “cooperative order” is the orthant order $\leq_s$ generated by $s = (1,\ldots,1)$. This is the order $\leq$ defined by $x \leq y$ if and only if $x_i \leq y_i$ for all $i = 1,\ldots,n$. It is not difficult to verify whether a system is monotone with respect to an orthant order; the following lemma, known as “Kamke’s condition,” is not hard to prove, see Smith (1995) for details (also Angeli and Sontag, 2003 in the more general context of monotone systems with input and output channels).

Lemma 1. Consider an orthant order $\leq_s$ generated by $s = (s_1,\ldots,s_n)$. A system (1) is monotone with respect to $\leq_s$ if and only if

$$s_i s_j \frac{\partial F_j}{\partial x_i} \geq 0, \quad i,j = 1,\ldots,n,\ i \neq j. \tag{2}$$
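Condition (2) is easy to check mechanically once the sign pattern of the Jacobian is known. The sketch below is one possible way to do so, under the non-ambiguity assumption that each off-diagonal entry has a constant sign over all states; the function name `satisfies_kamke`, the matrix layout (`jac_sign[j][i]` holds the sign of $\partial F_j/\partial x_i$) and the two example systems are assumptions made for illustration, not taken from the paper.

```python
from itertools import product

import numpy as np


def satisfies_kamke(jac_sign, s):
    """Check s_i * s_j * dF_j/dx_i >= 0 for all i != j.

    jac_sign[j][i] is the (state-independent) sign of dF_j/dx_i: +1, -1 or 0.
    s is a tuple of +1/-1 generating the orthant order. Diagonal entries are
    ignored, since self-loops play no role in condition (2).
    """
    signs = np.asarray(jac_sign)
    s = np.asarray(s)
    scaled = np.outer(s, s) * signs               # entry [j, i] = s_j * s_i * sign
    off_diagonal = ~np.eye(len(s), dtype=bool)
    return bool(np.all(scaled[off_diagonal] >= 0))


# Hypothetical example 1: two mutually inhibiting species.
mutual_inhibition = [[-1, -1],
                     [-1, -1]]

# Hypothetical example 2: x0 activates x1, x1 activates x2, x2 inhibits x0.
negative_loop = [[0, 0, -1],
                 [1, 0, 0],
                 [0, 1, 0]]

# Orthant orders (if any) for which the negative loop satisfies condition (2).
orders_for_loop = [s for s in product((1, -1), repeat=3)
                   if satisfies_kamke(negative_loop, s)]
```

Mutual inhibition fails the test for the cooperative order $s = (1,1)$ but passes for $s = (1,-1)$, while the three-species negative feedback loop passes for no choice of $s$, in line with the remark that negative feedback destroys monotonicity.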

To provide intuition, let us sketch the sufficiency part of the proof for the special case of the cooperative order. Suppose by contradiction that the system is not monotone, and that therefore there is a pair of initial conditions $x_0 \leq y_0$ whose solutions $x(t)$, $y(t)$ cease to satisfy $x(t) \leq y(t)$ at some point. This implies that at a certain critical moment in time $t$, there is some coordinate $i$ so that $x_i(t^-) < y_i(t^-)$ but $x_i(t^+) > y_i(t^+)$. (This argument is not entirely accurate, but it gives the flavor of the proof.) Thus $x_i(t) = y_i(t)$ for some $i$ and the derivative with respect to time of $x_i$ is larger than that of $y_i$ at time $t$, meaning that $F_i(x) > F_i(y)$, where $x = x(t)$ and $y = y(t)$. However, this cannot happen if $F_i$ is nondecreasing in all the variables $x_j$ except possibly $x_i$, so that $x \leq y$, $x_i = y_i$ implies $F_i(x) \leq F_i(y)$. An equivalent way to phrase this condition is to ask that $\partial F_i/\partial x_j \geq 0$ at all states for every $i,j$, $i \neq j$, which is the Kamke condition for the special case of the cooperative order. The name of the order arises because, in a system that is monotone with respect to that order, each species promotes or “cooperates” with each other.

A rephrasing of this characterization of monotonicity with respect to orthant orders can be given by looking at the signed digraph $G$ associated to (1). We define the vertex set $V(G)$ and the edge set $E(G)$ of $G$ as follows. Let $V(G) = \{1,\ldots,n\}$, and given vertices $i,j$, let $(i,j) \in E(G)$ and $f_E(i,j) = 1$ if both $\partial F_j/\partial x_i \geq 0$ and the strict inequality holds at least at one state. Similarly let $(i,j) \in E(G)$ and $f_E(i,j) = -1$ if both $\partial F_j/\partial x_i \leq 0$ and the strict inequality holds at least at one state. Finally, let $(i,j) \notin E(G)$ if $\partial F_j/\partial x_i \equiv 0$. Recall that we are assuming that one of the three cases must hold.
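The construction of the signed digraph can be approximated numerically when the Jacobian of $F$ can be evaluated at sample states. The following is a rough sketch under that assumption: sampling cannot certify a sign “at all states”, so the result is only as good as the samples. The function name `signed_digraph`, the tolerance `tol`, and the two-species example system are all hypothetical and chosen for illustration; vertices are numbered from 0 in the code.

```python
import numpy as np


def signed_digraph(jacobian_samples, tol=1e-12):
    """Build f_E from Jacobians evaluated at several sample states.

    jacobian_samples has shape (num_samples, n, n), with entry [k, j, i] equal
    to dF_j/dx_i at the k-th sample state. Returns a dict mapping an edge
    (i, j) to +1 or -1; pairs whose derivative vanishes at every sample get
    no edge. Raises ValueError if a sampled derivative changes sign, which
    violates the non-ambiguity assumption.
    """
    samples = np.asarray(jacobian_samples, dtype=float)
    n = samples.shape[1]
    edges = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal entries (self-loops) are ignored
            values = samples[:, j, i]
            positive = bool(np.any(values > tol))
            negative = bool(np.any(values < -tol))
            if positive and negative:
                raise ValueError(f"ambiguous sign for edge ({i}, {j})")
            if positive:
                edges[(i, j)] = 1
            elif negative:
                edges[(i, j)] = -1
    return edges


def example_jacobian(x):
    """Jacobian of the hypothetical system F_0 = -x_0 + 1/(1 + x_1), F_1 = x_0 - x_1."""
    return [[-1.0, -1.0 / (1.0 + x[1]) ** 2],
            [1.0, -1.0]]


sample_states = [(0.5, 0.0), (1.0, 1.0), (2.0, 3.0)]
example_edges = signed_digraph([example_jacobian(x) for x in sample_states])
```

For the example system, species 0 promotes species 1 and species 1 inhibits species 0, so the sketch returns one positive and one negative edge.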

Now we can define an orthant cone using any function $f_V : V(G) \to \{-1,1\}$, by letting $x \leq_{f_V} y$ if and only if $f_V(i) x_i \leq f_V(i) y_i$ for all $i$. Given $f_V$, we define the consistency function $g : E(G) \to \{-1,1\}$ by $g(i,j) = f_V(i) f_V(j) f_E(i,j)$, where the value 1 is read as “true” (the edge is consistent) and $-1$ as “false”. Then, the following analog of Lemma 1 holds.

Lemma 2. Consider a system (1) and an orthant cone $\leq_{f_V}$. Then (1) is monotone with respect to $\leq_{f_V}$ if and only if $g(i,j) \equiv 1$ on $E(G)$.
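Lemma 2 turns the search for a suitable orthant order into a labeling problem: find $f_V$ such that $g \equiv 1$ on every edge. One simple way to attempt this, sketched below, is to fix the label of one vertex per connected component and propagate the labels forced by the edge signs, reporting failure when two forced labels disagree. This is an illustrative sketch rather than code from the paper; the function names and the example edge sets are assumptions, and vertices are numbered from 0.

```python
from collections import deque


def find_orthant_order(n, signed_edges):
    """Search for f_V with f_V(i) * f_V(j) * f_E(i, j) == 1 on every edge.

    signed_edges maps a directed edge (i, j) to f_E(i, j) in {+1, -1}.
    Edge direction is ignored during propagation. Returns a list f_V of
    +1/-1 labels, or None if no such labeling exists.
    """
    adjacency = [[] for _ in range(n)]
    for (i, j), sign in signed_edges.items():
        if i != j:
            adjacency[i].append((j, sign))
            adjacency[j].append((i, sign))
    f_v = [0] * n
    for root in range(n):
        if f_v[root]:
            continue
        f_v[root] = 1
        queue = deque([root])
        while queue:
            i = queue.popleft()
            for j, sign in adjacency[i]:
                forced = f_v[i] * sign
                if f_v[j] == 0:
                    f_v[j] = forced
                    queue.append(j)
                elif f_v[j] != forced:
                    return None  # two paths force opposite labels
    return f_v


def consistency_function(f_v, signed_edges):
    """Evaluate g(i, j) = f_V(i) * f_V(j) * f_E(i, j) on every edge."""
    return {(i, j): f_v[i] * f_v[j] * sign for (i, j), sign in signed_edges.items()}


# Hypothetical examples.
mutual_inhibition_edges = {(0, 1): -1, (1, 0): -1}
negative_loop_edges = {(0, 1): 1, (1, 2): 1, (2, 0): -1}

order = find_orthant_order(2, mutual_inhibition_edges)
no_order = find_orthant_order(3, negative_loop_edges)
```

The propagation runs in time linear in the number of vertices and edges, so deciding whether some orthant order works is cheap; the hard problem studied in the paper is choosing which edges to delete when the answer is negative.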

Proof.

Let

si= fV(i),i = 1,...,n.

sisj∂fi/∂xj= 0 if (i,j) ?∈ E(G). For (i,j) ∈ E(G), it

holdsthatsisj∂fi/∂xj≥ 0ifandonlyifsisjfE(i,j) = 1,

397

398

399

Notethat

400

401

402

Pleasecitethisarticleas:BhaskarDasGuptaetal.,Algorithmicandcomplexityresultsfordecompositionsofbiologicalnetworks

into monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001

Page 6

UNCORRECTED PROOF

us consider the following biological model of testos-

terone dynamics (Enciso and Sontag, 2004; Murray and

BIO 2594 1–18

BIO25941–18

6

B. DasGupta et al. / BioSystems xxx (2006) xxx–xxx

that is, if and only if g(i,j) = 1. The result follows from

Lemma 1.

?

403

404

For the next lemma, let the parity of a chain in G be the

productofthesigns(+1,−1)ofitsindividualedges.We

will consider in the next result closed undirected chains,

that is, sequences xi1,...,xirsuch that xi1= xir, and

such that for every λ = 1,...,r − 1 either (xiλ,xiλ+1) ∈

E(G) or (xiλ+1,xiλ) ∈ E(G).

The following lemma (see DeAngelis et al., 1986 as

well as Smith, 1988, page 101) is analogous to the fact

from vector calculus that path integrals of a vector field

are independent of the particular path of integration if

and only if there exists a potential function. Since the

result is key to the formulation of the problem being

considered,weprovideasimpleandself-containedproof

in Appendix A.

405

406

407

408

409

410

411

412

413

414

415

416

417

418

Lemma 3. Consider a dynamical system (1) with asso-

ciated directed graph G. Then (1) is monotone with

respect to some orthant order if and only if all closed

undirected chains of G have parity 1.

419

420

421

422

2.1. Systems with inputs and outputs

As we discussed in the introduction, a useful approach to the analysis of biological networks consists of decomposing a given system into an interconnection of monotone subsystems. The formulation of the notion of interconnection requires subsystems to be endowed with “input and output channels” through which information is to be exchanged. In order to address this we consider controlled dynamical systems (Sontag, 1990), which are systems with an additional parameter $u \in \mathbb{R}^m$ and which have the form

$$\dot{x} = g(x,u). \qquad (3)$$

The values of u over time are specified by means of a function $t \to u(t) \in \mathbb{R}^m$, $t \ge 0$, called an input or control. Thus each input defines a time-dependent dynamical system in the usual sense. To system (3) there is associated a feedback function $h : \mathbb{R}^n \to \mathbb{R}^m$, which is usually used to create the closed loop system $\dot{x} = g(x,h(x))$. Finally, if $\mathbb{R}^n$, $\mathbb{R}^m$ are ordered by orthant orders $\le_{f_V}$, $\le_q$ respectively, we say that the system is monotone if it satisfies (2) for every u, and also

$$q_k\, f_V(j)\, \frac{\partial g_j}{\partial u_k} \ge 0, \quad \text{for every } k, j \qquad (4)$$

(see also Angeli and Sontag, 2003). As an example, let us consider the following biological model of testosterone dynamics (Enciso and Sontag, 2004; Murray, Mathematical Biology, 2002):

$$\begin{aligned}
\dot{x}_1 &= \frac{A}{K + x_3} - b_1 x_1,\\
\dot{x}_2 &= c_1 x_1 - b_2 x_2,\\
\dot{x}_3 &= c_2 x_2 - b_3 x_3.
\end{aligned} \qquad (5)$$

Drawing the digraph of this system, it is easy to see that it is not monotone with respect to any orthant order, as follows by application of Lemma 3. On the other hand, replacing $x_3$ in the first equation by u, we obtain a system that is monotone with respect to the orders $\le_{(1,1,1)}$, $\le_{(-1)}$ for state and input respectively. Defining $h(x) = x_3$, the closed loop system of this controlled system is none other than (5). The paper (Enciso and Sontag, 2004) shows how this decomposition, together with the “small gain theorem” from monotone input/output theory (Angeli and Sontag, 2003), leads one to a proof that the system does not have oscillatory behavior, even under arbitrary delays in the feedback loop, contrary to the assertion made in Murray, Mathematical Biology (2002).
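The sign pattern of model (5) can be checked by hand or in a few lines of code. The sketch below is an illustration under stated assumptions: the signs are read off (5) for positive parameters and nonnegative states, the diagonal terms $-b_i x_i$ are left out because only off-diagonal partial derivatives enter the graph used here, and all variable names are hypothetical.

```python
from math import prod

# Off-diagonal sign pattern of (5); key (i, j) means "x_i acts on x_j".
closed_loop_edges = {
    ("x3", "x1"): -1,  # A / (K + x3) is decreasing in x3
    ("x1", "x2"): +1,  # c1 * x1
    ("x2", "x3"): +1,  # c2 * x2
}

# The only closed chain: x1 -> x2 -> x3 -> x1.
cycle = [("x1", "x2"), ("x2", "x3"), ("x3", "x1")]
cycle_parity = prod(closed_loop_edges[e] for e in cycle)

# Open loop: x3 in the first equation is replaced by the input u,
# which erases the edge x3 -> x1.
open_loop_edges = {e: s for e, s in closed_loop_edges.items()
                   if e != ("x3", "x1")}
f_V = {"x1": 1, "x2": 1, "x3": 1}
state_ok = all(f_V[i] * f_V[j] * s == 1
               for (i, j), s in open_loop_edges.items())

# Condition (4) for the single input: q * f_V(x1) * sign(dg_1/du) >= 0.
# Since sign(dg_1/du) = -1, the choice q = -1 satisfies it.
q = -1
input_ok = q * f_V["x1"] * (-1) >= 0

print(cycle_parity, state_ok, input_ok)  # -1 True True
```

The parity −1 of the single loop is what rules out monotonicity of the closed loop, while the open loop passes with the orders $(1,1,1)$ for the state and $(-1)$ for the input.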

We can carry out this procedure on an arbitrary system (1) with a directed graph G, as follows: given a set E of edges in G, enumerate the edges in $E^C$ as $(i_1,j_1),\dots,(i_m,j_m)$. For every $k = 1,\dots,m$, replace all appearances of $x_{i_k}$ in the function $F_{j_k}$ by the variable $u_k$, to form the function $g(x,u)$. Define $h(x) = (x_{i_1},\dots,x_{i_m})$. It is easy to see that this controlled system (3) has closed loop (1).

Note that the controlled system (3) generated by the set E as above has, as associated digraph, the sub-digraph of G generated by E. This is because for every k, one has $\partial g_{j_k}(x,u)/\partial x_{i_k} \equiv 0$, i.e., the edge from $i_k$ to $j_k$ has been “erased”.

Denote by $\hat{G}$ the underlying undirected graph of a directed graph G obtained by ignoring the directions of the edges. Given a set $E \subseteq V(G)$ of vertices in a (directed or undirected) graph G, denote by $G(E)$ the undirected subgraph of G generated by E. The edges of both $\hat{G}$ and $G(E)$ are labeled with ±1 using the labels in the edges of G, whenever appropriate. Let E be called consistent if $\hat{G}(E)$ has no closed chains with parity −1. Note that this is equivalent to the existence of $f_V$ such that $g \equiv 1$ on E, by Lemma 4 applied to the open loop system (3). If E is consistent, then the associated system (3) itself can also be shown to be monotone: to verify condition (4), simply define each $q_k$ so that (4) is satisfied for $k, j_k$. Since $\partial g_{j_k}/\partial u_k = \partial F_{j_k}/\partial x_{i_k} \not\equiv 0$, this choice is in fact unambiguous. Conversely, if (3) is monotone with respect to the orthant orders $\le_{f_V}$, $\le_q$, then in particular it is monotone for every fixed constant u, so that E is consistent by Lemma 3. We thus have the following result.

Lemma 4. Let E be a set of edges of the digraph G. Then E is consistent if and only if the corresponding controlled system (3) is monotone with respect to some orthant orders.

3. Statement of problem

A natural problem is therefore the following. Given a dynamical system (1) that admits a digraph G, use the procedure above to decompose it as the closed loop of a monotone controlled system (3), while minimizing the number $|E^C|$ of inputs. Equivalently, find $f_V$ such that $P(E_+) = |E_+|$ is maximized and $P(E_-) = |E_-| = |E_+^C|$ minimized. This produces the following problem formulation.

Problem 1 (Undirected labeling problem (ULP)). An instance of this problem is $(G,h)$, where $G = (V,E)$ is an undirected graph and $h : E \to \{0,1\}$. A valid solution is a vertex labeling function $f : V \to \{0,1\}$. Define an edge $\{u,v\} \in E$ to be consistent iff $h(u,v) \equiv (f(u) + f(v)) \pmod 2$. The objective is then to find a valid solution maximizing $|F|$ where F is the set of consistent edges.
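To make the definition of ULP concrete, here is a small exhaustive solver. It is only a sketch for tiny instances (it enumerates all $2^{|V|}$ labelings and is not one of the algorithms of this paper); the function names and the encoding of $h$ as a dictionary keyed by edges are assumptions made for illustration.

```python
from itertools import product


def consistent_edges(edges, h, f):
    """Edges {u, v} with h(u, v) == (f(u) + f(v)) mod 2."""
    return [(u, v) for (u, v) in edges if h[(u, v)] == (f[u] + f[v]) % 2]


def solve_ulp_bruteforce(vertices, edges, h):
    """Try every labeling f: V -> {0, 1}; return the best one and |F|."""
    best_f, best_count = None, -1
    for bits in product((0, 1), repeat=len(vertices)):
        f = dict(zip(vertices, bits))
        count = len(consistent_edges(edges, h, f))
        if count > best_count:
            best_f, best_count = f, count
    return best_f, best_count


# Hypothetical instance: a triangle whose edges are all labeled 1.
# This is MAX-CUT on a triangle, so at most 2 edges can be consistent.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c"), ("a", "c")]
h_all_one = {e: 1 for e in E}
f_best, n_consistent = solve_ulp_bruteforce(V, E, h_all_one)
print(n_consistent)  # 2
```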

That ULP is a correct formulation for our problem is confirmed by the following easy equivalence.

Proposition 1. Consider an instance $(G,h)$ of ULP with an optimal solution having x consistent edges given by a vertex labeling function f. Let D be a set of edges of smallest cardinality that have to be removed such that for the remaining graph, that is the graph $G' = (V, E \setminus D)$ with the same vertex set V but an edge set $E \setminus D$, there exists a vertex labeling function $f' : V \to \{0,1\}$ that makes every edge consistent. Then, $x = |E| - |D|$.

Proof. Since f produces a solution of ULP with x consistent edges, exactly $|E| - x$ edges are inconsistent, thus $|D| \le |E| - x$, that is, $x \le |E| - |D|$. Conversely, since there is a solution with $|E| - |D|$ consistent edges, $x \ge |E| - |D|$. □

A special case of ULP, namely when $h(e) = 1$ for all $e \in E$, is the MAX-CUT problem (defined in Section 3.1). Moreover, ULP can be posed as a special type of “constraint satisfaction problem” as follows. We have $|E|$ linear equations over GF(2), one equation per edge and each equation involving exactly two variables, over $|V|$ Boolean variables. The goal is to assign values to the variables to satisfy the maximum number of equations. For algorithms and lower-bound results for general cases of these types of problems, such as when the equations are over GF(p) for an arbitrary prime $p > 2$, when there are an arbitrary number of variables per equation or when the goal is to minimize the number of unsatisfied equations, see references such as Amaldi and Kann (1996), Berman and Karpinski (2001), Creignou et al. (2001) and Hastad and Venkatesh (2002) and the references therein.

Another interpretation (Sontag, in preparation) of ULP is in statistical mechanics terms. Let us label edges by “±1” instead of {0,1}, denoting by $w_{uv} = (-1)^{h(u,v)}$ the edge parities, now called “interaction energies.” Similarly, let us consider ±1-valued vertex labeling functions, now called (magnetic) “spin configurations,” $\sigma : V \to \{-1,+1\}$, $\sigma(v) = (-1)^{f(v)}$. An edge $\{u,v\}$ is consistent provided that $w_{uv}\sigma_u\sigma_v = 1$. A graph with ±1 weights is called an Ising spin-glass model in statistical physics. A “non-frustrated” spin-glass model is one for which there is a spin configuration for which every edge is consistent (Barahona, 1982; Cipra, 2000; De Simone et al., 1995; Istrail, 2000). This is the same as a consistent graph in our sense. Moreover, a spin configuration that maximizes the number of consistent edges is one for which the “free energy” (with no exterior magnetic field):

$$-\sum_{\{u,v\} \in E} w_{uv}\,\sigma_u \sigma_v$$

is minimized, a “ground state”. (When $h(e) = 1$ or equivalently $w_e = -1$ for all edges, one has what is called the “anti-ferromagnetic case”.) Thus, our problem amounts to finding ground states.
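The link between the energy and the number of consistent edges can be made explicit. Write F for the set of consistent edges under a spin configuration σ, and let $H(\sigma)$ be a name, introduced here only for convenience, for the energy above. Each consistent edge contributes +1 to the sum and each inconsistent edge contributes −1, so:

```latex
H(\sigma) \;=\; -\sum_{\{u,v\}\in E} w_{uv}\,\sigma_u\sigma_v
          \;=\; -\bigl(|F| - (|E| - |F|)\bigr)
          \;=\; |E| - 2|F| .
```

Since $|E|$ is fixed, minimizing $H(\sigma)$ is the same as maximizing $|F|$, which is the ULP objective.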

Given orthant orders $\le_{f_V}$ and $\le_q$ for $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively, we say that a feedback function h is positive if $x \le_{f_V} y$ implies $h(x) \le_q h(y)$, and that it is negative if $x \le_{f_V} y$ implies $h(x) \ge_q h(y)$. It can be shown that the closed loop of a monotone system with a positive feedback function is actually itself monotone, so that no system can be produced in this way that was not monotone already. But if h is a negative feedback function, then several results become available which use the methods of monotone systems for systems that are not monotone, see Angeli and Sontag (2003), Enciso and Sontag (2004) and Enciso and Sontag (2006). For the following result, let $(\mathcal{C},\subseteq)$ be the class of consistent subsets of $E(G)$, ordered under inclusion.


Proposition 2. Let E be a consistent set. Then E is maximal in $(\mathcal{C},\subseteq)$ if and only if h is a negative feedback function for every $f_V$ such that $g \equiv 1$ on E.

Proof. Suppose that E is maximal, and let $f_V$ be such that $g \equiv 1$ on E. Given any edge $(i_k,j_k) \in E^C$, it holds that $g(i_k,j_k) = -1$. Otherwise one could extend E by adding $(i_k,j_k)$, thus violating maximality. That is, $f_V(i_k) f_V(j_k) f_E(i_k,j_k) = -1$. By monotonicity, it holds that $q_k f_V(j_k)\,\partial g_{j_k}/\partial u_k \ge 0$, and since $\partial g_{j_k}/\partial u_k = \partial F_{j_k}/\partial x_{i_k}$, it follows necessarily that

$$q_k\, f_V(j_k)\, f_E(i_k,j_k) = 1.$$

Therefore it must hold that $q_k = -f_V(i_k)$ for each k, which implies that h is a negative feedback function.

Conversely, if $f_V$ is such that $g \equiv 1$ on E and h is a negative feedback function, then $q_k = -f_V(i_k)$. By the same argument as above, $q_k f_V(j_k) f_E(i_k,j_k) = 1$ for all k by monotonicity. Therefore $g \equiv -1$ on $E^C$. Repeating this for all admissible $f_V$, maximality follows. □

There is a second, slightly more sophisticated way of writing a system (1) as the feedback loop of a system (3) using an arbitrary set of edges E. Given any such E, define $S(E^C) = \{i \mid \text{there is some } j \text{ such that } (i,j) \in E^C\}$. Now enumerate $S(E^C)$ as $\{i_1,\dots,i_m\}$, and for each k label the set $\{j \mid (i_k,j) \in E^C\}$ as $j_{k1}, j_{k2}, \dots$. Then for each k, l, one can replace each appearance of $x_{i_k}$ in $F_{j_{kl}}$ by $u_k$, to form the function $g(x,u)$. Then one lets $h(x) = (x_{i_1},\dots,x_{i_m})$ as above. The closed loop of this system (3) is also (1) as before, but with the advantage that there are $|S(E^C)|$ inputs, and of course $|S(E^C)| \le |E^C|$.

If E is a consistent and maximal set, then one can make (3) into a monotone system as follows. By letting $f_V$ be such that $g \equiv 1$ on E, we define the order $\le_{f_V}$ on $\mathbb{R}^n$. For every $i_k, j_{kl}$ such that $(i_k,j_{kl}) \in E^C$, it must hold that $f_V(i_k) f_V(j_{kl}) f_E(i_k,j_{kl}) = -1$. Otherwise $E \cup \{(i_k,j_{kl})\}$ would be consistent, thus violating maximality. By choosing $q_k = -f_V(i_k)$, Eq. (4) is therefore satisfied. See the proof of Proposition 2. Conversely, if the system generated by E using this second algorithm is monotone with respect to orthant orders, and if h is a negative feedback function, then it is easy to verify that E must be both consistent and maximal.

Thus the problem of finding E consistent and such that $P(E_-) = |S(E_-)| = |S(E^C)|$ is smallest, when restricted to those sets that are maximal and consistent (this does not change the minimum $|S(E^C)|$), is equivalent to the following problem: decompose (1) into the negative feedback loop of an orthant monotone control system, using the second algorithm above, and using as few inputs as possible. This produces the following problem formulation.

Problem 2 (Directed labeling problem (DLP)). An instance of this problem is $(G,h)$ where $G = (V,E)$ is a directed graph and $h : E \to \{0,1\}$. A valid solution is a vertex labeling function $f : V \to \{0,1\}$. Define an edge $(u,v) \in E$ to be consistent iff $h(u,v) \equiv (f(u) + f(v)) \pmod 2$. The objective is then to find a valid solution minimizing $|g(E - F)|$ where $g(C) = \{u \in V \mid \exists y \in V, (u,y) \in C\}$ for any $C \subseteq E$ and F is the set of consistent edges.
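The DLP objective counts the tails of inconsistent edges rather than the inconsistent edges themselves. The short sketch below evaluates it for a given labeling; the function name `dlp_cost` and the toy instance are hypothetical and serve only to show how the two counts can differ.

```python
def dlp_cost(edges, h, f):
    """Return g(E - F): the set of tails u of inconsistent edges (u, v)."""
    inconsistent = [(u, v) for (u, v) in edges
                    if h[(u, v)] != (f[u] + f[v]) % 2]
    return {u for (u, _) in inconsistent}


# Hypothetical instance: three inconsistent edges that all leave vertex "a".
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
h = {("a", "b"): 1, ("a", "c"): 1, ("a", "d"): 1, ("b", "c"): 0}
f = {"a": 0, "b": 0, "c": 0, "d": 0}

tails = dlp_cost(edges, h, f)
print(len(tails))  # 1, although three edges are inconsistent
```

In the terms of the second decomposition above, this labeling needs a single input (one per tail vertex) even though three edges are cut.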

3.1. Summary of key concepts and results in approximation algorithms

For any γ ≥ 1 (resp. γ ≤ 1), a γ-approximate solution (or simply a γ-approximation) of a minimization (resp., maximization) problem is a solution with an objective value no larger than γ times (resp., no smaller than γ times) the value of the optimum, and an algorithm achieving such a solution is said to have an approximation ratio of γ.

Papadimitriou and Yannakakis (1991) defined the class of MAX-SNP optimization problems and a special approximation-preserving reduction, the so-called L-reduction, that can be used to show MAX-SNP-hardness of an optimization problem. The version of the L-reduction that we provide below is a slightly modified but equivalent version that appeared in Berman and Schnitger (1992).

Definition 1. (Berman and Schnitger (1992), Papadimitriou and Yannakakis (1991)) Given two optimization problems Π and Π′, we say that Π L-reduces to Π′ if there are three polynomial-time procedures $T_1$, $T_2$, $T_3$ and two constants a and b > 0 such that the following two conditions are satisfied:

(1) For any instance I of Π, algorithm $T_1$ produces an instance $I' = f(I)$ of Π′ such that the optima of I and I′, denoted by OPT(I) and OPT(I′) respectively, satisfy $\mathrm{OPT}(I') \le a \cdot \mathrm{OPT}(I)$.
(2) For any solution of I′ with cost c′, algorithm $T_2$ produces another solution with a cost c″ no worse than c′, and algorithm $T_3$ produces a solution of I of Π with cost c (possibly from the solution produced by $T_2$) satisfying $|c - \mathrm{OPT}(I)| \le b \cdot |c'' - \mathrm{OPT}(I')|$.

An optimization problem is MAX-SNP-hard if any problem in MAX-SNP L-reduces to that problem. The importance of proving MAX-SNP-hardness results comes from a result proved by Arora et al. (1998) which shows that, assuming P ≠ NP, for every MAX-SNP-hard minimization (resp., maximization) problem there exists a constant ε > 0 such that no polynomial time algorithm can achieve an approximation ratio better than 1 + ε (resp., better than 1 − ε).

A special case of the ULP problem, namely when $h(e) = 1$ for all $e \in E$, is the well-known MAX-CUT problem. An instance of this problem is an undirected graph $G = (V,E)$. A valid solution is a set $S \subseteq V$. The objective is to find a valid solution that maximizes the number of edges $\{u,v\} \in E$ such that $|\{u,v\} \cap S| = 1$. The MAX-CUT problem is known to be MAX-SNP-hard. For further details on these topics, the reader is referred to the excellent book by Vazirani (2001).

Some terminology. The following notation will be used for the remainder of the paper. Given a set S of vertices in a directed graph G, define $E_{out}(S) = \{(u,v) \in E(G) \mid u \in S\}$ as the set of out-bound edges of vertices in S. $\mathrm{OPT}_P(I)$ denotes the size of an optimal solution for a problem P with instance I. Recall that the length of a circuit c is normally defined as the number of edges in the circuit. Given a weight function $w : E \to \mathbb{R}$, the length of c with respect to w is defined as $\sum_{e \in c} w(e)$.

4. Theoretical results

Our theoretical results are summarized as follows.

Theorem 1.

(a) For some constant ε > 0, it is not possible to approximate in polynomial time the ULP and the DLP problems to within an approximation ratio of 1 − ε and 1 + ε, respectively, unless P = NP.
(b) For ULP, we provide a polynomial time α-approximation algorithm where α ≈ 0.87856 is the approximation factor for the MAX-CUT problem obtained in Goemans and Williamson (1995) via semidefinite programming.
(c) For DLP, if $d^{\max}_{in}$ denotes the maximum in-degree of any vertex in the graph, then we give a polynomial-time approximation algorithm with an approximation ratio of at most $d^{\max}_{in} \cdot O(\log|V|)$.

Our computational results are illustrated in Section 6 by an implementation of the algorithms applied to a 13-node Drosophila segmentation network, as well as to a 200+ node recently published network of the Epidermal Growth Factor Receptor pathway.

Remark 1. It should be noted that ULP becomes tractable if the network is biased significantly towards excitatory connections. Obviously, if all the edges of the given graph $G = (V,E)$ are labeled 0, then it is possible to label the vertices such that all the edges are consistent. Moreover, given any graph G, it is easy to check in $O((|V| + |E|)^3)$ time if an optimal solution contains all the edges as consistent by solving a set of linear equations via Gaussian elimination. Now, suppose that at most L of the edges of G are labeled 1. Then, obviously at most L inconsistent edges exist in any optimal solution. Thus a straightforward way to solve the problem is to consider all possible subsets of edges in which at most L edges are dropped and to check, for each such subset, if there is an optimal solution that contains all the edges as consistent. The total time taken is $O(|V|^{2L} \cdot (|V| + |E|)^3)$, which is a polynomial in $|V| + |E|$ if L is a constant.
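The enumeration described in Remark 1 can be sketched as follows. This is an illustration rather than the paper's procedure: for the feasibility check it uses parity propagation over the graph in place of Gaussian elimination over GF(2) (both decide the same linear system), and the function names are assumptions. It tries subsets of dropped edges in order of increasing size, so the first feasible subset is optimal.

```python
from itertools import combinations


def fully_consistent(vertices, edges, h):
    """Return a labeling f making every given edge consistent, or None."""
    adjacency = {v: [] for v in vertices}
    for (u, v) in edges:
        adjacency[u].append((v, h[(u, v)]))
        adjacency[v].append((u, h[(u, v)]))
    f = {}
    for root in vertices:
        if root in f:
            continue
        f[root] = 0
        stack = [root]
        while stack:
            u = stack.pop()
            for v, bit in adjacency[u]:
                wanted = f[u] ^ bit  # f(u) + f(v) = h(u, v) (mod 2)
                if v not in f:
                    f[v] = wanted
                    stack.append(v)
                elif f[v] != wanted:
                    return None
    return f


def solve_ulp_few_ones(vertices, edges, h):
    """Exact ULP by dropping k = 0, 1, ..., L edges, L = #edges labeled 1.

    Dropping all edges labeled 1 always leaves a consistent graph, so the
    loop is guaranteed to return by k = L.
    """
    max_dropped = sum(h[e] for e in edges)
    for k in range(max_dropped + 1):
        for dropped in combinations(edges, k):
            kept = [e for e in edges if e not in dropped]
            f = fully_consistent(vertices, kept, h)
            if f is not None:
                return f, len(kept)


# Hypothetical instance: a triangle with a single edge labeled 1 (L = 1).
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c"), ("a", "c")]
h = {("a", "b"): 1, ("b", "c"): 0, ("a", "c"): 0}
labeling, n_consistent = solve_ulp_few_ones(V, E, h)
print(n_consistent)  # 2
```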


5. Proof of Theorem 1

This section provides the proof of Theorem 1, broken up into a series of technical parts.

5.1. Proof of Theorem 1(a)

Based on the discussion in Section 3.1, it suffices to show that both these problems are MAX-SNP-hard. ULP is MAX-SNP-hard since its special case, the MAX-CUT problem, is MAX-SNP-hard. To prove MAX-SNP-hardness of DLP, we need the definitions of the following two problems.

Problem 3 (Node deletion problem with bipartite property (NDBP)). An instance of this problem is an undirected graph $G = (V,E)$. A valid solution is a vertex set $S \subseteq V$, such that $G(V - S)$ is a bipartite graph. The objective is to find a valid solution minimizing $|S|$.

Problem 4 (Variant of node deletion problem (VNDP)). An instance of this problem is $(G,h)$ where $G = (V,E)$ is a directed graph and $h : E \to \{0,1\}$. A valid solution is a vertex set $S \subseteq V$ with the following property: if $G_S = (V_S,E_S)$ is the graph with $V_S = V$ and $E_S = E - E_{out}(S)$, then $\hat{G}_S$ is free of odd length circuits with respect to weight function h. The objective is to find a valid solution minimizing $|S|$.

First, we note that DLP is equivalent to VNDP. If one identifies the solution set S in VNDP with the solution set $g(E - F)$ in DLP, then the set of consistent edges F in DLP corresponds to the $E_S$ in VNDP, since every edge $(u,v) \in F$ satisfying $h(u,v) \equiv (f(u) + f(v)) \pmod 2$ is equivalent to stating that $\hat{G}_S$ is free of odd length circuits with respect to weight function h.

Thus, to prove the MAX-SNP-hardness of DLP it suffices to prove that of VNDP. NDBP is known to be MAX-SNP-hard (Lund and Yannakakis, 1993). We provide an L-reduction from NDBP to VNDP. For an instance of NDBP with graph $G = (V,E)$, construct an instance $(G',h)$ of VNDP as follows (note that G′ is a digraph):

$$V' = V(G') = V \cup \{A_{u,v}, B_{u,v} \mid \{u,v\} \in E\},$$
$$E' = E(G') = \{(u,A_{u,v}), (A_{u,v},B_{u,v}), (v,B_{u,v}) \mid \{u,v\} \in E\},$$

and $h(e) = 1$ for all $e \in E'$. Now, the following holds:

(1) If S is a solution to NDBP, it is also a solution to the generated instance of VNDP. The reason is as follows. Notice that every odd length (resp., even length) circuit C in G corresponds to an odd length (resp., even length) circuit C′ in $\hat{G}'$ with respect to the weight function h. Since $G(V - S)$ is a bipartite graph, it is free of odd length circuits. So for each odd length cycle C of G, there exists $u \in S$ such that the deletion of all out-bound edges of u in G′ breaks its corresponding odd length cycle C′.
(2) If S′ is a solution to VNDP, then we can construct a solution S of NDBP in the following manner: for each $x \in S'$: if $x = A_{u,v}$, add u to S; if $x = B_{u,v}$, add v to S; if $x = u$ or $x = v$, add x to S.

It is now easy to see that since the graph $\hat{G}_{S'}$ is free of odd length circuits with respect to h, $G(V - S)$ has no odd length circuit either.

Hence, we have $\mathrm{OPT}_{VNDP}(G',h) \le \mathrm{OPT}_{NDBP}(G)$. Moreover, given a solution S′ of VNDP, we are able to generate a solution S of NDBP such that

$$\bigl|\,|S| - \mathrm{OPT}_{NDBP}(G)\,\bigr| \le \bigl|\,|S'| - \mathrm{OPT}_{VNDP}(G',h)\,\bigr|.$$

Thus, our reduction satisfies Definition 1 of an L-reduction with a = b = 1.
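The instance built in this reduction is easy to generate mechanically. The sketch below constructs $(G',h)$ from an undirected graph; the function name and the tuple encoding of the new vertices $A_{u,v}$, $B_{u,v}$ are assumptions made for illustration.

```python
def ndbp_to_vndp(vertices, undirected_edges):
    """Build the digraph G' and labels h of the reduction from NDBP.

    Each undirected edge {u, v} is replaced by the three arcs
    (u, A_uv), (A_uv, B_uv), (v, B_uv), all labeled 1, so the edge becomes
    an undirected path of odd length 3 between u and v.
    """
    new_vertices = list(vertices)
    arcs = []
    for (u, v) in undirected_edges:
        a_uv, b_uv = ("A", u, v), ("B", u, v)
        new_vertices += [a_uv, b_uv]
        arcs += [(u, a_uv), (a_uv, b_uv), (v, b_uv)]
    h = {arc: 1 for arc in arcs}
    return new_vertices, arcs, h


# Hypothetical instance: a triangle (an odd circuit of length 3).
V = [1, 2, 3]
E = [(1, 2), (2, 3), (1, 3)]
V_new, arcs, h = ndbp_to_vndp(V, E)
print(len(V_new), len(arcs))  # 9 9
```

Because every original edge turns into three arcs of weight 1, a circuit of length k in G becomes an undirected circuit of length 3k in G′, which has the same parity as k.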


5.2. Proof of Theorem 1(b)

Our algorithm for ULP uses the semidefinite programming (SDP) technique used by Goemans and Williamson (1995); hence we use notations and terminologies similar to those used in that paper (readers not very familiar with this technique are also referred to the excellent explanation of this technique in the book by Vazirani (2001)). For each vertex $v \in V$, we have a real vector $x_v \in \mathbb{R}^{|V|}$ with $\|x_v\|_2 = 1$. Then, we can generate from ULP the following vector program (where · denotes the vector inner product):

Solve the following vector program via SDP methods:

$$\begin{aligned}
\text{maximize}\quad & \frac{1}{2} \sum_{h(u,v)=1} (1 - x_u \cdot x_v) + \frac{1}{2} \sum_{h(u,v)=0} (1 + x_u \cdot x_v)\\
\text{subject to}\quad & x_v \cdot x_v = 1 \quad \text{for each } v \in V,\\
& x_v \in \mathbb{R}^{|V|} \quad \text{for each } v \in V.
\end{aligned}$$

Select a uniformly random vector r in the $|V|$-dimensional unit sphere and set

$$f(v) = \begin{cases} 0 & \text{if } r \cdot x_v \ge 0,\\ 1 & \text{otherwise.} \end{cases}$$

The proof of the claimed approximation performance of the above vector program is obtained by adapting the proof in Section 26.5 of Vazirani (2001) for the MAX-2SAT problem to deal with the fact that, in our problem, $a_{ij} = b_{ij} = 1/2$ as opposed to a different set of values in Vazirani (2001). Since there are some subtleties in adapting that proof for readers unfamiliar with this approach, we provide a sketch of the proof in Appendix A. The procedure can be derandomized via methods of conditional probabilities (e.g., see Mahajan and Ramesh (1995)).
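The rounding step can be illustrated without an SDP solver. The sketch below assumes that unit vectors $x_v$ are already available (solving the vector program itself is not shown and would need an external solver); it evaluates the relaxed objective and performs one random-hyperplane rounding. The function names are hypothetical, and a Gaussian vector is used for r because only its direction matters and that direction is uniform on the sphere.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relaxed_objective(edges, h, x):
    """Value of the vector program at the unit vectors x[v]."""
    total = 0.0
    for (u, v) in edges:
        inner = dot(x[u], x[v])
        if h[(u, v)] == 1:
            total += 0.5 * (1 - inner)
        else:
            total += 0.5 * (1 + inner)
    return total


def round_hyperplane(x, rng):
    """Set f(v) = 0 if r . x_v >= 0 and f(v) = 1 otherwise."""
    dim = len(next(iter(x.values())))
    r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return {v: 0 if dot(r, x_v) >= 0 else 1 for v, x_v in x.items()}


# Hypothetical input: one edge labeled 1 with antipodal unit vectors,
# which is an optimal (integral) solution of the vector program.
edges = [("a", "b")]
h = {("a", "b"): 1}
x = {"a": [1.0, 0.0], "b": [-1.0, 0.0]}

value = relaxed_objective(edges, h, x)
f = round_hyperplane(x, random.Random(0))
print(value, f["a"] != f["b"])
```

For antipodal vectors almost every hyperplane through the origin separates the two endpoints, so the rounded labeling makes the edge consistent and matches the relaxed value of 1.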


5.3. Proof of Theorem 1(c)

For an instance $(G,h)$ of DLP, construct an instance $(G' = (V',E'), h')$ as follows:

$$V' = V \cup \{C_{u,v} \mid (u,v) \in E \ \&\ h(u,v) = 0\},$$
$$E' = \{e \mid e \in E \ \&\ h(e) = 1\} \cup \{(u,C_{u,v}), (C_{u,v},v) \mid (u,v) \in E \ \&\ h(u,v) = 0\},$$

and

$$h'(e) = 1 \quad \text{for all } e \in E'.$$

Note that every odd (resp., even) length circuit in G with respect to weight function h corresponds to an odd (resp., even) length circuit in G′ with respect to weight function h′, and vice versa. Let F be a set of consistent edges in $(G,h)$ with a vertex labeling function f. Now, observe the following:

(1) F′ is a set of consistent edges in $(G',h')$ with a vertex labeling function f′ with $f'(x) = f(x)$ for $x \in V' \cap V$ and $f'(C_{u,v}) = 1 - f(u) = 1 - f(v)$ for an edge $(u,v) \in F$ with $h(u,v) = 0$; thus, an edge $(u,v)$ in F corresponds to an edge $(u,v)$ in F′ if $h(u,v) = 1$ and corresponds to a pair of edges $(u,C_{u,v})$, $(C_{u,v},v)$ in F′ if $h(u,v) = 0$.
(2) If $(u,v) \in E - F$ is an inconsistent edge in $(G,h)$ with $h(u,v) = 0$, then the edge $(C_{u,v},v)$ in G′ can always be made consistent by choosing $f'(C_{u,v}) = 1 - f(v)$.

Thus, if F″ is the set of consistent edges obtained from F following rules (1) and (2) above, then $|g(E' - F'')| = |g(E - F)|$ and thus $\mathrm{OPT}_{DLP}(G',h') = \mathrm{OPT}_{DLP}(G,h)$. Consider the NDBP problem on $\hat{G}'$. Any solution to DLP on $(G',h')$ with vertex labeling function f′ and set of consistent edges F′ cannot contain an odd cycle of consistent edges and thus provides a solution to NDBP on $\hat{G}'$ of size $|g(E' - F')|$. Thus,

$$\mathrm{OPT}_{NDBP}(\hat{G}') \le \mathrm{OPT}_{DLP}(G',h') = \mathrm{OPT}_{DLP}(G,h).$$

$\mathrm{OPT}_{NDBP}(\hat{G}')$ can be approximated in polynomial time to within an approximation ratio of $O(\log|V'|)$ (Lund and Yannakakis, 1993), i.e., we can find a solution $S_{NDBP}(\hat{G}')$ in polynomial time such that

$$|S_{NDBP}(\hat{G}')| \le O(\log|V'|) \cdot \mathrm{OPT}_{NDBP}(\hat{G}') \le O(\log|V|) \cdot \mathrm{OPT}_{DLP}(G,h).$$

Now,

$$S_{DLP}(G,h) = S_{NDBP}(\hat{G}') \cup \{u \mid \exists v \in S_{NDBP}(\hat{G}'), (u,v) \in E\}$$

is obviously a solution to DLP on $(G,h)$. Recall that $d^{\max}_{in}$ denotes the maximum in-degree of any vertex in G. Thus,

$$|S_{DLP}(G,h)| \le d^{\max}_{in} \cdot |S_{NDBP}(\hat{G}')| \le d^{\max}_{in} \cdot O(\log|V|) \cdot \mathrm{OPT}_{DLP}(G,h).$$

Fig. 5. The network associated to the Drosophila segment polarity, as proposed in von Dassow et al. (2000). Courtesy of N. Ingolia and PLoS. The three edges that have been crossed have been chosen in order to let the remaining edges form an orthant monotone system.
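The construction of $(G',h')$ used in the proof of Theorem 1(c) amounts to subdividing every edge labeled 0. A minimal sketch follows, with a hypothetical function name and a tuple encoding for the new vertices $C_{u,v}$.

```python
def subdivide_zero_edges(vertices, edges, h):
    """Return (V', E', h') where each edge with h = 0 is split in two.

    Edges labeled 1 are kept; an edge (u, v) labeled 0 is replaced by
    (u, C_uv) and (C_uv, v). Every edge of the result is labeled 1, so a
    weight-0 edge becomes a path of total weight 2 and parities of
    circuits are unchanged.
    """
    new_vertices = list(vertices)
    new_edges = []
    for (u, v) in edges:
        if h[(u, v)] == 1:
            new_edges.append((u, v))
        else:
            c_uv = ("C", u, v)
            new_vertices.append(c_uv)
            new_edges += [(u, c_uv), (c_uv, v)]
    new_h = {e: 1 for e in new_edges}
    return new_vertices, new_edges, new_h


# Hypothetical instance with one edge of each label.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
h = {("a", "b"): 0, ("b", "c"): 1}
V2, E2, h2 = subdivide_zero_edges(V, E, h)
print(len(V2), len(E2))  # 4 3
```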

6. Examples of applications of the ULP algorithm

We have implemented the SDP-based algorithm for calculating approximate solutions of the undirected labeling problem using Matlab, and we illustrate this algorithm with two applications to biological systems. The first application concerns the relatively small-scale 13-variable digraph of a model of the Drosophila segment polarity network. A second application involves a digraph with 300+ variables associated to the human Epidermal Growth Factor Receptor (EGFR) signaling network. This model was published recently and built using information from 242 published papers. Finally, we provide an example involving a yeast gene regulatory network.

6.1. Drosophila segment polarity

An important part of the development of the early Drosophila (fruit fly) embryo is the differentiation of cells into several stripes (or segments), each of which eventually gives rise to an identifiable part of the body such as the head, the wings, the abdomen, etc. Each segment then differentiates into a posterior and an anterior part, in which case the segment is said to be polarized. (This differentiation process continues up to the point when all identifiable tissues of the fruit fly have developed.) Differentiation at this level starts with differing concentrations of certain key proteins in the cells; these proteins form striped patterns by reacting with each other and by diffusion through the cell membranes.

A model for the network that is responsible for segment polarity (von Dassow et al., 2000) is illustrated in Fig. 5. As explained above, this model is best studied when multiple cells are present interacting with each other. But it is interesting at the one-cell level in its own right, and difficult enough to study that analytic tools seem mostly unavailable. The arrows with a blunt end are interpreted as having a negative sign in our notation. Furthermore, the concentrations of the membrane-bound and inter-cell traveling compounds PTC, PH, HH and WG (membrane) on all cells have been identified in the one-cell model (so that, say, HH → PH is now in the digraph). Finally, PTC acts on the reaction CI → CN itself by promoting it without being itself affected, which in our notation means $\mathrm{PTC} \xrightarrow{+} \mathrm{CN}$ and $\mathrm{PTC} \xrightarrow{-} \mathrm{CI}$.

The implementation. The Matlab implementation of the algorithm on this digraph with 13 nodes and 20 edges produced several partitions with as many as 17 consistent edges. One of these possible partitions simply consists of placing the three nodes ci, CI and CN in one set and all other nodes in the other set, whereby the only inconsistent edges are $\mathrm{CI} \xrightarrow{+} \mathrm{wg}$, $\mathrm{CI} \xrightarrow{+} \mathrm{ptc}$, and $\mathrm{PTC} \xrightarrow{+} \mathrm{CN}$. But note that it is desirable for the resulting open loop system to have as simple remaining loops as possible after eliminating all inconsistent edges. In this case, the remaining directed loops

$$\mathrm{EN} \xrightarrow{-} \mathrm{ci} \xrightarrow{+} \mathrm{CI} \xrightarrow{+} \mathrm{CN} \xrightarrow{-} \mathrm{en} \xrightarrow{+} \mathrm{EN}$$
$$\mathrm{EN} \xrightarrow{-} \mathrm{ci} \xrightarrow{+} \mathrm{CI} \xrightarrow{+} \mathrm{CN} \xrightarrow{-} \mathrm{wg} \xrightarrow{+} \mathrm{WG} \xrightarrow{+} \mathrm{WG(membrane)} \xrightarrow{+} \mathrm{en} \xrightarrow{+} \mathrm{EN}$$

can still cause difficulties.

A second partition which generated 17 consistent edges is that in which EN, hh, CN, and the membrane compounds PTC, PH, HH are in one set, and the remaining compounds in the other. The edges cut are $\mathrm{ptc} \xrightarrow{+} \mathrm{PTC}$, $\mathrm{CI} \xrightarrow{+} \mathrm{CN}$ and $\mathrm{en} \xrightarrow{+} \mathrm{EN}$, each of which eliminates one or several positive loops. By writing the remaining consistent digraph in the form of a cascade, it is easy to see that the only loop whatsoever remaining is wg ↔ WG; this makes the analysis proposed in Enciso and Sontag (2006) easier.

In this relatively low dimensional case we can prove that in fact OPT = 17, as the results below will show.

Lemma 5. Any partition of the nodes in the digraph in Fig. 5 generates at most 17 consistent edges.

Proof. From Lemma 3, a simple way to prove this statement is by showing that there are three disjoint cycles with odd weighted length in the network associated to Fig. 5 (disjoint in the sense that no edge is part of more than one of the cycles). Each such cycle must contain at least one inconsistent edge under any partition, so at least three of the 20 edges are inconsistent. Such three disjoint cycles exist in this case, and they are CI-CN-wg, CI-ptc-PTC, CN-en-EN-hh-HH-PH-PTC. □

It is surprising that a realistic biological system with as many as 13 variables and 20 edges can be transformed into a monotone system after the deletion of only 3 edges. It is conceivable that this restricts the possible dynamics of the system. This is especially the case given that the open loop digraph has almost no closed oriented paths (except for WG ↔ wg), which is evidence that the dynamics of the control system under constant inputs may be especially simple, e.g. such that all solutions converge towards a unique equilibrium.

6.1.1. Multiple copies

It was mentioned above that the purpose of this network is to create striped patterns of protein concentrations along multiple cells. In this sense, it is most meaningful to consider a coupled collection of networks, as originally given in Figs. 5 and 6. Consider a row of k cells, each of which has independent concentration variables for each of the compounds, and let the cell-to-cell interactions be as in Fig. 5 with cyclic boundary conditions (that is, the kth cell is coupled with the first in the natural way). We show that the results above extend to this setting in a very similar manner.

Given a partition $f_V$ of the one-cell network considered above, let $\hat{f}_V$ be the partition of the k-cell network defined by $\hat{f}_V(\mathrm{en}_i) := f_V(\mathrm{en})$ for every $i$, etc. Thus $\hat{f}_V$ consists of k copies of the partition $f_V$ in a natural way.

Lemma 6. Let $f_V$ be a partition of the nodes of the 1-cell network with n consistent edges. Then, with respect to the partition $\hat{f}_V$, there are exactly kn consistent edges for the k-cell coupled model.

Proof. Consider the network consisting of k isolated copies of the one-cell network, that is, k groups of nodes each of which is connected exactly as in the one-cell case. Under the partition $\hat{f}_V$, this network has exactly kn consistent edges. To arrive at the coupled network, it is sufficient to replace all edges of the form $(\mathrm{HH}_i, \mathrm{PH}_i)$ by $(\mathrm{HH}_{i+1}, \mathrm{PH}_i)$ and $(\mathrm{WG}_i, \mathrm{en}_i)$ by $(\mathrm{WG}_{i+1}, \mathrm{en}_i)$, $i = 1, \ldots, k$ (where we identify $k + 1$ with 1). Since by definition $\hat{f}_V(\mathrm{HH}_{i+1}) = \hat{f}_V(\mathrm{HH}_i)$ and $\hat{f}_V(\mathrm{WG}_{i+1}) = \hat{f}_V(\mathrm{WG}_i)$, the consistency of these edges does not change, and the number of consistent edges therefore remains constant. □

Fig. 6. A diagram of the Drosophila embryo during early development. Each hexagon represents a cell containing a copy of the network in Fig. 5, and neighboring cells interact to form a collective behavior. In this example, an initial striped pattern of the genes en and wg induces the production of the gene hh, but only in those cells that are producing en. This will further strengthen the pattern of stripes and help differentiate the various tissues. Courtesy of N. Ingolia and PLoS (Ingolia, 2004).

In particular, OPT ≥ 17k for the coupled system. The following result will establish an upper bound for OPT.

Lemma 7. Any partition of the nodes in the digraph in the k-cell coupled network generates at most 17k consistent edges.

Proof. Consider the signed graph in Fig. 7, which is a sub-digraph of the network associated to Fig. 5. Since the inter-cell edges (WGmem, en) and (HH, PH) are not in this graph, it follows that there are k identical copies of it in the k-cell model. If it is shown that at least three edges need to be cut in each of these k sub-digraphs, the result follows immediately.

Consider the negative cycle ci-CI-wg-CN-en-EN, which must contain at least one inconsistent edge for any given partition. The remaining edges of the subgraph form a tetrahedron with four negative parity triangles, which cannot all be cut by eliminating any single edge. It follows that no two edges can eliminate all negative parity cycles in this signed graph, and that therefore 20k − 3k = 17k is an upper bound for the number of consistent edges in the k-cell network.

Fig. 7. A sub-digraph of the network in Fig. 5, using the notation defined in the previous sections. Note that this sub-digraph does not include either of the two edges (WGmem, en) and (HH, PH), which connect the networks of different cells in Fig. 5; this will be important in the proof of Lemma 7.

Corollary 1. For the k-cell linearly coupled network described in Fig. 5, it holds that OPT = 17k.

Proof. Follows from the previous two results. □
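The replication argument of Lemma 6 can be checked numerically on a small example. The sketch below is illustrative only: the one-cell edge list `ONE_CELL` is hypothetical (only the node names HH, PH, WG and en are borrowed from the text), and the helper names are assumptions. It builds the cyclically coupled k-cell network by letting the two inter-cell edges start in the neighboring cell, copies a one-cell labeling to every cell, and confirms that the number of consistent edges is k times the one-cell count.

```python
# Hypothetical one-cell signed digraph; the edges and signs are illustrative.
ONE_CELL = [
    ("WG", "en", +1),  # inter-cell edge in the coupled model
    ("HH", "PH", +1),  # inter-cell edge in the coupled model
    ("en", "HH", +1),
    ("PH", "WG", -1),
    ("en", "WG", -1),
]
INTER_CELL = {("WG", "en"), ("HH", "PH")}


def count_consistent(edges, labels):
    """Positive edges need equal labels, negative edges need different ones."""
    return sum(1 for u, v, sign in edges
               if (sign > 0) == (labels[u] == labels[v]))


def coupled_network(one_cell, k):
    """k copies of the cell; inter-cell edges start in cell i+1 (cyclically)."""
    edges = []
    for i in range(k):
        neighbor = (i + 1) % k
        for u, v, sign in one_cell:
            source_cell = neighbor if (u, v) in INTER_CELL else i
            edges.append(((u, source_cell), (v, i), sign))
    return edges


def replicate(labels, k):
    """Copy a one-cell labeling to each of the k cells."""
    return {(node, i): side for node, side in labels.items() for i in range(k)}


one_cell_labels = {"WG": 0, "en": 0, "HH": 0, "PH": 1}
n = count_consistent(ONE_CELL, one_cell_labels)
k = 4
total = count_consistent(coupled_network(ONE_CELL, k), replicate(one_cell_labels, k))
print(n, total)
```

Because every cell receives the same labeling, moving the source of an edge to the neighboring cell never changes whether that edge is consistent, which is exactly the observation used in the proof.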

6.2. EGFR signaling

The protein called epidermal growth factor is frequently stored in epithelial tissues such as skin, and it is released when rapid cell division is needed (for instance, it is mechanically triggered after an injury). Its function is to bind to a receptor on the membrane of the cells, aptly called the epidermal growth factor receptor (EGFR). The EGFR, on the inner side of the membrane, has the appearance of a scaffold with dozens of docks to bind with numerous agents, and it starts a reaction of vast proportions at the cell level that ultimately induces cell division.

In their May 2005 paper, Oda et al. (2005) integrate the information that has become available about this process from multiple sources, and they define a network with 330 known molecules under 211 chemical reactions. The network itself is available from supplementary material in SBML format (Systems Biology Markup Language, http://www.sbml.org), and will most likely be subject to continuous updates.

The implementation. Each reaction in the network classifies the molecules as reactants, products, and/or modifiers (enzymes). This information was imported into Matlab using the Systems Biology Toolbox. The digraph G that is used for this analysis has many more edges than the digraph displayed in Oda et al. (2005). The reason for this is as follows: if molecules A and B are both reactants in the same reaction, then the presence of A will have an indirect inhibiting effect on the concentration of B, since it will accelerate the reaction which consumes B (assuming B is not also a product). Therefore a negative edge must also appear from A to B, and vice versa. Similarly, modifiers have an inhibiting effect on reactants.
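The edge-sign rule motivated above, and stated formally in the definition of sign(i,j) that follows, can be sketched in a few lines. This is an illustrative Python sketch rather than the authors' Matlab code: the reaction records, the species names and the function `sign_matrix` are hypothetical, diagonal entries are skipped for simplicity, and inhibitory modifiers (which the text corrects by hand) are not handled.

```python
import math


def sign_matrix(species, reactions):
    """Return sign[i][j] in {+1, -1, 0, NaN} from a list of reactions.

    +1  : some reaction has j as a product and i as a reactant or modifier
    -1  : some reaction has j as a reactant and i as a reactant or modifier
    NaN : both of the above hold (undefined sign)
    0   : i and j never appear together in this way
    """
    positive, negative = set(), set()
    for reaction in reactions:
        reactants = set(reaction["reactants"])
        products = set(reaction["products"])
        drivers = reactants | set(reaction.get("modifiers", ()))
        for i in drivers:
            positive.update((i, j) for j in products if j != i)
            negative.update((i, j) for j in reactants if j != i)

    sign = {i: {j: 0.0 for j in species} for i in species}
    for i, j in positive | negative:
        if (i, j) in positive and (i, j) in negative:
            sign[i][j] = math.nan
        elif (i, j) in positive:
            sign[i][j] = 1.0
        else:
            sign[i][j] = -1.0
    return sign


# Hypothetical three-reaction example.
SPECIES = ["A", "B", "C", "E"]
REACTIONS = [
    {"reactants": ["A", "B"], "products": ["C"], "modifiers": ["E"]},
    {"reactants": ["C"], "products": ["B"]},
    {"reactants": ["A"], "products": ["B"]},
]
S = sign_matrix(SPECIES, REACTIONS)
```

In this example A and B are co-reactants in the first reaction, so B inhibits A, while A also produces B in the third reaction, so the entry from A to B receives both signs and is marked undefined.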

We thus define G by letting sign(i,j) = 1 if there

exists a reaction in which j is a product and i is either

a reactant or a modifier. We let sign(i,j) = −1 if there

exists a reaction in which j is a reactant, and i is also

either a reactant or a modifier. Similarly sign(i,j) = 0

if the nodes i,j are not simultaneously involved in any


given reaction, and sign(i,j) is undefined (NaN) if the first two conditions above are both satisfied.

In a few of the reactions of this network there is a modifier or a reactant involved which has an inhibitory effect in the reaction. The effect of this compound on the remaining participants of the reaction is the opposite of that described above. Determining which compounds were inhibitors in the reaction was difficult given the nature of this dataset. Therefore the digraph was corrected by hand in this implementation by looking at the annotations given for each reaction.

An undefined edge can be thought of as an edge that is both positive and negative, and it can be dealt with, given an arbitrary partition, by deleting exactly one of the two signed edges so that the remaining edge is consistent. Thus, in practice, one can consider undefined edges as edges with sign 0, and simply add the number of undefined edges to the number of inconsistent edges at the end of each procedure, in order to form the total number of inputs. This is the approach followed here; there are exactly seven such entries in the digraph G.

The results. After running the algorithm several hundred times for this problem, and choosing the partition which produced the highest number of consistent edges, the induced consistent set contained 636 out of 855 edges (ignoring the edges on the diagonal and the 7 undefined edges). See supplementary material for the relevant Matlab functions that carry out this algorithm. A procedure analogous to that carried out for system (5) allows one to decompose the system as the feedback loop of a controlled monotone system using 855 − 636 = 219 inputs. Since the induced consistent set is maximal by definition, Proposition 2 guarantees that the function h is a negative feedback.

Contrary to the previous application, many of the reactions involve several reactants and products in a single reaction. This induces a denser collection of negative and positive edges: even though there are 211 reactions, there are 855 (directed) edges in the 330 × 330 graph G. It is very likely that this substantially decreases OPT for this system.

The approximation ratio of the SDP algorithm is guaranteed to be at least 0.87 for some r, which gives the estimate OPT ≲ 636/0.87 ≈ 731 (valid to the extent that r has sampled the right areas of the 330-dimensional sphere, but reasonably accurate in practice).

One procedure that can be carried out to lower the number of inputs is a hybrid algorithm involving out-hubs, that is, nodes with an abnormally high out-degree. Recall from the description of the DLP algorithm that all the out-edges of a node $x_i$ can be potentially cut at the expense of only one input u, by replacing all the appearances of $x_i$ in $f_j(x)$, $j \neq i$, by u. We considered the k nodes with the highest out-degrees, and eliminated all the out-edges associated to these hubs from the reaction digraph to form the graph $G_1$. Then we ran the ULP algorithm on $G_1$ to find a partition $f_V$ of the nodes and a set of m edges that can be cut to eliminate all remaining negative closed chains. Finally, we put back on the digraph those edges that were taken in the first step and which are consistent with respect to the partition $f_V$. The result is a decomposition of the system as the negative feedback loop of a controlled monotone system, using at most k + m inputs.
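The hybrid procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the exhaustive search in `best_partition` is only a stand-in for the ULP step (the text uses the SDP-based algorithm), the graph `TOY` is hypothetical, and all function names are invented for the example.

```python
from collections import Counter
from itertools import product


def is_consistent(edge, labels):
    """Positive edges need equal labels, negative edges need different ones."""
    u, v, sign = edge
    return (sign > 0) == (labels[u] == labels[v])


def best_partition(edges, nodes):
    """Exhaustive stand-in for the ULP step: (labels, number of cut edges)."""
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        labels = dict(zip(nodes, bits))
        bad = sum(1 for e in edges if not is_consistent(e, labels))
        if best is None or bad < best[1]:
            best = (labels, bad)
    return best


def hybrid_decomposition(edges, k):
    """Remove the out-edges of the k largest out-hubs, then partition the rest."""
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    out_degree = Counter(u for u, _, _ in edges)
    hubs = {node for node, _ in out_degree.most_common(k)}
    g1 = [e for e in edges if e[0] not in hubs]          # graph G1
    labels, m = best_partition(g1, nodes)                # m edges cut in G1
    restored = [e for e in edges
                if e[0] in hubs and is_consistent(e, labels)]
    return hubs, m, restored


# Hypothetical toy graph: hub "h" inhibits a, b, c, each of which activates h.
TOY = [("h", "a", -1), ("h", "b", -1), ("h", "c", -1),
       ("a", "h", +1), ("b", "h", +1), ("c", "h", +1)]

plain_inputs = best_partition(TOY, ["a", "b", "c", "h"])[1]
hubs, m, restored = hybrid_decomposition(TOY, k=1)
print(plain_inputs, len(hubs) + m)
```

In the toy graph each pair of edges between h and a neighbor forms a negative 2-cycle, so cutting edges alone needs three inputs, whereas replacing the single hub h by one input removes all three cycles at once. The count k + m is an upper bound on the number of inputs, as in the text.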

An implementation of this algorithm with k = 60 yielded a total maximum number of inputs k + m = 136. This is a significant improvement over the 226 inputs in the original algorithm (the 219 inconsistent edges plus the 7 undefined edges). Clearly, it would be worthwhile to investigate further the problem of designing efficient algorithms for the DLP problem to generate improved hybrid algorithmic approaches. The approximation ratios in Theorem 1(c) are not very satisfactory since $d_{\mathrm{in}}^{\max}$ and $\log|V|$ could be large factors; hence future research work may be carried out in designing better approximation algorithms.

We conclude with another, more tentative way to drastically reduce the number of inputs necessary to write this system as the negative closed loop of a controlled monotone system. The idea is to make suitable changes of variables in the original system using the mass conservation laws. Such changes of variables are discussed in many places, for example in Volpert et al. (2000), Angeli and Sontag (2003). In terms of the associated digraph, the result of the change of variables is often the elimination of one of the closed chains. The simplest target for a suitable change of variables is a set of three nodes that form part of the same chemical reaction, for instance two reactants and one product, or one reactant, one product and one modifier. It is easy to see that such nodes are connected in the associated digraph by an odd length triangle of three edges.

In order to estimate the number of inputs that can potentially be eliminated by suitable changes of variables, we counted pairwise disjoint, odd length triangles in the digraph of the EGFR network. Using a greedy algorithm to find and tag disjoint negative feedback triangles, we found a maximal number of them in the subgraph associated to each of the 211 chemical reactions. Special care was taken so that any two triangles from different reactions were themselves disjoint. After carrying out this procedure we found 196 such triangles in the EGFR network. This is a surprisingly high number, considering that each of these triangles must have been opened in the ULP algorithm implementation above and that therefore


each triangle must contain 1 of the 226 edges cut. To the extent that most of these triangles can be eliminated by suitable changes of variables, this can yield a much lower number of edges to cut, and it could thus provide a way to stress the underlying structure of the system.

6.3. A yeast regulatory network

As a final example, we ran our algorithm on the yeast Saccharomyces cerevisiae gene regulatory network from Milo et al. (2002), downloaded from Anon (2006). This network has 690 nodes and 1082 edges, of which 221 are negative and 861 are positive (we labeled the one “neutral” edge as positive; the conclusions would not change if we labeled it negative instead, or if we deleted this one edge).

Our algorithm (with 200 randomizations) provides an answer of 43 inconsistent edges for the best partition found. In other words, it shows that deleting a mere 4% of the edges makes the network consistent.

Also interesting is the following fact. The original graph has 11 components: a large one of size 664, one of size 5, three of size 3, and six of size 2. All of these components remain connected after edge deletion. The edges deleted all belong to the largest component, and they are incident on a total of 65 nodes in this component.

To better appreciate whether this small number of deletions might arise by chance, we also ran our algorithm on random graphs having 690 nodes and 1082 edges (chosen uniformly), of which 221 edges (chosen uniformly) are negative. We found that, for such random graphs, about 12.6% (136.6 ± 5) of edges have to be removed in order to achieve consistency. Thus, the number of deletions needed in the biological network is roughly 15 standard deviations away from the mean for random graphs.

It would appear that both the topology (i.e., the underlying graph) and the actual sign assignments contribute to this near-consistency of the yeast network. To justify this remark, we performed the following numerical experiment. We randomly changed the signs of 50 positive and 50 negative edges, thus obtaining a network that has the same number of positive and negative edges, and the same underlying graph, as the original yeast network, but with 100 edges, picked randomly, having different signs. Now, one needs 8.2% (88.3 ± 7.1) deletions, an amount in between that obtained for the original yeast network and the one obtained for random graphs. Changing more signs, 100 positives and 100 negatives, leads to a less consistent network, with 115.4 ± 4.0 required deletions, or about 10.7% of the original edges, although still not as many as for a random network.
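The two null models used in this comparison can be sketched as follows. This is a hedged illustration, not the code used for the reported numbers: the function names are hypothetical, the random graph is assumed to have no self-loops or repeated edges, and the consistency algorithm itself is not included, so the sketch only generates the randomized networks that would be fed to it.

```python
import random


def random_signed_graph(n_nodes, n_edges, n_negative, rng):
    """Uniformly random digraph (no self-loops) with n_negative negative edges."""
    n_pairs = n_nodes * (n_nodes - 1)
    pairs = []
    for idx in rng.sample(range(n_pairs), n_edges):
        u, w = divmod(idx, n_nodes - 1)
        v = w if w < u else w + 1  # skip the diagonal entry (u, u)
        pairs.append((u, v))
    negative = set(rng.sample(range(n_edges), n_negative))
    return [(u, v, -1 if i in negative else +1)
            for i, (u, v) in enumerate(pairs)]


def flip_signs(edges, n_flip, rng):
    """Flip n_flip positive and n_flip negative edges, keeping the topology."""
    positive = [i for i, (_, _, s) in enumerate(edges) if s > 0]
    negative = [i for i, (_, _, s) in enumerate(edges) if s < 0]
    chosen = set(rng.sample(positive, n_flip)) | set(rng.sample(negative, n_flip))
    return [(u, v, -s if i in chosen else s)
            for i, (u, v, s) in enumerate(edges)]


rng = random.Random(0)
graph = random_signed_graph(690, 1082, 221, rng)  # sizes quoted in the text
shuffled = flip_signs(graph, 50, rng)
changed = sum(1 for a, b in zip(graph, shuffled) if a[2] != b[2])
print(len(graph), changed)
```

Flipping equal numbers of positive and negative edges keeps both sign counts fixed, so any change in the number of required deletions can be attributed to where the signs sit rather than to how many there are.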

Appendix A. More details on SDP algorithm

In this appendix, we provide details regarding the proof of the SDP algorithm for Theorem 1(b) described in Section 5.2. The proof method is similar to that used in better-known problems. For simplicity, we do not describe the derandomization methods and provide a proof for the expected approximation ratio only. Define the following notation for convenience:

• The vertex set V of the graph for ULP is simply {1, 2, ..., |V|};
• $f_{OPT}$ is an optimal vertex labeling for ULP with $F_{OPT}$ being the set of consistent edges;
• $SDP_{OPT}$ is the maximum value of the objective of the vector program

$$
\begin{aligned}
\text{maximize}\quad & \frac{1}{2}\sum_{h(u,v)=1}\left(1 - x_u \cdot x_v\right) + \frac{1}{2}\sum_{h(u,v)=0}\left(1 + x_u \cdot x_v\right)\\
\text{subject to}\quad & x_v \cdot x_v = 1 \quad \text{for each } v \in V,\\
& x_v \in \mathbb{R}^{|V|} \quad \text{for each } v \in V.
\end{aligned}
$$
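To make the objective of the vector program concrete, the sketch below evaluates it for a given set of unit vectors and rounds the vectors to a 0/1 labeling with a random hyperplane, in the style of Goemans and Williamson. It is an assumption that this matches the rounding step of Section 5.2, which is not reproduced here. The edge list, the function names, and the convention that an edge with h(u,v) = 1 is consistent when its endpoints receive different labels are illustrative choices inferred from the form of the objective.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vector_objective(edges, vectors):
    """Objective of the vector program; edges are (u, v, h) with h in {0, 1}."""
    total = 0.0
    for u, v, h in edges:
        d = dot(vectors[u], vectors[v])
        total += 0.5 * (1 - d) if h == 1 else 0.5 * (1 + d)
    return total


def round_with_hyperplane(vectors, rng):
    """Label each node by the side of a random hyperplane its vector falls on."""
    dim = len(next(iter(vectors.values())))
    r = [rng.gauss(0, 1) for _ in range(dim)]
    return {v: 0 if dot(r, x) >= 0 else 1 for v, x in vectors.items()}


def count_consistent(edges, labels):
    """h = 1 edges need different labels, h = 0 edges need equal labels."""
    return sum(1 for u, v, h in edges
               if (labels[u] != labels[v]) == (h == 1))


# Hypothetical 3-node instance and the +/- e_1 embedding of a labeling.
EDGES = [(0, 1, 1), (1, 2, 0), (0, 2, 0)]
LABELS = {0: 0, 1: 1, 2: 1}
VECTORS = {v: ((1.0 if side == 0 else -1.0), 0.0, 0.0)
           for v, side in LABELS.items()}

value = vector_objective(EDGES, VECTORS)
rounded = round_with_hyperplane(VECTORS, random.Random(1))
print(value, count_consistent(EDGES, rounded))
```

For vectors that already encode a labeling, the objective equals the number of consistent edges and the rounding recovers the labeling up to swapping the two sides. In the actual algorithm the vectors would come from an SDP solver, which is outside the scope of this sketch.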

It is easy to see that $SDP_{OPT} \ge |F_{OPT}|$ as follows. For every $v \in V$, if $f_{OPT}(v) = 0$ then set

$$x_v = (1, \underbrace{0, 0, \ldots, 0}_{|V|-1}),$$

whereas if $f_{OPT}(v) = 1$ then set

$$x_v = (-1, \underbrace{0, 0, \ldots, 0}_{|V|-1});$$

this provides a solution for the vector program with an objective value of precisely $|F_{OPT}|$. Thus, it suffices to prove our claim on the approximation ratio relative to $SDP_{OPT}$.
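The claim that this assignment attains the value $|F_{OPT}|$ can be checked term by term. The short derivation below is a sketch that assumes, as the form of the objective suggests, that an edge with $h(u,v) = 1$ is consistent exactly when its endpoints receive different labels and an edge with $h(u,v) = 0$ exactly when they receive the same label.

```latex
% With x_v = (\pm 1, 0, \ldots, 0), the inner product depends only on the labels:
x_u \cdot x_v =
\begin{cases}
  +1 & \text{if } f_{OPT}(u) = f_{OPT}(v),\\
  -1 & \text{if } f_{OPT}(u) \neq f_{OPT}(v).
\end{cases}

% Hence each summand of the objective is 0 or 1:
\tfrac{1}{2}\,(1 - x_u \cdot x_v) = 1 \iff f_{OPT}(u) \neq f_{OPT}(v),
\qquad
\tfrac{1}{2}\,(1 + x_u \cdot x_v) = 1 \iff f_{OPT}(u) = f_{OPT}(v).

% Summing over all edges counts exactly the consistent ones:
\frac{1}{2}\sum_{h(u,v)=1}(1 - x_u \cdot x_v)
  + \frac{1}{2}\sum_{h(u,v)=0}(1 + x_u \cdot x_v)
  = |F_{OPT}|.
```

Since these vectors are unit vectors in $\mathbb{R}^{|V|}$, they are feasible for the vector program, and the maximum $SDP_{OPT}$ can only be larger.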

Next, note that the vector program can indeed be solved by an SDP approach. Let $Y \in \mathbb{R}^{|V| \times |V|}$ be an unknown real matrix with $y_{i,j}$ denoting the $(i,j)$th element of Y. It is not difficult to see (via Cholesky decomposition for real symmetric matrices) that the above vector program is equivalent to the following semidefinite

Please cite this article as: Bhaskar DasGupta et al., Algorithmic and complexity results for decompositions of biological networks into monotone subsystems, BioSystems (2006), doi:10.1016/j.biosystems.2006.08.001
