arXiv:2108.00960v1 [math.OC] 2 Aug 2021

Bilevel Optimization in Flow Networks – A Message-passing Approach

Bo Li¹ and David Saad¹

¹Non-linearity and Complexity Research Group, Aston University, Birmingham, B4 7ET, United Kingdom

Optimizing embedded systems, where the optimization of one depends on the state of another, is a formidable computational and algorithmic challenge that is ubiquitous in real-world systems. We study flow networks, where bilevel optimization is relevant to traffic planning, network control and design, and where flows are governed by an optimization requirement subject to the network parameters. We employ message-passing algorithms in flow networks with sparsely coupled structures to adapt network parameters that govern the network flows, in order to optimize a global objective. We demonstrate the effectiveness and efficiency of the approach on randomly generated graphs.

Many problems in science and engineering involve hierarchical optimization or decision-making, whereby some of the variables cannot be freely chosen but are governed by another optimization problem [1]. As a motivating example, consider the task of designing a network (e.g., a road or communication network) that maximizes the throughput of commodities or information flow. While the designer controls the network parameters (upper-level optimization), traffic flows are determined by the network users who maximize their own benefit (lower-level optimization) [2]. Therefore, the designer needs to adapt the network intricately, taking into account the reaction of network users. Similarly, many physical systems admit a certain extremization principle for given controllable system parameters, e.g., minimal free energy in thermal equilibrium [3] given interaction strengths, and electric flows in resistor networks that minimize the energy dissipation [4, 5]; likewise, entropy maximization and parameter optimization are used across disciplines in inference and learning tasks [6, 7]. Adapting system parameters to extremize a given objective requires bilevel optimization, which considers both system parameters and the inherent optimization of the physical or human-made system variables.

These examples of bilevel optimization are intrinsically difficult to solve [8]. In fact, even the simple instance where both levels are linear programming tasks was shown to be NP-hard [9, 10]. Generic methods for bilevel optimization include (i) expressing the lower-level optimization problem as nonlinear constraints and solving the bilevel problem as a global optimization [11, 12]; (ii) gradient-descent methods that compute the descent direction of the upper-level objective while keeping the lower-level state variables valid [13, 14]. The former introduces complicated nonlinear constraints, making the reduced single-level problem difficult to solve in general, while the latter is generally challenging due to the difficulty in computing the descent direction [8]. Moreover, such generic methods do not utilize existing system structure to simplify the task. In this Letter, we consider flow optimization problems on sparse networks; by virtue of the sparsely-coupled structures, the message-passing (MP) approach, an inherently distributed algorithm, appears to be effective and efficient in both single and bilevel optimization, as demonstrated below for applications in routing and flow control.

Routing Game – We focus on a network planning problem in the routing game setting, widely used in modeling route choices of drivers [15]. Users on the road network make their route choices in a selfish and rational manner, where the corresponding Nash equilibrium is generally not the most beneficial for the system utility as a whole, measured by the total travel time of all users [2, 16]. The operator's task is to set the appropriate tolls or rewards on network edges in order to reduce the total travel time while taking into account the reactions of users to the tolls. The toll-setting problem has attracted significant interest in the field of traffic engineering and operations research, where theoretical results are limited to simple networks or cannot effectively accommodate toll constraints [17–19]. Recently, the idea of reducing traffic congestion by economic incentives to influence drivers' behaviors has regained interest [20–22], partly due to the deployment of smart devices and data availability [23–25]. Here, we focus on the algorithmic aspect of toll optimization.

The road network is represented by a directed graph $G(V, E)$, where $V$ is the set of nodes (junctions) and $E$ the set of directed edges (unidirectional roadways), having one connected component. Users routing from an origin node $i_0$ to a destination node $D$ select a path $P = ((i_0, i_1), (i_1, i_2), \ldots, (i_{n-2}, i_{n-1}), (i_{n-1}, D))$ by minimizing their total travel time $\sum_{e \in P} \ell_e(x_e)$, or an alternative cost, where the edge flow $x_e$ represents the number of users choosing edge $e$ and $\ell_e(x_e)$ is the corresponding latency function. It is assumed that $\ell_e$ is monotonically increasing with the edge flow $x_e$. We consider the limit of a large number of users, each of whom controls an infinitesimal fraction of the overall traffic, such that the edge flow $x_e$ is a continuous variable. This is termed the non-atomic game setting [16]. The social cost is defined as the total travel time of all users, $H = \sum_{e \in E} x_e \ell_e(x_e)$. As the equilibrium reached by the selfish decisions of users does not generally achieve the lowest social cost, we seek to place tolls $\{\tau_e\}$ on edges to influence users' route choices.

Figure 1. (a) Top: a directed road network section with a junction node $i$ and four roadways. Bottom: the corresponding factor graph representation, where the flows adjacent to junction node $i$ satisfy the flow conservation constraint; node $i$ is called a factor node and marked by a square. (b) Bilevel MP for the toll-planning problem. Blue arrows indicate the direction of messages. The lower and upper levels solve the problem for the Nash equilibrium and social optimum, respectively. The equilibrium flow $x_e^*$ is determined in the lower level, while the toll $\tau_e$ is set in the upper level.

Gauging the monetary penalty at the same scale as latency, users will choose a path $P$ that minimizes the combined total journey cost in latency and tolls, $\sum_{e \in P} [\ell_e(x_e) + \tau_e]$. If tolls can be placed freely on all edges, marginal cost pricing is known to induce the socially optimal flow [18]. However, it is usually infeasible to set an unbounded toll on every road, which renders marginal cost pricing inapplicable in practice. We therefore consider restricted tolls $0 \le \tau_e \le \tau_e^{\max}$; an edge $e$ is not chargeable when $\tau_e^{\max} = 0$. For simplicity, we do not consider the income from tolls to contribute to the social cost [26]. In total, $\Lambda_i$ users are traveling from node $i$ to a universal destination $D$; the case with multiple destinations is discussed in the supplemental material (SM) [27]. The resulting edge flows satisfy the non-negativity constraints $x_e \ge 0$ and the flow conservation constraints

$$R_i = \Lambda_i + \sum_{e \in \partial_i^{\mathrm{in}}} x_e - \sum_{e \in \partial_i^{\mathrm{out}}} x_e = 0, \tag{1}$$

where $\partial_i^{\mathrm{in}}$ and $\partial_i^{\mathrm{out}}$ are the sets of incoming and outgoing edges adjacent to node $i$. It has been established that the edge flows in user equilibrium can be obtained by minimizing a potential function [28, 29], $\Phi = \sum_{e \in E} \phi_e(x_e) := \sum_{e \in E} \int_0^{x_e} \ell_e(y)\,dy$, subject to the constraints of Eq. (1).
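The statement that the user equilibrium minimizes the potential can be checked on a toy case. The following is a minimal sketch, assuming a hypothetical network of two parallel edges from one origin to $D$ with affine latencies; the coefficients `T`, `A`, `DEMAND` and the helper `argmin_on_interval` are illustrative choices, not taken from the paper, and a plain ternary search stands in for the message-passing solver.

```python
# Toy network (assumed for illustration): two parallel edges, origin -> D.
T = (1.0, 2.0)    # free travel times t_e
A = (2.0, 0.5)    # latency slopes a_e, so l_e(x) = t_e + a_e * x
DEMAND = 1.0      # Lambda at the origin

def potential(x0):
    """Phi = sum_e int_0^{x_e} l_e(y) dy, with x_1 fixed by flow conservation."""
    x = (x0, DEMAND - x0)
    return sum(t * xe + 0.5 * a * xe * xe for t, a, xe in zip(T, A, x))

def social_cost(x0):
    """H = sum_e x_e * l_e(x_e)."""
    x = (x0, DEMAND - x0)
    return sum(xe * (t + a * xe) for t, a, xe in zip(T, A, x))

def argmin_on_interval(f, lo, hi, iters=200):
    """Ternary search; valid here because both objectives are convex."""
    for _ in range(iters):
        m_lo = lo + (hi - lo) / 3
        m_hi = hi - (hi - lo) / 3
        if f(m_lo) < f(m_hi):
            hi = m_hi
        else:
            lo = m_lo
    return 0.5 * (lo + hi)

x_nash = argmin_on_interval(potential, 0.0, DEMAND)    # user equilibrium
x_opt = argmin_on_interval(social_cost, 0.0, DEMAND)   # social optimum

# At the minimizer of Phi both used edges have equal latency (Wardrop condition).
lat0 = T[0] + A[0] * x_nash
lat1 = T[1] + A[1] * (DEMAND - x_nash)
```

In this toy instance the two latencies coincide at the minimizer of $\Phi$, while the minimizer of $H$ sits at a different flow with a lower social cost, which is the gap that tolls are meant to close.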

The lower-level optimization is a nonlinear min-cost flow problem, where edge flows are coupled through the conservation constraints in Eq. (1), represented as factor nodes in Fig. 1(a). We employ the MP approach developed in [30] to tackle the nonlinear optimization problem. It turns the global optimization of the potential (i.e., $\min_x \Phi(x)$) into a local computation of the following message functions

$$\Phi_{i \to e}(x_e) = \min_{\{x_{e'} \ge 0\} \,|\, R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right], \tag{2}$$

where $\partial i = \partial_i^{\mathrm{in}} \cup \partial_i^{\mathrm{out}}$ and $\Phi_{i \to e}(x_e)$ relates to the optimal potential function contributed by the flows adjacent to node $i$ where the flow on edge $e$ is set to $x_e$, taking into account flow conservation at node $i$. In Eq. (2), denoting $e' = (k, i)$, we can write $\Phi_{e' \to i}(x_{e'}) = \Phi_{k \to e'}(x_{e'})$; therefore only factor-to-variable messages are needed. The message $\Phi_{k \to e'}(x_{e'})$ can be obtained recursively by an expression similar to Eq. (2), but using the incoming messages from its upstream edges $\{l \to k \,|\, (l, k) \in \partial k \setminus i\}$. The computation of messages involves only a few variables when the network is sparse, and can therefore be performed efficiently. Upon computing the messages iteratively until convergence, we can determine the equilibrium flow $x_e^*$ on edge $e = (i, j)$ by minimizing the edgewise full energy dictated by the nonlinear cost $\phi_e(x_e)$ and messages from both ends of edge $e$, defined as $\Phi_e^{\mathrm{full}}(x_e) = \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e)$.

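This algorithm can be demanding when different values of $x_e$ are needed to determine the profile of the message $\Phi_{i \to e}(x_e)$. To reduce the computational cost, we approximate the message near the working point $\tilde{x}_{i \to e}$ as

$$\Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e}\,\varepsilon_e + \frac{1}{2}\alpha_{i \to e}\,\varepsilon_e^2, \tag{3}$$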

where $\beta_{i \to e}$ and $\alpha_{i \to e}$ are the first and second derivatives of $\Phi_{i \to e}$ evaluated at $\tilde{x}_{i \to e}$, assuming the derivatives exist. The coefficients $\beta_{i \to e}$ and $\alpha_{i \to e}$ are updated by solving Eq. (2) and by using $\{\tilde{x}_{k \to e'}, \beta_{k \to e'}, \alpha_{k \to e'} \,|\, e' = (k, i) \in \partial i \setminus e\}$, while the working point $\tilde{x}_{i \to e}$ is updated by pushing it gradually towards the minimizer $x_e^*$ of the full energy $\Phi_e^{\mathrm{full}}(x_e)$. The resulting algorithm only needs to maintain a few numbers rather than the full profile of $\Phi_{i \to e}$, making it tractable. It has been shown to work remarkably well in many network flow problems [31], while the messages may oscillate and not converge in problems which possess non-smooth characteristics [30], including in the routing games analyzed here. We find that the non-negativity constraints on flows may result in a non-smooth message $\Phi_{i \to e}(x_e)$ (i.e., its first derivative is discontinuous) with at most one break point, which makes the approximation of Eq. (3) inadequate. A simple solution is to approximate $\Phi_{i \to e}(x_e)$ by a continuous and piecewise-quadratic function with at most one break point, where each branch $m$ is a quadratic function governed by three numbers $\{\tilde{x}^{(m)}_{i \to e}, \beta^{(m)}_{i \to e}, \alpha^{(m)}_{i \to e}\}$. Taking into account the non-smooth structures, the MP algorithms converge well even in very loopy networks and provide the correct flows for the routing game with single-level optimization [27].

For bilevel optimization, we notice that the cost function of the upper layer $H(x)$ has a structure similar to that of the lower layer. Therefore, one can apply a similar MP procedure, $H_{i \to e}(x_e) = \min_{\{x_{e'}\} \,|\, R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ H_{e' \to i}(x_{e'}) + x_{e'} \ell_{e'}(x_{e'}) \right]$. The message $H_{i \to e}(x_e)$ can also be approximated by a piecewise-quadratic function with at most one break point, where each branch $m$ has the form $H^{(m)}_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx H^{(m)}_{i \to e}(\tilde{x}_{i \to e}) + \gamma^{(m)}_{i \to e}\,\varepsilon_e + \frac{1}{2}\delta^{(m)}_{i \to e}\,\varepsilon_e^2$. As the equilibrium state is determined in the lower level, the same working points $\{\tilde{x}_{i \to e}\}$ in the lower-level MP are also used for the upper level. The landscape of the edgewise full cost $H_e^{\mathrm{full}}(x_e) = H_{i \to e}(x_e) + H_{j \to e}(x_e) + x_e \ell_e(x_e)$ provides the essential information for setting the toll. Specifically, the toll is updated by $\min_{\tau_e} H_e^{\mathrm{full}}(x_e^*(\tau_e))$, where the toll-dependent equilibrium flow $x_e^*$ is provided by the lower-level messages. The basic structure of such bilevel MP is illustrated in Fig. 1(b), while the details of the message updates are provided in the SM [27].
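To make the toll update $\min_{\tau_e} H(x^*(\tau_e))$ concrete, here is a minimal sketch on a hypothetical two-edge network where only the first edge is chargeable. Everything in it is an assumption for illustration: the coefficients, the closed-form equilibrium (available only because the toy network is so small), and the brute-force scan over tolls, which replaces the message-based update of the paper.

```python
# Toy network (assumed for illustration): two parallel edges, origin -> D.
T = (1.0, 2.0)    # free travel times t_e
A = (2.0, 0.5)    # latency slopes a_e, so l_e(x) = t_e + a_e * x
DEMAND = 1.0      # Lambda at the origin

def equilibrium_flow(tau):
    """Lower level: flow on edge 0 at user equilibrium when edge 0 carries toll tau."""
    # An interior equilibrium equalizes latency-plus-toll on both edges.
    x0 = (T[1] + A[1] * DEMAND - T[0] - tau) / (A[0] + A[1])
    return min(DEMAND, max(0.0, x0))   # enforce non-negative flows

def social_cost(x0):
    """Upper level: H = sum_e x_e * l_e(x_e); toll income is not counted."""
    x = (x0, DEMAND - x0)
    return sum(xe * (t + a * xe) for t, a, xe in zip(T, A, x))

def best_toll(tau_max, grid=1000):
    """Brute-force scan of the restricted toll range 0 <= tau <= tau_max."""
    candidates = [tau_max * k / grid for k in range(grid + 1)]
    return min(candidates, key=lambda tau: social_cost(equilibrium_flow(tau)))

tau_free = best_toll(1.0)     # toll cap loose enough to reach the social optimum
tau_capped = best_toll(0.3)   # toll cap binds, so only part of the gap is closed
```

With a loose cap the scan recovers the toll that moves the equilibrium onto the socially optimal flow; with a tight cap the best toll sits at the bound and the social cost is only partially reduced, mirroring the restricted-toll setting $0 \le \tau_e \le \tau_e^{\max}$.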

We demonstrate the effectiveness of the proposed algorithm on random regular graphs (RRG) in Fig. 2(a), where we use an affine latency model $\ell_e(x_e) = t_e(1 + s x_e / c_e)$, with $t_e$ and $c_e$ as the free traveling time and edge capacity and $s$ as a sensitivity measure of latency to congestion; this model is commonly used in routing games and traffic modeling [16]. Details of the parameter setting, experiments on other network topologies and the cases of multiple destinations are discussed in the SM [27]. Though the bilevel message-passing does not generally converge to a set of unique optimal tolls due to the non-convex nature of the problem, we found that the social costs are reduced when tolls are updated during MP. The scaling relation in Fig. 2(a) empirically indicates that the number of updates is about $O(|E|^2)$ for achieving a given cost. Moreover, the MP algorithm can be implemented in a fully distributed manner, unlike the generic global optimization approach [27]. In practice, it may be infeasible to charge every edge separately. We therefore consider the problem of choosing a subset of chargeable edges and then optimizing only the tolls on these edges, which is an even more difficult task than the toll-setting problem [32]. An empirical solution is to choose the edges where the equilibrium flows $\{x_e^N\}$ are much larger than the socially optimal flows $\{x_e^S\}$, which appears to work well as seen in Fig. 2(b). However, the social optimum may be unknown a priori in some variants of toll-setting or network-design problems [33]. We therefore also use a criterion based on the potential reduction in edgewise full cost $H_e^{\mathrm{full}}(x_e^*)$ due to tolls, which also selects the chargeable links effectively, as seen in Fig. 2(b).

Figure 2. Effect of tolls on the fractional social cost reduction, defined as $(H(x^*(\tau)) - H^S)/(H^N - H^S)$, where $H^S$ and $H^N$ represent the social costs at the social optimum and the Nash equilibrium without tolls. Random regular graphs of degree 3 and affine latency functions are used. Each data point is the average of 10 different network realizations. (a) Fractional cost reduction during the bilevel MP updates for different system sizes, where each sweep consists of $40|E|$ local MP steps and 100 edgewise toll updates in a random sequential schedule. A fixed number of sweeps without toll updates are performed to warm up the system. Inset: panel (a) with x-axis as MP steps rescaled by $|E|^2$. (b) Fractional cost reduction as a function of the fraction $f$ of chargeable edges on a random regular graph with $N = 200$. A random selection of edges to be charged is compared with selections based on the equilibrium flow difference $x_e^N - x_e^S$ and the edgewise full cost reduction $H_e^{\mathrm{full}}(x_e^*)$.

Flow Control – When objective functions at both layers are extensive, as in the routing game, the MP algorithm based on edgewise updates of tolls using localized information turns out to be effective. However, in some cases, it may be necessary to consider the influence of the control variable updates on other locations of the network. To showcase this, we consider the problem of tuning network flows to achieve certain functionality. In this example, resources need to be transported from source nodes to a destination along edges in an undirected network $G(V, E)$, where the equilibrium edge flows $\{x_{ij}\}$ minimize the transportation cost $C = \sum_{(i,j) \in E} \frac{1}{2} r_{ij} x_{ij}^2$, subject to flow conservation constraints similar to Eq. (1). The major difference of this model from the routing games is that the network is undirected, where edge $(i, j)$ can accommodate either the flow from node $j$ to $i$ or from node $i$ to $j$. The objective is to control the network parameters $\{r_{ij}\}$ in order to reduce or increase the flows on some edges. The task of reducing certain edge flows has applications in power grid congestion mitigation in the direct-current (DC) approximation [34], where $r_{ij}$ is related to the reactance of edge $(i, j)$, controllable through devices in a flexible alternating current transmission system (FACTS) [35]. On the other hand, the task of increasing certain edge flows has been used to model the tunability of network functions, which is applicable in mechanical and biological networks [36] as well as learning machines in metamaterials [37].

As an example, we consider the task of flow control such that the relative increments of the magnitude of flows on the targeted set of edges $\mathcal{T}$ exceed a certain limit $\theta$ [36], i.e., $\rho_{ij} = \frac{|x_{ij}| - |x^0_{ij}|}{|x^0_{ij}|} - \theta \ge 0, \ \forall (i, j) \in \mathcal{T}$ (with $x^0_{ij}$ being the flow prior to tuning). It can be achieved by minimizing the hinge loss (upper-level objective) $O = \sum_{(i,j) \in \mathcal{T}} -\rho_{ij}\,\Theta(-\rho_{ij})$, where $\Theta(\cdot)$ is the Heaviside step function. The task of congestion mitigation of flows in power grids can be studied similarly. We adopt the usual MP algorithm to tackle the lower-level optimization problem, resulting in the local message functions

$$C_{i \to j}(x_{ij}) = \min_{\{x_{ki}\} \,|\, R_i = 0} \left[ \frac{1}{2} r_{ij} x_{ij}^2 + \sum_{k \in N_i \setminus j} C_{k \to i}(x_{ki}) \right], \tag{4}$$

where $N_i$ is the set of neighboring nodes adjacent to node $i$. The definition of the message function $C_{i \to j}(x_{ij})$ differs from the one of Eq. (2) in that it includes the interaction term on edge $(i, j)$, which yields a more concise update rule in this problem. Similar to Eq. (3), we approximate the message function by a quadratic form $C_{i \to j}(x_{ij}) = \frac{1}{2}\alpha_{i \to j}(x_{ij} - \hat{x}_{i \to j})^2 + \mathrm{const}$, such that the local optimization in Eq. (4) reduces to the computation of the real-valued messages $m_{i \to j} \in \{\alpha_{i \to j}, \hat{x}_{i \to j}\}$ by passing the upstream messages $\{m_{k \to i}\}_{k \in N_i \setminus j}$, as illustrated on the left panel of Fig. 3(a) [27]. Upon convergence, the equilibrium flow $x^*_{ij}$ can be obtained by minimizing the edgewise full cost $C^{\mathrm{full}}_{ij}(x_{ij}) = C_{i \to j}(x_{ij}) + C_{j \to i}(x_{ij}) - \frac{1}{2} r_{ij} x_{ij}^2$.
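The reduction of Eq. (4) to a pair of real-valued messages can be illustrated as follows. This is a sketch under stated assumptions, not the update rule of the SM: it assumes the sign convention $R_i = \Lambda_i + \sum_k x_{ki} - x_{ij} = 0$ (with $x_{ki}$ counted positive when flowing from $k$ into $i$), unconstrained real-valued flows, and at least one upstream neighbor. Under these assumptions a Lagrange-multiplier calculation gives $\alpha_{i \to j} = r_{ij} + 1/S$ and $\hat{x}_{i \to j} = (\Lambda_i + \sum_k \hat{x}_{k \to i})/(1 + r_{ij} S)$ with $S = \sum_k 1/\alpha_{k \to i}$. The function names are hypothetical.

```python
def node_update(r_ij, lam_i, upstream):
    """Return (alpha, x_hat) of C_{i->j} from upstream pairs (alpha_k, x_hat_k)."""
    s = sum(1.0 / a_k for a_k, _ in upstream)       # sum of inverse curvatures
    c = lam_i + sum(xh_k for _, xh_k in upstream)   # outflow that costs nothing upstream
    alpha = r_ij + 1.0 / s
    x_hat = c / (1.0 + r_ij * s)
    return alpha, x_hat

def brute_force_message(x_ij, r_ij, lam_i, m1, m2, lo=-50.0, hi=50.0):
    """Directly minimize the bracket of Eq. (4) for two upstream neighbors."""
    (a1, h1), (a2, h2) = m1, m2

    def cost(u):
        # u is the flow from the first neighbor; conservation fixes the second.
        v = x_ij - lam_i - u
        return 0.5 * a1 * (u - h1) ** 2 + 0.5 * a2 * (v - h2) ** 2

    for _ in range(200):   # ternary search, valid since cost is convex in u
        m_lo = lo + (hi - lo) / 3
        m_hi = hi - (hi - lo) / 3
        if cost(m_lo) < cost(m_hi):
            hi = m_hi
        else:
            lo = m_lo
    return 0.5 * r_ij * x_ij ** 2 + cost(0.5 * (lo + hi))

# Illustrative upstream messages (alpha_k, x_hat_k); values are arbitrary.
m1, m2 = (2.0, 0.3), (1.0, -0.1)
alpha, x_hat = node_update(1.0, 0.5, [m1, m2])
```

The brute-force routine is included only as a cross-check: up to an additive constant, its output should follow $\frac{1}{2}\alpha_{i \to j}(x_{ij} - \hat{x}_{i \to j})^2$ with the pair returned by `node_update`.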

The variation of the control parameters $\{r_{ij}\}$ will affect the messages $\{m_{i \to j}\}$, which in turn affect the equilibrium flows $x^*$ and therefore the upper-level objective $O(x^*)$. Specifically, one considers the effect of the change of $r_{ij}$ on the targeted edge flows $\{x^*_{pq}\}_{(p,q) \in \mathcal{T}}$, derived by computing the gradient of the objective function with respect to the messages, $\frac{\partial O}{\partial m_{i \to j}}$. The targeted edges provide the boundary conditions in the form of $\frac{\partial O}{\partial m_{p \to q}} = \frac{\partial O}{\partial x^*_{pq}} \frac{\partial x^*_{pq}}{\partial m_{p \to q}}, \ \forall (p, q) \in \mathcal{T}$. As the messages from node $i$ to $j$ are functions of the upstream messages, i.e., $m_{i \to j} = m_{i \to j}(\{m_{k \to i}\}_{k \in N_i \setminus j})$, the gradients on edge $i \to j$ are passed backward to its upstream edges $\{k \to i\}_{k \in N_i \setminus j}$ through the chain rule, as illustrated in the right panel of Fig. 3(a). The full gradient on a non-targeted edge $k \to i$ can be obtained by summing the gradients on its downstream edges, computed as

$$\frac{\partial O}{\partial m_{k \to i}} = \sum_{l \in N_i \setminus k} \ \sum_{m_{i \to l} \in \{\alpha_{i \to l}, \hat{x}_{i \to l}\}} \frac{\partial O}{\partial m_{i \to l}} \frac{\partial m_{i \to l}}{\partial m_{k \to i}}. \tag{5}$$

The gradient messages $\{\frac{\partial O}{\partial m_{k \to i}}\}$ are passed in a random and asynchronous manner, resulting in a decentralized algorithm.

Figure 3. Bilevel optimization in the flow control application. A random regular graph (with $N = 200$, degree 3) and a square lattice of size $15 \times 15$ are used in the experiments. The source node, destination node and the targeted edges are randomly selected. The control parameters are bounded to be $r_{ij} \in [0.9, 1.1]$. (a) Left: MP for solving the lower-level equilibrium flow problem. Right: computation of the gradients of the upper-level objective function $O$, which are communicated in the reversed direction. (b) Comparison of the gradients at initial $r$ computed by the MP approach (obtained by fixing $r$ and passing messages $\{m_{i \to j}\}$ and gradients $\{\frac{\partial O}{\partial m_{i \to j}}\}$) and the GGD approach, with $|\mathcal{T}| = 5$, $\theta = 0.1$. Inset: mean square error (MSE) of the gradients by the MP approach during the message and gradient passing, in comparison to the GGD approach. The lower-level MP is performed until convergence before passing the gradients. Each sweep consists of $4|E|$ local MP steps in a random sequential schedule. (c) MP for minimizing the upper-level objective function $O$ with $\theta = 0.1$, where one randomly selected control parameter is updated following the descent direction every $4|E|/10$ steps. (d) Fraction of successfully tuned cases $P_{\mathrm{success}}$ (satisfying $O = 0$) out of 100 different problem realizations with $|\mathcal{T}| = 5$, as a function of the threshold $\theta$. Each realization has a different pair of source and destination nodes.

The gradient with respect to the control parameter on the non-targeted edge $(k, i)$ can be obtained straightforwardly as

$$\frac{\partial O}{\partial r_{ki}} = \sum_{m \in \{\alpha, \hat{x}\}} \left[ \frac{\partial O}{\partial m_{k \to i}} \frac{\partial m_{k \to i}}{\partial r_{ki}} + \frac{\partial O}{\partial m_{i \to k}} \frac{\partial m_{i \to k}}{\partial r_{ki}} \right], \tag{6}$$

which serves to update the control parameter in a gradient descent manner, $r_{ki} \leftarrow r_{ki} - s \frac{\partial O}{\partial r_{ki}}$, with a certain step size $s$. In this particular flow model, the gradient $\frac{\partial O}{\partial r_{ki}}$ can also be calculated exactly, leading to a global gradient descent (GGD) algorithm. However, the GGD approach requires computing the inverse of the Laplacian matrix in every iteration, which can be time-consuming for large networks. In contrast, the gradients are computed in a local and distributed manner in the MP approach. Similar ideas of gradient propagation of MP have been proposed in [38, 39] in the context of approximate inference, which are usually implemented in the reverse order of MP updates and in a centralized manner, unlike the decentralized approach presented here.
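The interplay between the hinge loss, the gradient step $r \leftarrow r - s\,\partial O/\partial r$ and the parameter bounds can be seen in a deliberately small example. The sketch below assumes a hypothetical network of two parallel edges between one source and one destination, for which the lower-level equilibrium has the closed form $x_1 = \Lambda r_2/(r_1 + r_2)$, so the exact gradient replaces both the gradient messages and the Laplacian inverse. The bounds $[0.9, 1.1]$ follow the experiments; the function names, step size and initial values are illustrative assumptions.

```python
R_MIN, R_MAX = 0.9, 1.1   # bounds on the control parameters, as in the experiments

def equilibrium_flow(r1, r2, demand=1.0):
    """Minimizer of 0.5*r1*x1**2 + 0.5*r2*x2**2 subject to x1 + x2 = demand."""
    return demand * r2 / (r1 + r2)

def tune(theta, step=0.1, max_iter=50, demand=1.0):
    """Projected gradient descent on the hinge loss for the targeted edge 1."""
    r1, r2 = 1.0, 1.0
    x_init = equilibrium_flow(r1, r2, demand)   # flow prior to tuning
    for _ in range(max_iter):
        x1 = equilibrium_flow(r1, r2, demand)
        rho = (abs(x1) - abs(x_init)) / abs(x_init) - theta
        if rho >= 0:
            break   # target met, hinge loss is zero
        # Here O = -rho; chain rule through the closed-form equilibrium (x1 > 0).
        dx_dr1 = -demand * r2 / (r1 + r2) ** 2
        dx_dr2 = demand * r1 / (r1 + r2) ** 2
        grad1 = -dx_dr1 / x_init
        grad2 = -dx_dr2 / x_init
        # Gradient step followed by projection onto [R_MIN, R_MAX].
        r1 = min(R_MAX, max(R_MIN, r1 - step * grad1))
        r2 = min(R_MAX, max(R_MIN, r2 - step * grad2))
    x1 = equilibrium_flow(r1, r2, demand)
    rho = (abs(x1) - abs(x_init)) / abs(x_init) - theta
    return max(0.0, -rho), (r1, r2)

loss_easy, r_easy = tune(theta=0.05)   # reachable within the bounds
loss_hard, r_hard = tune(theta=0.2)    # not reachable: parameters saturate
```

In this toy case the largest achievable relative increase is 10%, so a threshold of 0.05 is met with zero loss while a threshold of 0.2 leaves a residual loss with both parameters pinned at their bounds, which is the kind of failure counted against the success fraction in Fig. 3(d).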

The gradient computed by the MP algorithm provides an excellent estimate of the exact gradient for different types of networks, as illustrated in Fig. 3(b). In bilevel optimization of flow networks, we do not wait until the convergence of the gradient passing, but update the control parameters during the MP iterations in order to make the algorithm more efficient. This provides approximate gradient information, which is already effective enough for the optimization of the global objective, as shown in Fig. 3(c). The MP approach yields similar success rates in managing the network flows for different thresholds compared to the GGD approach, as shown in Fig. 3(d), demonstrating the effectiveness of the MP approach for bilevel optimization.

In summary, we propose MP algorithms for solving bilevel optimization in flow networks, focusing on applications in the routing game and flow control problems. In routing games, the objective functions in both levels admit a similar structure, which leads to two sets of similar messages being passed in both levels. Updates of the control variables based on localized information appear to be effective for toll optimization in this case. However, the long-range impact of control variable changes should be considered in some applications. This is accommodated by a separate distributed gradient passing process, which is shown to be effective and computationally efficient for optimization in flow control problems. Leveraging the sparse network structures, the MP approach offers efficient and intrinsically distributed algorithms, in contrast to global optimization methods such as nonlinear programming; the latter are generic and applicable to many other problems, but are not always scalable. The MP approach could provide effective algorithms for bilevel optimization problems that are intractable or difficult to solve by global optimization approaches. For instance, the MP algorithms can be easily extended to atomic routing games where network flows are discrete variables [40] and are difficult to solve via nonlinear programming. This could potentially be done through the techniques of [41–43]. We believe that these MP methods provide a valuable element in the toolbox for solving difficult bilevel optimization problems, especially in systems with sparsely-coupled structures.

We thank K. Y. Michael Wong, Chi Ho Yeung and Tat Shing Choi for helpful discussions. B.L. and D.S. acknowledge support from the Leverhulme Trust (RPG-2018-092) and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 835913. D.S. acknowledges support from the EPSRC programme grant TRANSNET (EP/R035342/1).

[1] A. Sinha, P. Malo, and K. Deb, A review on bilevel optimization: From classical to evolutionary approaches and applications, IEEE Transactions on Evolutionary Computation 22, 276 (2018).

[2] J. G. Wardrop, Some theoretical aspects of road traffic research, Proceedings of the Institution of Civil Engineers 1, 325 (1952).

[3] M. Plischke and B. Bergersen, Equilibrium Statistical Physics, 3rd ed. (World Scientific, Singapore, 2006).

[4] W. Thomson and P. G. Tait, Treatise on Natural Philosophy, 2nd ed., Cambridge Library Collection - Mathematics, Vol. 1 (Cambridge University Press, 2009).

[5] P. G. Doyle and J. L. Snell, Random Walks and Electric Networks (Mathematical Association of America, 1984).

[6] E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. 106, 620 (1957).

[7] K. Murphy, Machine Learning: A Probabilistic Perspective, Adaptive Computation and Machine Learning series (MIT Press, Cambridge, Massachusetts, 2012).

[8] B. Colson, P. Marcotte, and G. Savard, An overview of bilevel optimization, Annals of Operations Research 153, 235 (2007).

[9] R. G. Jeroslow, The polynomial hierarchy and a simple model for competitive analysis, Mathematical Programming 32, 146 (1985).

[10] P. Hansen, B. Jaumard, and G. Savard, New branch-and-bound rules for linear bilevel programming, SIAM Journal on Scientific and Statistical Computing 13, 1194 (1992).

[11] J. F. Bard and J. E. Falk, An explicit solution to the multi-level programming problem, Computers & Operations Research 9, 77 (1982).

[12] J. F. Bard, Convex two-level optimization, Mathematical Programming 40, 15 (1988).

[13] C. D. Kolstad and L. S. Lasdon, Derivative evaluation and computational experience with large bilevel mathematical programs, Journal of Optimization Theory and Applications 65, 485 (1990).

[14] G. Savard and J. Gauvin, The steepest descent direction for the nonlinear bilevel programming problem, Operations Research Letters 15, 265 (1994).

[15] M. Patriksson, The traffic assignment problem: models and methods (Dover Publications, Inc, New York, 2015).

[16] T. Roughgarden, Selfish routing and the price of anarchy (MIT Press, Cambridge, Massachusetts, 2005).

[17] M. J. Beckmann, C. B. McGuire, and C. B. Winsten, Studies in the Economics of Transportation (Yale University Press, New Haven, 1956).

[18] M. Smith, The marginal cost taxation of a transportation network, Transportation Research Part B: Methodological 13, 237 (1979).

[19] R. Cole, Y. Dodis, and T. Roughgarden, How much can taxes help selfish routing?, Journal of Computer and System Sciences 72, 444 (2006), Network Algorithms 2005.

[20] S. Çolak, A. Lima, and M. C. González, Understanding congested travel in urban areas, Nature Communications 7, 10793 (2016).

[21] Electronic road pricing, https://web.archive.org/web/20110605101108/http://www.lta.gov.sg/motoring_matters/index_motoring_erp.htm (2019).

[22] N. Barak, Israel tries battling traffic jams with cash handouts, ISRAEL21c, https://www.israel21c.org/israel-tries-battling-traffic-jams-with-cash-handouts/ (2019).

[23] J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Frazzoli, and D. Rus, On-demand high-capacity ride-sharing via dynamic trip-vehicle assignment, Proceedings of the National Academy of Sciences 114, 462 (2017).

[24] S. Lim, H. Balakrishnan, D. Gifford, S. Madden, and D. Rus, Stochastic motion planning and applications to traffic, The International Journal of Robotics Research 30, 699 (2011).

[25] B. Li, D. Saad, and A. Y. Lokhov, Reducing urban traffic congestion due to localized routing decisions, Phys. Rev. Research 2, 032059 (2020).

[26] G. Karakostas and S. G. Kolliopoulos, The efficiency of optimal taxes, in Combinatorial and Algorithmic Aspects of Networking, edited by A. López-Ortiz and A. M. Hamel (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005) pp. 3–12.

[27] See Supplemental Material for details, which includes Refs. [44–47].

[28] D. Monderer and L. S. Shapley, Potential games, Games and Economic Behavior 14, 124 (1996).

[29] H. Bar-Gera, Origin-based algorithm for the traffic assignment problem, Transportation Science 36, 398 (2002).

[30] K. Y. M. Wong and D. Saad, Inference and optimization of real edges on sparse graphs: A statistical physics perspective, Phys. Rev. E 76, 011115 (2007).

[31] K. Y. M. Wong, D. Saad, and C. H. Yeung, Distributed optimization in transportation and logistics networks, IEICE Transactions on Communications E99.B, 2237 (2016).

[32] M. Hoefer, L. Olbrich, and A. Skopalik, Taxing subnetworks, in Internet and Network Economics, edited by C. Papadimitriou and S. Zhang (Springer Berlin Heidelberg, Berlin, Heidelberg, 2008) pp. 286–294.

[33] A. Migdalas, Bilevel programming in traffic planning: Models, methods and challenge, Journal of Global Optimization 7, 381 (1995).

[34] A. Wood, B. Wollenberg, and G. Sheblé, Power Generation, Operation, and Control (Wiley, Hoboken, New Jersey, 2013).

[35] X.-P. Zhang, C. Rehtanz, and B. Pal, Flexible AC Transmission Systems: Modelling and Control (Springer, Berlin, Heidelberg, 2006).

[36] J. W. Rocks, H. Ronellenfitsch, A. J. Liu, S. R. Nagel, and E. Katifori, Limits of multifunctionality in tunable networks, Proceedings of the National Academy of Sciences 116, 2506 (2019).

[37] M. Stern, D. Hexner, J. W. Rocks, and A. J. Liu, Supervised learning in physical networks: From machine learning to learning machines, Phys. Rev. X 11, 021045 (2021).

[38] F. Eaton and Z. Ghahramani, Choosing a variable to clamp, in Proceedings of the Twelth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 5, edited by D. van Dyk and M. Welling (PMLR, Hilton Clearwater Beach Resort, Clearwater Beach, Florida USA, 2009) pp. 145–152.

[39] J. Domke, Learning graphical model parameters with approximate marginal inference, IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 2454 (2013).

[40] R. W. Rosenthal, The network equilibrium problem in integers, Networks 3, 53 (1973).

[41] C. H. Yeung and D. Saad, Competition for shortest paths on sparse graphs, Phys. Rev. Lett. 108, 208701 (2012).

[42] C. H. Yeung, D. Saad, and K. Y. M. Wong, From the physics of interacting polymers to optimizing routes on the London Underground, Proceedings of the National Academy of Sciences 110, 13717 (2013).

[43] H. F. Po, C. H. Yeung, and D. Saad, Futility of being selfish in optimized traffic, Phys. Rev. E 103, 022306 (2021).

[44] J. D. Garcia, G. Bodin, davide f, I. Fiske, M. Besançon, and N. Laws, joaquimg/bileveljump.jl: v0.4.1, https://doi.org/10.5281/zenodo.4556393 (2021).

[45] C. H. Yeung and K. Y. M. Wong, Optimal location of sources in transportation networks, Journal of Statistical Mechanics: Theory and Experiment 2010, P04017 (2010).

[46] P. Rebeschini and S. Tatikonda, A new approach to Laplacian solvers and flow problems, Journal of Machine Learning Research 20, 1 (2019).

[47] G. H. Golub and V. Pereyra, The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate, SIAM Journal on Numerical Analysis 10, 413 (1973).

Bilevel Optimization in Flow Networks – A Message-passing Approach

– Supplemental Material

Bo Li¹ and David Saad¹

¹Non-linearity and Complexity Research Group, Aston University, Birmingham, B4 7ET, United Kingdom

I. MESSAGE-PASSING ALGORITHMS FOR ROUTING GAME

In this section, we provide details of the message-passing (MP) algorithm for the routing game in road networks, modeled by a directed graph $G(V, E)$.

A. Problem Setting and Notation

A directed edge $e$ in the directed graph is represented by an ordered tuple $e = (i, j)$, where node $i$ is the head and node $j$ is the tail of edge $e$, i.e., $i = h(e)$, $j = t(e)$. Note that there can be at most two directed edges connecting node $i$ and node $j$, i.e., $e = (i, j)$, $e' = (j, i)$.

We write the set of incoming edges to node ias ∂in

i={e|e∈E, t(e) = i}, the set of outgoing edges from node ias

∂out

i={e|e∈E, h(e) = i}and the set of edges adjacent to node ias ∂i =∂in

i∪∂out

i. For convenience, we deﬁne the

incident operator B:E→V, with matrix elements

Bi,e =

1,if e∈∂in

i

−1,if e∈∂out

i

0,otherwise.

(1)

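Consider the scenario where all users travel to a universal destination $D$. The edge flows resulting from users' path choices satisfy the flow conservation constraints,

$$
R_i := \Lambda_i + \sum_{e \in \partial_i^{\mathrm{in}}} x_e - \sum_{e \in \partial_i^{\mathrm{out}}} x_e = \Lambda_i + \sum_{e \in \partial i} B_{i,e} x_e = 0, \quad \forall i \neq D, \tag{2}
$$

and the non-negativity constraints

$$
x_e \geq 0, \quad \forall e. \tag{3}
$$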
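The constraints in Eqs. (2) and (3) can be checked numerically for a candidate flow. The following is a minimal sketch, assuming edges are stored as `(head, tail)` tuples with flow leaving the head and entering the tail; the function names and the dictionary-based sparse representation of $B$ are illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]  # (head, tail): flow leaves head and enters tail


def incidence(edges: List[Edge]) -> Dict[Tuple[Hashable, int], int]:
    """Sparse incidence entries B[(i, e)] as in Eq. (1), with e an edge index."""
    B = {}
    for e, (head, tail) in enumerate(edges):
        B[(tail, e)] = 1    # e is incoming to its tail
        B[(head, e)] = -1   # e is outgoing from its head
    return B


def net_resources(nodes, edges, lam, x):
    """R_i = Lambda_i + sum_e B[i, e] * x_e for every node i, as in Eq. (2)."""
    R = {i: lam.get(i, 0.0) for i in nodes}
    for (i, e), b in incidence(edges).items():
        R[i] += b * x[e]
    return R


def is_feasible(nodes, edges, lam, x, dest, tol=1e-12):
    """Check Eqs. (2) and (3); the destination node is left unconstrained."""
    R = net_resources(nodes, edges, lam, x)
    conserved = all(abs(R[i]) <= tol for i in nodes if i != dest)
    return conserved and all(xe >= 0 for xe in x)
```

For example, on a three-node network with edges `[(1, 2), (2, "D"), (1, "D")]` and resources `{1: 1.0, 2: 0.5}`, the flow `[0.25, 0.75, 0.75]` passes the check, while `[0.25, 0.5, 0.75]` leaves a surplus at node 2.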

Due to the flow conservation constraint, any resource on a leaf node $i$ with only one outgoing edge (i.e., $|\partial_i^{\mathrm{out}}| = 1$, $|\partial_i^{\mathrm{in}}| = 0$) must be transmitted to its only neighboring node $j$. Similarly, if a leaf node $i$ with only one incoming edge (i.e., $|\partial_i^{\mathrm{in}}| = 1$, $|\partial_i^{\mathrm{out}}| = 0$) is the destination node, then traffic must first arrive at its only neighboring node $j$, and then go through the edge $(j, i)$ to the destination. In the former case, one can remove the leaf node $i$ and add $\Lambda_i$ resources to its neighboring node $j$. In the latter, one can simply set node $j$ as the destination. By preprocessing the network using the above reduction, we can reduce the network to have no leaf nodes.

Denoting $\ell_e(x_e)$ as the latency function on edge $e$, the Wardrop equilibrium can be obtained by minimizing the following potential function

$$
\Phi(x) = \sum_{e \in E} \int_0^{x_e} \ell_e(y)\, \mathrm{d}y =: \sum_{e \in E} \phi_e(x_e), \tag{4}
$$

subject to the flow conservation (2) and non-negativity constraints (3). In the presence of tolls $\{\tau_e\}$ on edges, one replaces the latency function $\ell_e(x_e)$ in Eq. (4) by $\ell_e^{\tau}(x_e) = \ell_e(x_e) + \tau_e$, assuming the same gauge between latency and toll can be used for all users (more precisely, $\ell_e^{\tau}(x_e) = \ell_e(x_e) + \chi \tau_e$, with $\chi$ being a certain coefficient converting money to time, which is set to one in some appropriate unit).

On the other hand, the social cost is defined as

$$
H(x) = \sum_{e \in E} x_e \ell_e(x_e) =: \sum_{e \in E} \sigma_e(x_e), \tag{5}
$$

where the corresponding minimizer is the social optimum. Tolls are not assumed to contribute to the social cost $H(x)$.
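To make the difference between the potential of Eq. (4) and the social cost of Eq. (5) concrete, the sketch below evaluates both for affine latencies $\ell_e(x) = a_e x + b_e$. The affine form, the function names, and the tuple representation are assumptions made for illustration; the text above does not restrict the latency functions to this family.

```python
from typing import List, Optional, Tuple

Affine = Tuple[float, float]  # (a, b) for the assumed latency l(x) = a * x + b


def potential(latencies: List[Affine], x: List[float],
              tolls: Optional[List[float]] = None) -> float:
    """Phi(x) of Eq. (4); with tolls, l_e is replaced by l_e + tau_e."""
    tolls = tolls or [0.0] * len(x)
    # integral_0^x (a*y + b + tau) dy = a*x^2/2 + (b + tau)*x
    return sum(0.5 * a * xe ** 2 + (b + t) * xe
               for (a, b), xe, t in zip(latencies, x, tolls))


def social_cost(latencies: List[Affine], x: List[float]) -> float:
    """H(x) of Eq. (5); tolls do not enter the social cost."""
    return sum(xe * (a * xe + b) for (a, b), xe in zip(latencies, x))
```

As a usage illustration, take two alternative routes carrying one unit of demand with latencies `[(1.0, 0.0), (0.0, 1.0)]` (a Pigou-type example, used here only as a toy case). The flow `[1.0, 0.0]` has the lower potential, whereas `[0.5, 0.5]` has the lower social cost; with a toll of `0.5` on the first route, the tolled potential also favors `[0.5, 0.5]`.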


B. MP Equations for Smooth Message Functions

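The MP equation for minimizing the potential $\Phi(x)$ reads

$$
\Phi_{i \to e}(x_e) = \min_{\{x_{e'} \geq 0\} \mid R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right], \tag{6}
$$

where the message function $\Phi_{i \to e}(x_e)$ is called the cavity energy in the jargon of statistical physics. Denoting $e = (i, j)$ and $e' = (k, i)$, we can write $\Phi_{e' \to i}(x_{e'}) = \Phi_{k \to e'}(x_{e'})$.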

In this framework, one needs to keep track of the profile of the message functions $\Phi_{i \to e}(x_e)$, which is only practical if they are restricted to a certain family of functions and are easy to optimize. One can approximate the message function $\Phi_{i \to e}(x_e)$ by its series expansion around the working point $\tilde{x}_{i \to e}$ [1]

$$
\Phi_{i \to e}(x_e) = \Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e}\big|_{\tilde{x}_{i \to e}} \varepsilon_e + \frac{1}{2} \alpha_{i \to e}\big|_{\tilde{x}_{i \to e}} \varepsilon_e^2, \tag{7}
$$

where $\beta_{i \to e}$ and $\alpha_{i \to e}$ are the first and second derivatives of $\Phi_{i \to e}$ evaluated at the working point $\tilde{x}_{i \to e}$, assuming the message function $\Phi_{i \to e}(x_e)$ is smooth in the vicinity of $\tilde{x}_{i \to e}$. The MP equations have been derived in [1] for undirected flow networks. Here, we extend them to directed graphs with non-negativity flow constraints.

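Similarly, the interaction term $\phi_{e'}(x_{e'})$ is also approximated as $\phi_{e'}(x_{e'}) \approx \phi_{e'}(\tilde{x}_{k \to e'}) + \phi'_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'} + \frac{1}{2} \phi''_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'}^2$.

To solve the local optimization problem in Eq. (6) over the variables on edges $\{k \to e' \mid e' \in \partial i \setminus e\}$, we introduce the Lagrangian

$$
\begin{aligned}
L_{i \to e} = {} & \sum_{e' \in \partial i \setminus e} \left[ \frac{1}{2} \alpha_{k \to e'} \varepsilon_{e'}^2 + \beta_{k \to e'} \varepsilon_{e'} + \frac{1}{2} \phi''_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'}^2 + \phi'_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'} \right] \\
& + \mu_{i \to e} R_i + \sum_{e' \in \partial i \setminus e} \lambda_{e'} \left( \tilde{x}_{k \to e'} + \varepsilon_{e'} \right),
\end{aligned} \tag{8}
$$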

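where $\mu_{i \to e}$ and $\lambda_{e'}$ are the Lagrange multipliers for the flow conservation constraint $R_i = 0$ and the flow non-negativity constraint $x_{e'} \geq 0$, respectively. Solving the extremum equation $\partial L_{i \to e} / \partial \varepsilon_{e'} = 0$ gives

$$
\varepsilon^*_{e'}(\mu_{i \to e}) = \max\left( -\frac{\mu_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ -\tilde{x}_{k \to e'} \right), \tag{9}
$$

and the corresponding optimal cavity flow is

$$
x^*_{k \to e'}(\mu_{i \to e}) = \tilde{x}_{k \to e'} + \varepsilon^*_{e'}(\mu_{i \to e}) = \max\left( \tilde{x}_{k \to e'} - \frac{\mu_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ 0 \right). \tag{10}
$$

The Lagrange multiplier (or the dual variable) $\mu_{i \to e}$ needs to satisfy

$$
R_{i \to e}(\mu_{i \to e}; x_e) := \sum_{e' \in \partial i \setminus e} B_{i,e'} x^*_{k \to e'}(\mu_{i \to e}) + B_{i,e} x_e + \Lambda_i = 0. \tag{11}
$$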

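The function $R_{i \to e}(\mu; x_e)$ is a non-increasing piecewise-linear function of $\mu$. To determine the value of $\mu^*_{i \to e}$ at the optimum, we need to find the root of $R_{i \to e}(\mu; x_e)$, which can be done in a finite number of steps by following the breakpoints of the piecewise-linear function $R_{i \to e}(\mu; x_e)$. Upon obtaining the optimal dual variable $\mu^*_{i \to e}$, the messages $\beta_{i \to e}$ and $\alpha_{i \to e}$ are calculated by

$$
\beta_{i \to e} = \frac{\partial \Phi^*_{i \to e}(x_e)}{\partial x_e} = \frac{\partial L^*_{i \to e}}{\partial x_e} = B_{i,e}\, \mu^*_{i \to e}, \tag{12}
$$

$$
\begin{aligned}
\alpha_{i \to e} &= \frac{\partial^2 \Phi^*_{i \to e}(x_e)}{\partial x_e^2} = B_{i,e} \frac{\partial \mu^*_{i \to e}}{\partial x_e} = B_{i,e} \left[ \frac{\partial x_e}{\partial \mu} \bigg|_{\mu = \mu^*_{i \to e}} \right]^{-1} \\
&= \left[ -\frac{\partial}{\partial \mu} \sum_{e' \in \partial i \setminus e} B_{i,e'} x^*_{k \to e'}(\mu) \bigg|_{\mu = \mu^*_{i \to e}} \right]^{-1} \\
&= \left[ \sum_{e' \in \partial i \setminus e} \frac{1}{\alpha_{k \to e'} + \phi''_{e'}} \Theta\Big( \big( \alpha_{k \to e'} + \phi''_{e'} \big) \tilde{x}_{k \to e'} - \big( \mu^*_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'} \big) \Big) \right]^{-1},
\end{aligned} \tag{13}
$$

where $\Theta(\cdot)$ is the Heaviside step function. The shadow price interpretation of the Lagrange multiplier has been used in Eq. (12) and the inverse function theorem has been used in Eq. (13). In the implementation of the algorithm, we take $x_e = \tilde{x}_{i \to e}$ in solving Eq. (11).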
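The local update described by Eqs. (10) to (13) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the `Upstream` record and all function names are hypothetical, the root of $R_{i \to e}(\mu; x_e)$ is assumed to be unique and to lie strictly inside one linear piece, and the degenerate (plateau) case is only reported as an error.

```python
from typing import List, NamedTuple, Tuple


class Upstream(NamedTuple):
    """Quantities attached to one upstream edge e' = (k, i); names are illustrative."""
    b: int          # B_{i,e'}: +1 if e' enters node i, -1 if it leaves
    x_tilde: float  # working point of the cavity flow on e'
    alpha: float    # curvature message alpha_{k->e'}
    beta: float     # slope message beta_{k->e'}
    dphi: float     # phi'_{e'} at the working point
    ddphi: float    # phi''_{e'} at the working point


def cavity_flow(u: Upstream, mu: float) -> float:
    """Optimal cavity flow x*_{k->e'}(mu) of Eq. (10)."""
    return max(u.x_tilde - (mu * u.b + u.dphi + u.beta) / (u.alpha + u.ddphi), 0.0)


def net_resource(mu: float, ups: List[Upstream], b_e: int, x_e: float, lam: float) -> float:
    """R_{i->e}(mu; x_e) of Eq. (11)."""
    return sum(u.b * cavity_flow(u, mu) for u in ups) + b_e * x_e + lam


def solve_mu(ups: List[Upstream], b_e: int, x_e: float, lam: float) -> float:
    """Root of R_{i->e}(mu; x_e), found by scanning its linear pieces."""
    # Each upstream edge switches between its two branches at one breakpoint.
    bps = sorted(u.b * ((u.alpha + u.ddphi) * u.x_tilde - u.dphi - u.beta) for u in ups)
    knots = [bps[0] - 1.0] + bps + [bps[-1] + 1.0]  # pad to sample the outer pieces
    last = len(knots) - 2
    for n, (lo, hi) in enumerate(zip(knots, knots[1:])):
        if hi == lo:
            continue
        r_lo = net_resource(lo, ups, b_e, x_e, lam)
        r_hi = net_resource(hi, ups, b_e, x_e, lam)
        slope = (r_hi - r_lo) / (hi - lo)
        if abs(slope) < 1e-14:
            continue  # flat piece: no unique root on this piece
        root = lo - r_lo / slope
        # The two outer pieces extend to infinity on one side.
        if (n == 0 or root >= lo) and (n == last or root <= hi):
            return root
    raise ValueError("no unique root (degenerate or infeasible case)")


def messages(ups: List[Upstream], b_e: int, x_e: float, lam: float) -> Tuple[float, float]:
    """(beta_{i->e}, alpha_{i->e}) from Eqs. (12) and (13)."""
    mu = solve_mu(ups, b_e, x_e, lam)
    inv_alpha = sum(1.0 / (u.alpha + u.ddphi) for u in ups if cavity_flow(u, mu) > 0.0)
    return b_e * mu, 1.0 / inv_alpha
```

Scanning consecutive breakpoints keeps the root search exact for a piecewise-linear function, at the cost of re-evaluating $R_{i \to e}$ at every knot; an implementation that updates the slope incrementally while walking the sorted breakpoints would avoid the repeated sums.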

1. Destination node D

There are two ways to treat the destination node D:

• Method I: Since the destination node $D$ has no constraint, it will absorb all incoming flows (like a grounded node in an electric circuit). So it has no preference for network flows of the incident edges, such that $\Phi_{D \to e}(x_e) = 0$ and

$$
\alpha_{D \to e} = 0, \quad \beta_{D \to e} = 0. \tag{14}
$$

• Method II: Alternatively, one can set an explicit constraint on the flows to the destination node $D$

$$
R_D := \Lambda_D + \sum_{e \in \partial D} B_{D,e} x_e = 0, \tag{15}
$$

where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. Then the messages from the destination node $D$ are calculated in the same way as for other nodes, as given by Eqs. (12) and (13).

Method I is used for the experiments of routing games in the main text.

2. Working Points

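We also need a scheme to update the working points $\{\tilde{x}_{i \to e}\}$ at which the messages $\{\alpha_{i \to e}, \beta_{i \to e}\}$ are defined. Here, we suggest updating the working point $\tilde{x}_{i \to e}$ such that it gets closer to the equilibrium flow $x^*_e$ [1]

$$
\begin{aligned}
x^*_e &= \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) \right] \\
&= \arg\min_{x_e \geq 0} \Big[ \tfrac{1}{2} \big( \alpha_{i \to e} + \tfrac{1}{2} \phi''_e(\tilde{x}_{i \to e}) \big) \big( x_e - \tilde{x}_{i \to e} \big)^2 + \big( \beta_{i \to e} + \tfrac{1}{2} \phi'_e(\tilde{x}_{i \to e}) \big) \big( x_e - \tilde{x}_{i \to e} \big) \\
&\qquad\qquad + \tfrac{1}{2} \big( \alpha_{j \to e} + \tfrac{1}{2} \phi''_e(\tilde{x}_{j \to e}) \big) \big( x_e - \tilde{x}_{j \to e} \big)^2 + \big( \beta_{j \to e} + \tfrac{1}{2} \phi'_e(\tilde{x}_{j \to e}) \big) \big( x_e - \tilde{x}_{j \to e} \big) \Big] \\
&= \max\left( \frac{ \big( \alpha_{i \to e} + \tfrac{1}{2} \phi''_{i \to e} \big) \tilde{x}_{i \to e} + \big( \alpha_{j \to e} + \tfrac{1}{2} \phi''_{j \to e} \big) \tilde{x}_{j \to e} - \big( \beta_{i \to e} + \beta_{j \to e} + \tfrac{1}{2} \phi'_{i \to e} + \tfrac{1}{2} \phi'_{j \to e} \big) }{ \alpha_{i \to e} + \alpha_{j \to e} + \tfrac{1}{2} \phi''_{i \to e} + \tfrac{1}{2} \phi''_{j \to e} },\ 0 \right),
\end{aligned} \tag{16}
$$

where $\phi'_{i \to e}$ and $\phi''_{i \to e}$ are shorthand for $\phi'_e(\tilde{x}_{i \to e})$ and $\phi''_e(\tilde{x}_{i \to e})$, and similarly for $j \to e$.

Furthermore, a learning rate $s$ is applied to update the working point

$$
\tilde{x}^{\mathrm{new}}_{i \to e} \leftarrow s\, x^*_e + (1 - s)\, \tilde{x}^{\mathrm{old}}_{i \to e}, \tag{17}
$$

such that $\tilde{x}_{i \to e}$ does not jump too drastically; otherwise the messages $\alpha_{i \to e}$ and $\beta_{i \to e}$ will approximate the curvature and slope of the message function $\Phi_{i \to e}(x_e)$ less precisely.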
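The working-point scheme of Eqs. (16) and (17) reduces to a closed-form expression followed by a damped step. A minimal sketch is given below; the `Side` record and the function names are hypothetical, and the derivatives of $\phi_e$ are assumed to have been evaluated at the respective working points beforehand.

```python
from typing import NamedTuple


class Side(NamedTuple):
    """Message and expansion data seen from one end of edge e (illustrative names)."""
    alpha: float    # curvature message
    beta: float     # slope message
    x_tilde: float  # working point
    dphi: float     # phi'_e at this working point
    ddphi: float    # phi''_e at this working point


def equilibrium_flow(si: Side, sj: Side) -> float:
    """Estimate of x_e^* from the closed form in Eq. (16)."""
    ci = si.alpha + 0.5 * si.ddphi
    cj = sj.alpha + 0.5 * sj.ddphi
    slope = si.beta + sj.beta + 0.5 * si.dphi + 0.5 * sj.dphi
    return max((ci * si.x_tilde + cj * sj.x_tilde - slope) / (ci + cj), 0.0)


def damped_update(x_old: float, x_star: float, s: float) -> float:
    """Working-point update of Eq. (17) with learning rate s."""
    return s * x_star + (1.0 - s) * x_old
```

A smaller learning rate `s` keeps successive working points close to each other, which is the regime in which the quadratic expansion of the messages is expected to remain accurate.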

C. Non-Smooth Message Functions

1. Qualitative Picture

The MP algorithms in Sec. I B work well if the smoothness assumption of the message function $\Phi_{i \to e}(x_e)$ holds. However, this is not always the case in the routing game problem, where the non-smoothness is induced by the non-negativity constraints of Eq. (3). Direct implementation of the MP algorithms in Sec. I B leads to oscillations of the messages when the traffic patterns are sparse. In fact, similar non-convergence phenomena have been noticed in systems with a non-smooth energy function [1].

To better understand this phenomenon, we examine $R_{i \to e}(\mu; x_e)$ as a function of the Lagrange multiplier $\mu$ in Eq. (11), whose root $\mu^*$ (satisfying $R_{i \to e}(\mu^*; x_e) = 0$) determines $\beta_{i \to e}$ and $\alpha_{i \to e}$ in Eqs. (12) and (13).

Figure 1. (a) The net resource $R_{i \to e}(\mu; x)$ defined in Eq. (11) is a non-increasing piecewise-linear function of the Lagrange multiplier $\mu$. Cases (i) and (ii) correspond to edges $e'$ incoming to node $i$ ($B_{i,e'} = 1$), while case (iii) corresponds to an edge $e'$ outgoing from node $i$ ($B_{i,e'} = -1$). (b) The roots of $R_{i \to e}(\mu; x_e)$ in the vicinity of $x_e = x$ and $x_e = y$. It is assumed that edge $e$ is an outgoing edge of node $i$ (with $B_{i,e} = -1$) such that finding the root of $R_{i \to e}(\mu; x_e)$ is equivalent to solving $R_{i \to e}(\mu; 0) = x_e$. For infinitesimal $\epsilon$, if the flow $x_e$ changes from $x + \epsilon$ to $x - \epsilon$, the solution of the Lagrange multiplier changes continuously from $\mu^*_{x+\epsilon}$ to $\mu^*_{x-\epsilon}$. On the other hand, there is a plateau at $R_{i \to e}(\mu; 0) = y$, so that when the flow $x_e$ changes from $y + \epsilon$ to $y - \epsilon$, the solution of the Lagrange multiplier changes discontinuously from $\mu^*_{y+\epsilon}$ to $\mu^*_{y-\epsilon}$.

Figure 2. (a) Smooth message function $\Phi_{i \to e}(x_e)$, corresponding to $x_e = x$ in Fig. 1(b). (b) Non-smooth message function $\Phi_{i \to e}(x_e)$ with one breakpoint, corresponding to $x_e = y$ in Fig. 1(b), where the first and second derivatives of $\Phi_{i \to e}(x_e)$ are discontinuous near $x_e = y$.

The function $R_{i \to e}(\mu; x_e)$ is a non-increasing piecewise-linear function, as illustrated in Fig. 1(a). Assuming edge $e$ is an outgoing edge of node $i$ (with $B_{i,e} = -1$), finding the root of $R_{i \to e}(\mu; x_e)$ is equivalent to solving $R_{i \to e}(\mu; 0) = x_e$. Consider the configuration in Fig. 1(b), where the solution of $R_{i \to e}(\mu; 0) = y$ occurs at a plateau, such that the solution (denoted as $\mu^*_y$) is degenerate; when the flow $x_e$ changes infinitesimally from $y + \epsilon$ to $y - \epsilon$, the solution of the Lagrange multiplier changes discontinuously from $\mu^*_{y+\epsilon}$ to $\mu^*_{y-\epsilon}$. In this case, the slope $\beta_{i \to e}$ of the cavity energy $\Phi_{i \to e}(x_e)$ changes discontinuously from $x_e = y + \epsilon$ to $x_e = y - \epsilon$, while the curvature $\alpha_{i \to e}$ is ill-defined at $x_e = y$. The profiles of the message function $\Phi_{i \to e}(x_e)$ in the smooth and non-smooth cases are illustrated in Fig. 2. If the normal messages $\{\beta_{i \to e}, \alpha_{i \to e}\}$ are used when the message function $\Phi_{i \to e}(x_e)$ is non-smooth, the solution will jump between the two branches, resulting in the non-convergent behavior of the MP algorithms observed in Ref. [1].

2. Criteria for Non-smooth Message Function

As mentioned above, the message function $\Phi_{i \to e}(x_e)$ is non-smooth if the solution of $\mu$ in Eq. (11) is degenerate. This occurs if the optimal flows $x^*_{k \to e'}(\mu_{i \to e})$ of all descendant edges $e' \in \partial i \setminus e$ are inactive, i.e., lying in the zero branch of the function in Eq. (10); when $B_{i,e} x_e + \Lambda_i = 0$, the flow conservation equation $R_{i \to e}(\mu; x_e) = \sum_{e' \in \partial i \setminus e} B_{i,e'} x^*_{k \to e'}(\mu) = 0$ has degenerate solutions. In this case, all the resources $\Lambda_i$ are transmitted along edge $e$, while the flows on all other edges $\partial i \setminus e$ adjacent to node $i$ are idle. When $\Lambda_i > 0$, edge $i \to e$ is a leaf in the subgraph with edges holding non-zero flows; we therefore call edge $i \to e$ a primary effective leaf in such cases. Since $\Lambda_i \geq 0$ and $x_e \geq 0$, only the outgoing edge $i \to e$ from node $i$ (with $B_{i,e} = -1$) can be a primary effective leaf.

Figure 3. Illustration of the effective leaf edges. Arrows with dashed lines correspond to edges with zero optimal cavity flow $x^*(\mu^*)$ in the MP calculation (expression given in Eq. (10)) of one of its downstream edges. Edge $i \to e$ is a primary effective leaf (assuming $\Lambda_i > 0$), as all the upstream optimal flows are zero. Edge $j \to e'$ is a general effective leaf, as its upstream edges are either effective leaves or attain zero optimal cavity flows. Edge $k \to e''$ is a non-effective leaf, as its upstream edge $l \to e'''$ is a non-effective leaf and it has a non-zero optimal flow $x^*_{l \to e'''}(\mu^*_{l \to e'''}) > 0$.

The leaf state can also propagate from primary effective leaves to downstream edges. We define edge $i \to e$ to be a general effective leaf if and only if $\forall e' \in \partial i \setminus e$, either (i) the optimal flow $x^*_{k \to e'}(\mu^*_{i \to e}) = 0$ in Eq. (10) or (ii) edge $k \to e'$ is a general effective leaf. A primary leaf is by default a general effective leaf. If an edge $i \to e$ is an effective leaf, we denote $f_{i \to e} = 1$, otherwise $f_{i \to e} = 0$. An example of effective leaf configurations is shown in Fig. 3.

It can be proved by contradiction that only outgoing edges $i \to e$ with $B_{i,e} = -1$ can be general effective leaves under the condition $\Lambda_i \geq 0$. The set of effective leaves upstream of edge $i \to e$ is $EL_{i \to e} = \{e' \mid e' \in \partial i \setminus e,\ f_{k \to e'} = 1\}$, while the set of non-effective leaves is $NEL_{i \to e} = \{e' \mid e' \in \partial i \setminus e,\ f_{k \to e'} = 0\}$.

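Since there is at most one plateau in the function $R_{i \to e}(\mu; x)$, the cavity message $\Phi_{i \to e}(x_e)$ has at most one breakpoint. For an effective leaf edge $i \to e$, we always use the breakpoint of the message function (denoted as $\tilde{x}^b_{i \to e}$) as the working point, such that $\Phi_{i \to e}(x_e)$ has the following expression

$$
\Phi_{i \to e}(x_e) = \begin{cases}
\frac{1}{2} \alpha^L_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big)^2 + \beta^L_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big) + E_{i \to e}(\tilde{x}^b_{i \to e}), & x_e < \tilde{x}^b_{i \to e}, \\
\frac{1}{2} \alpha^R_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big)^2 + \beta^R_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big) + E_{i \to e}(\tilde{x}^b_{i \to e}), & x_e > \tilde{x}^b_{i \to e}.
\end{cases} \tag{18}
$$

For a primary effective leaf edge $i \to e$, the breakpoint is $\tilde{x}^b_{i \to e} = \Lambda_i$. For a general effective leaf edge $i \to e$, the breakpoint is most likely (but not always) located at the value of the effective resource, defined as

$$
\Lambda^{\mathrm{eff}}_{i \to e} := \Lambda_i + \sum_{e' \in EL_{i \to e}} B_{i,e'} \tilde{x}^b_{k \to e'}. \tag{19}
$$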
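The leaf bookkeeping above can be summarized in a short sketch. It assumes that, for each upstream edge, the optimal cavity flow, the leaf flag, and (for leaves) the breakpoint have already been computed; the record type and function names are hypothetical, and the effective resource sums the breakpoints of the upstream effective leaves, as they appear in the constraint of the reduced optimization problem.

```python
from typing import List, NamedTuple, Optional


class UpstreamState(NamedTuple):
    """State of one upstream edge e' of node i (illustrative names)."""
    b: int                       # B_{i,e'}
    x_star: float                # optimal cavity flow x*_{k->e'} from Eq. (10)
    is_leaf: bool                # f_{k->e'}: e' is itself an effective leaf
    breakpoint: Optional[float]  # breakpoint of its message if is_leaf, else None


def is_general_effective_leaf(ups: List[UpstreamState]) -> bool:
    """f_{i->e} = 1 iff every upstream edge is idle or an effective leaf."""
    return all(u.is_leaf or u.x_star == 0.0 for u in ups)


def effective_resource(lam: float, ups: List[UpstreamState]) -> float:
    """Lambda_i plus the signed breakpoint flows of upstream effective leaves."""
    return lam + sum(u.b * u.breakpoint for u in ups if u.is_leaf)
```

In floating-point practice the exact comparison `u.x_star == 0.0` is reliable only because Eq. (10) clamps inactive flows to exactly zero; if flows come from another source, a small tolerance would be the safer choice.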

D. MP Equations for Non-smooth Message Functions

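The MP equations for non-smooth message functions can be obtained with the information on the effective leaf status of upstream edges, where one replaces the quadratic expansion $\Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e} \varepsilon_e + \frac{1}{2} \alpha_{i \to e} (\varepsilon_e)^2$ by the piecewise quadratic counterpart in Eq. (18) when edge $i \to e$ is determined to be an effective leaf, and the double-sided message parameters $\{\alpha^L_{i \to e}, \beta^L_{i \to e}, \alpha^R_{i \to e}, \beta^R_{i \to e}\}$ are maintained and passed to its downstream edges.

For updating the messages, the computation of $\min_{\{x_{e'} \geq 0\} \mid R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{k \to e'}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$ can be tedious if there are multiple effective leaf edges in $\{k \to e' \mid e' \in \partial i \setminus e\}$: one needs to solve a quadratic optimization problem for every case in which one branch of each non-smooth message function is selected (there are $2^{|EL_{i \to e}|}$ such cases in total). To simplify this process, we propose to first fix the flows $x_{e'}$ of the effective leaves $EL_{i \to e}$ to their breakpoints $\tilde{x}^b_{k \to e'}$ and then optimize over the non-effective leaf edges $NEL_{i \to e}$

$$
\min_{\{\varepsilon_{e'} \mid e' \in NEL_{i \to e}\}} \sum_{e' \in NEL_{i \to e}} \left[ \Phi_{k \to e'}(\tilde{x}_{k \to e'} + \varepsilon_{e'}) + \phi_{e'}(\tilde{x}_{k \to e'} + \varepsilon_{e'}) \right], \tag{20}
$$

$$
\begin{aligned}
\text{s.t.} \quad 0 &= \sum_{e' \in NEL_{i \to e}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \sum_{e' \in EL_{i \to e}} B_{i,e'} \tilde{x}^b_{k \to e'} + B_{i,e} x_e + \Lambda_i \\
&= \sum_{e' \in NEL_{i \to e}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \Lambda^{\mathrm{eff}}_{i \to e} + B_{i,e} x_e,
\end{aligned} \tag{21}
$$

$$
0 \leq \tilde{x}_{k \to e'} + \varepsilon_{e'}. \tag{22}
$$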

We then perturb the optimal solution by perturbing some of the flows $x_{e'}$ of upstream edges $e' \in \partial i \setminus e$ by an infinitesimal amount $\mathrm{d}x$, as $x_{e'} = x^*_{k \to e'} + \eta_{e'} \mathrm{d}x$ for non-effective leaves and $x_{e'} = \tilde{x}^b_{k \to e'} + \eta_{e'} \mathrm{d}x$ for effective leaves, with $\eta_{e'} = 0, \pm 1$. For a non-smooth message function, $\eta_{e'} = -1$ and $\eta_{e'} = 1$ correspond to the left and the right branch of $\Phi_{k \to e'}(x_{e'})$, respectively. To obey the flow conservation constraint $R_i = 0$, the perturbation coefficients $\eta_{e'}$ must satisfy $\sum_{e' \in \partial i \setminus e} B_{i,e'} \eta_{e'} = 0$.
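The admissible perturbation configurations can be listed by brute force for the small node degrees typical of sparse networks. The sketch below enumerates all $\eta \in \{-1, 0, 1\}^n$ obeying the conservation condition; the function name is hypothetical, and the exhaustive enumeration (cost $3^n$) is a simplification chosen for clarity rather than the authors' procedure.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def admissible_perturbations(b: Sequence[int]) -> Iterator[Tuple[int, ...]]:
    """Yield all eta in {-1, 0, 1}^n with sum over e' of B_{i,e'} * eta_{e'} = 0."""
    for eta in product((-1, 0, 1), repeat=len(b)):
        if sum(bi * ei for bi, ei in zip(b, eta)) == 0:
            yield eta
```

For one incoming and one outgoing upstream edge, `list(admissible_perturbations([1, -1]))` contains three configurations: both flows lowered, both unchanged, or both raised.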

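If the perturbation configuration $\{\eta^*_{e'}\}$ leading to the lowest energy of $\sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$ reduces the outcome of Eq. (20), we need to consider adding the effective leaves $k \to e'$ with $\eta^*_{e'} \neq 0$ as active optimization variables in addition to the non-effective leaves. Specifically, we define $\eta^{\mathrm{active}} = \{e' \mid e' \in EL_{i \to e},\ \eta^*_{e'} \neq 0\}$, and proceed to solve

$$
\min_{\{\varepsilon_{e'} \mid e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}\}} \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} \left[ \Phi_{k \to e'}(\tilde{x}_{k \to e'} + \varepsilon_{e'}) + \phi_{e'}(\tilde{x}_{k \to e'} + \varepsilon_{e'}) \right], \tag{23}
$$

$$
\text{s.t.} \quad 0 = \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \sum_{e' \in EL_{i \to e} \setminus \eta^{\mathrm{active}}} B_{i,e'} \tilde{x}^b_{k \to e'} + B_{i,e} x_e + \Lambda_i, \tag{24}
$$

$$
0 \leq \tilde{x}_{k \to e'} + \varepsilon_{e'}, \tag{25}
$$

where we use $\Phi_{k \to e'} = \frac{1}{2} \alpha^L_{k \to e'} \varepsilon_{e'}^2 + \beta^L_{k \to e'} \varepsilon_{e'}$ if $\eta^*_{e'} = -1$ and use $\Phi_{k \to e'} = \frac{1}{2} \alpha^R_{k \to e'} \varepsilon_{e'}^2 + \beta^R_{k \to e'} \varepsilon_{e'}$ if $\eta^*_{e'} = 1$.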

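The primal and dual variables at the optimum satisfy

$$
x^*_{k \to e'}(\mu) = \tilde{x}_{k \to e'} + \varepsilon^*_{e'}(\mu) = \max\left( \tilde{x}_{k \to e'} - \frac{\mu B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ 0 \right), \tag{26}
$$

$$
R_{i \to e}(\mu; x_e) = \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} B_{i,e'} x^*_{k \to e'}(\mu) + \sum_{e' \in EL_{i \to e} \setminus \eta^{\mathrm{active}}} B_{i,e'} \tilde{x}^b_{k \to e'} + B_{i,e} x_e + \Lambda_i = 0. \tag{27}
$$

If the solution $\mu^*$ in Eq. (27) is non-degenerate, we have

$$
\beta_{i \to e} = B_{i,e}\, \mu^*, \tag{28}
$$

$$
\alpha_{i \to e} = \left[ \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} \frac{1}{\alpha_{k \to e'} + \phi''_{e'}} \times \Theta\Big( \big( \alpha_{k \to e'} + \phi''_{e'} \big) \tilde{x}_{k \to e'} - \big( \mu^* B_{i,e'} + \phi'_{e'} + \beta_{k \to e'} \big) \Big) \right]^{-1}. \tag{29}
$$

On the other hand, if the solution $\mu^*$ is degenerate, we need to consider $x_e = \tilde{x}_{i \to e} - \mathrm{d}x$ to solve for $\beta^L_{i \to e}, \alpha^L_{i \to e}$, and consider $x_e = \tilde{x}_{i \to e} + \mathrm{d}x$ to solve for $\beta^R_{i \to e}, \alpha^R_{i \to e}$.

It can also be shown that $\beta^R_{i \to e} > \beta^L_{i \to e}$ and that the non-smooth message function $\Phi_{i \to e}(x_e)$ is convex.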

1. Update of the Working Points

If the message function $\Phi_{i \to e}(x_e)$ is non-smooth, we would like to bring the working point $\tilde{x}_{i \to e}$ to the vicinity of the breakpoint of the two branches. To determine whether an edge $i \to e$ is an effective leaf, we check the following two criteria: (i) each edge $k \to e'$ in the upstream edge set $\partial i \setminus e$ satisfies either $f_{k \to e'} = 1$ or $\tilde{x}_{k \to e'} = 0$; (ii) the difference between the current working point and the effective resource, $|\tilde{x}_{i \to e} - \Lambda^{\mathrm{eff}}_{i \to e}|$, is smaller than some threshold ($\Lambda^{\mathrm{eff}}_{i \to e}$ is defined in Eq. (19)). If both criteria (i) and (ii) are met, then we use the effective resource as the working point, $\tilde{x}_{i \to e} = \Lambda^{\mathrm{eff}}_{i \to e}$, and perform the optimization $\min_{\{x_{e'} \geq 0\} \mid R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{k \to e'}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$; if it results in degenerate solutions of the Lagrange multiplier $\mu^*$ for the flow conservation constraint, then edge $i \to e$ is determined to be an effective leaf and the double-sided messages $\{\beta^L_{i \to e}, \alpha^L_{i \to e}, \beta^R_{i \to e}, \alpha^R_{i \to e}\}$ are computed. Otherwise, edge $i \to e$ is a non-effective leaf and the normal messages $\{\beta_{i \to e}, \alpha_{i \to e}\}$ are recorded.

If criteria (i) and (ii) are not both met, we use the current value of the working point $\tilde{x}_{i \to e}$ to solve for the messages. Similarly, if the optimization leads to degenerate solutions of $\mu^*$, then edge $i \to e$ is determined to be an effective leaf. Otherwise, edge $i \to e$ is a non-effective leaf.

Similar to the case of smooth message functions in Sec. I B, the working point is updated as

$$
\tilde{x}^{\mathrm{new}}_{i \to e} \leftarrow s\, x^*_e + (1 - s)\, \tilde{x}^{\mathrm{old}}_{i \to e}, \tag{30}
$$

where $x^*_e = \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) \right]$.

E. Results of MP Algorithm for Routing Game

In this section, we report results of the MP algorithms described above. Taking into account the possible non-smooth structure of the message functions $\Phi_{i \to e}(x_e)$, the MP algorithm converges well for various types of graphs and resource distributions. We demonstrate the effectiveness of the algorithm in Figs. 4 and 5, where random regular graphs and small-world networks are considered. The small-world networks are obtained by rewiring square lattices with randomly chosen shortcut edges [2]. In Fig. 4, the flows adjacent to the destination node $D$ are unconstrained (Method I in Sec. I B 1). The MP algorithms converge to the correct equilibrium flows $x^*$, and the empirical complexity for computing the equilibrium flows up to a certain error $|x^{\mathrm{MP}} - x^*|$ is roughly $O(|E|^2)$.

In Fig. 5, we use Method II in Sec. I B 1, i.e., we put an explicit constraint on the flows adjacent to the destination node $D$ as

$$
R_D = \Lambda_D + \sum_{e \in \partial D} B_{D,e} x_e = 0, \tag{31}
$$

where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. In this approach, the MP algorithms converge much faster; the empirical complexity for computing the equilibrium flows up to a certain error $|x^{\mathrm{MP}} - x^*|$ is roughly $O(|E|)$. However, there exist some networks where MP with Method II does not converge, while MP with Method I converges successfully. For the experiments in the main text, we use Method I to treat the destination node for its better convergence properties.

F. Extension to the Case of Multiple Destinations

The case of multiple destinations can be studied similarly. The traffic flows can be classified into different classes according to their destinations. Let $N_d$ denote the number of destinations, and $x^a_e$ denote the flow on edge $e$ targeted at the $a$-th destination (or the $a$-th class). The lower-level optimization problem (for solving equilibrium flows) is defined as

$$
\Phi(x) = \sum_{e \in E} \int_0^{\sum_a x^a_e} \ell_e(y)\, \mathrm{d}y = \sum_{e \in E} \phi_e\Big( \sum_{a=1}^{N_d} x^a_e \Big), \tag{32}
$$

$$
\text{s.t.} \quad R^a_i := \Lambda^a_i + \sum_{e \in \partial i} B_{i,e} x^a_e = 0, \quad \forall i, a, \tag{33}
$$

$$
x^a_e \geq 0, \quad \forall e, a. \tag{34}
$$

To accommodate the nonlinear interactions of flows of different classes $\{x^a_e\}$ in $\phi_e(\sum_a x^a_e)$, we adopt a coordinate-descent-like approach in the MP algorithm as follows. In the treatment of the flows of class $a$, we fix the flows of other classes $\{x^b_e\}_{b \neq a}$ to their working points and compute the messages of class $a$ as

$$
\begin{aligned}
\Phi^a_{i \to e}(x^a_e) &= \min_{\{x^a_{e'} \geq 0\} \mid R^a_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi^a_{k \to e'}(x^a_{e'}) + \phi_{e'}\Big( \sum_{b \neq a} \tilde{x}^b_{k \to e'} + x^a_{e'} \Big) \right] \\
&= \min_{\{x^a_{e'} \geq 0\} \mid R^a_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi^a_{k \to e'}(\tilde{x}^a_{k \to e'} + \varepsilon^a_{e'}) + \phi_{e'}\Big( \sum_{b} \tilde{x}^b_{k \to e'} + \varepsilon^a_{e'} \Big) \right].
\end{aligned} \tag{35}
$$

We further adopt the approximations $\Phi^a_{k \to e'}(\tilde{x}^a_{k \to e'} + \varepsilon^a_{e'}) \approx \Phi^a_{k \to e'}(\tilde{x}^a_{k \to e'}) + \beta^a_{k \to e'} \varepsilon^a_{e'} + \frac{1}{2} \alpha^a_{k \to e'} (\varepsilon^a_{e'})^2$ (augmented by a piecewise quadratic function if it is non-smooth) and $\phi_{e'}(\sum_b \tilde{x}^b_{k \to e'} + \varepsilon^a_{e'}) \approx \phi_{e'}(\sum_b \tilde{x}^b_{k \to e'}) + \phi'_{e'}(\sum_b \tilde{x}^b_{k \to e'}) \varepsilon^a_{e'} + \frac{1}{2} \phi''_{e'}(\sum_b \tilde{x}^b_{k \to e'}) (\varepsilon^a_{e'})^2$, and solve for the coefficients $\{\alpha^a_{i \to e}, \beta^a_{i \to e}\}$ as in the single-destination scenario. The resulting MP algorithm has the same structure as the single-class algorithm described above; its efficacy is shown in Fig. 6.

Figure 4. Convergence of the MP algorithm for routing games in networks to the equilibrium flows $x^*$. Random regular graphs of degree 3 are considered in (a)(c), while small-world networks obtained by rewiring square lattices with randomly chosen shortcut edges (rewiring probability $p_{\mathrm{rw}} = 0.05$) are considered in (b)(d). The flows adjacent to the destination node $D$ are unconstrained (Method I in Sec. I B 1), reminiscent of a grounded node in electric circuits. In (a)(b), the MP process for specific problem realizations is illustrated. Each sweep comprises $40|E|$ local MP updates. Panels (c)(d) are obtained by averaging over 10 problem realizations, and rescaling the MP runtime by $|E|^2$.

Figure 5. Same setting as in Fig. 4, except that the flows adjacent to the destination node $D$ explicitly obey the constraint $R_D = \Lambda_D + \sum_{e \in \partial D} B_{D,e} x_e = 0$, where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$ (Method II in Sec. I B 1). Each sweep comprises $40|E|$ local MP updates.

Figure 6. Message-passing algorithm for routing games with $N_d$ destinations converges to the equilibrium flows $x^*$. (a) Method I in Sec. I B 1 is used to treat the destination nodes. (b) Method II in Sec. I B 1 is used to treat the destination nodes. Each sweep comprises $40|E|$ local MP updates.

G. Bilevel Optimization in Routing Games

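The toll optimization problem (for single destination) of the upper-level planner is defined as

$$
\min_{\tau}\ H(x^*(\tau)) = \sum_{e \in E} x^*_e(\tau)\, \ell_e\big( x^*_e(\tau) \big), \tag{36}
$$

$$
\text{s.t.} \quad \text{constraints on } \tau, \tag{37}
$$

$$
x^*(\tau) = \arg\min_{x} \sum_{e \in E} \int_0^{x_e} \left[ \ell_e(y) + \tau_e \right] \mathrm{d}y, \tag{38}
$$

$$
\text{s.t.} \quad x_e \geq 0,\ R_i = 0, \quad \forall e, i. \tag{39}
$$

The MP algorithm for computing the equilibrium flows $x^*(\tau)$ for a given $\tau$ is the same as the one introduced in Sec. I, by replacing $\phi_e(x_e) = \int_0^{x_e} \ell_e(y)\, \mathrm{d}y$ by $\phi^{\tau}_e(x_e) = \int_0^{x_e} \left[ \ell_e(y) + \tau_e \right] \mathrm{d}y$.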

The computation of the social optimum of $H(x)$ has a similar form to that of the equilibrium flows. We therefore use a parallel MP procedure for the social cost

$$
\begin{aligned}
H_{i \to e}(x_e) &= \min_{\{x_{e'} \geq 0\} \mid R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ H_{k \to e'}(x_{e'}) + \sigma_{e'}(x_{e'}) \right] \\
&= \min_{\{x_{e'} \geq 0\} \mid R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \frac{1}{2} \gamma_{k \to e'} \varepsilon_{e'}^2 + \delta_{k \to e'} \varepsilon_{e'} + \frac{1}{2} \sigma''_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'}^2 + \sigma'_{e'}(\tilde{x}_{k \to e'}) \varepsilon_{e'} \right],
\end{aligned} \tag{40}
$$

where $H_{k \to e'}(x_{e'})$ assumes a quadratic approximation and needs to be augmented by a piecewise quadratic function if it is non-smooth. This results in an upper-level MP algorithm having the same structure as the one for computing the equilibrium flows. The difference is that, since the flow is not directly driven by the central planner, the working point $\tilde{x}_{i \to e}$ is not updated in the upper-level MP. Instead, the central planner updates the toll $\tau_e$ such that selfish users are attracted to the solution with a lower social cost.

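When the toll $\tau_e$ on edge $e$ is adapted, the marginal Nash-equilibrium flow $x^N_e$ changes accordingly

$$
x^N_e(\tau_e) = \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) + \tau_e x_e \right]. \tag{41}
$$

For smooth message functions $\Phi_{i \to e}(x_e)$ and $\Phi_{j \to e}(x_e)$ (with the shorthand $\phi'_{i \to e} := \phi'_e(\tilde{x}_{i \to e})$ and $\phi''_{i \to e} := \phi''_e(\tilde{x}_{i \to e})$),

$$
x^N_e(\tau_e) = \max\left( \frac{ \big( \alpha_{i \to e} + \tfrac{1}{2} \phi''_{i \to e} \big) \tilde{x}_{i \to e} + \big( \alpha_{j \to e} + \tfrac{1}{2} \phi''_{j \to e} \big) \tilde{x}_{j \to e} - \big( \beta_{i \to e} + \beta_{j \to e} + \tau_e + \tfrac{1}{2} \phi'_{i \to e} + \tfrac{1}{2} \phi'_{j \to e} \big) }{ \alpha_{i \to e} + \alpha_{j \to e} + \tfrac{1}{2} \phi''_{i \to e} + \tfrac{1}{2} \phi''_{j \to e} },\ 0 \right), \tag{42}
$$

which is a piecewise linear function of $\tau_e$ with two branches. For non-smooth cavity functions, $x^N_e(\tau_e)$ can also be obtained straightforwardly, which is a piecewise linear function of $\tau_e$ with multiple branches.

Figure 7. The effect of tolls on the reduction in fractional social cost in routing games on random regular graphs with $N_d$ destinations, where tolls are restricted as $0 \leq \tau_e \leq 1$. Each sweep comprises $40 N_d |E|$ local MP updates and 100 edgewise toll updates in a random sequential schedule. (a) $N = 100$. (b) $N = 200$.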

The goal of adapting the toll $\tau_e$ is to decrease the social cost $H(x)$, which amounts to decreasing the full social cost on edge $e$

$$
\tau^*_e = \arg\min_{\tau_e} H^{\mathrm{full}}_e\big( x^N_e(\tau_e) \big), \tag{43}
$$

$$
H^{\mathrm{full}}_e(x_e) := H_{i \to e}(x_e) + H_{j \to e}(x_e) + \sigma_e(x_e), \tag{44}
$$

where the optimization in Eq. (43) needs to obey necessary constraints on tolls (e.g., the restriction $0 \leq \tau_e \leq \tau^{\max}_e$ is considered in the main text).

As $H^{\mathrm{full}}_e(x^N_e(\tau_e))$ is a convex function of $x^N_e$, it is sufficient to adapt $\tau_e$ such that $x^N_e(\tau_e)$ gets as close to the marginal socially optimal flow $x^G_e$ as possible, where $x^G_e$ is given by

$$
x^G_e = \arg\min_{x_e \geq 0} \left[ H_{i \to e}(x_e) + H_{j \to e}(x_e) + \sigma_e(x_e) \right]. \tag{45}
$$

The search for the optimal toll $\tau^*_e$ can be done efficiently by utilizing the property that $x^N_e(\tau_e)$ is a piecewise linear function of $\tau_e$.
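For smooth messages, the edgewise toll search has a closed form, because the Nash flow of Eq. (42) is linear in $\tau_e$ until it is clamped at zero. The sketch below lumps the $\tau_e$-independent part of the numerator of Eq. (42) into `num0` and the denominator into `den`; these two names, the function names, and the assumption `den > 0` are introduced here for illustration only, and the multi-branch non-smooth case is not covered.

```python
def nash_flow(tau: float, num0: float, den: float) -> float:
    """x_e^N(tau) = max((num0 - tau) / den, 0) for smooth messages.

    num0 and den stand for the tau-independent part of the numerator and the
    denominator of the closed-form expression; both names are illustrative.
    """
    return max((num0 - tau) / den, 0.0)


def best_toll(x_target: float, num0: float, den: float, tau_max: float) -> float:
    """Toll in [0, tau_max] whose Nash flow is closest to x_target >= 0."""
    # x_e^N is non-increasing in tau, so clipping the unconstrained solution of
    # (num0 - tau) / den = x_target gives the closest attainable flow.
    tau = num0 - den * x_target
    return min(max(tau, 0.0), tau_max)
```

With `num0 = 3.0`, `den = 2.0` and a target flow of `1.0`, the unconstrained toll is `1.0`; if `tau_max = 0.5`, the toll saturates at `0.5` and the Nash flow stays at `1.25`, the closest attainable value.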

The resulting bilevel MP algorithm is described in the main text, where the lower-level messages $\{\alpha^{(m)}_{i \to e}, \beta^{(m)}_{i \to e}, \tilde{x}_{i \to e}\}$ ($m \in \{L, R\}$) and upper-level messages $\{\gamma^{(m)}_{i \to e}, \delta^{(m)}_{i \to e}\}$ are passed along edges to compute the equilibrium flows $x^N_e$ and related quantities. These messages facilitate the computation of $H^{\mathrm{full}}_e(x_e)$ in the upper level, which is used to update the toll variables $\tau_e$. In practice, the update of tolls is less frequent than the update of other messages. In the experiments shown in the main text, for every $\frac{2}{5} N_d |E|$ MP iterations, we randomly select an edge $e$ and update its toll.

1. Extension to Multiple Destinations

The toll optimization problems with multiple destinations can be tackled by the proposed bilevel MP algorithm

using the approximations in Sec. I F. The results are shown in Fig. 7, which demonstrate the eﬀectiveness of the

algorithm in reducing the social cost by adapting tolls.

H. The Bilevel Programming Approach

Here we demonstrate the results of the bilevel programming approach to the toll optimization problem. It is achieved by expressing the solution of Eq. (38) as constraints imposed by the Karush–Kuhn–Tucker (KKT) conditions, and solving the bilevel optimization as a global nonlinear programming problem [3]. Such an approach is intrinsically difficult as (i) the constraints imposed by the KKT conditions can be nonlinear and non-convex; (ii) the complementary slackness conditions are combinatorial, which requires a treatment with mixed integer programming (e.g., through branch and bound).


Figure 8. Run time of bilevel programming on the toll optimization problem in random regular graphs. For each network size,

20 diﬀerent problem realizations are considered; red triangles represent the cases where bilevel programming fails to ﬁnd the

solution in single trials, green dots represent successful trials.

It is difficult to directly compare the MP algorithms to the bilevel programming approach, as the convergence rate for either approach is difficult to establish. Besides, the bilevel programming approach is a centralized optimization method, which has a different space complexity per iteration. Nevertheless, we present the CPU run times (CPU used: i5-3317U) of bilevel programming (package used: bileveljump.jl [4]) on the toll optimization problem in Fig. 8. There exist cases where bilevel programming fails to find the solution in a single trial, and the run times vary significantly among different problem realizations.

The bilevel programming approach is more generic and flexible than the MP approach, but it does not offer a decentralized algorithm as the MP approach does. Besides, the MP algorithms can be extended to scenarios with discrete variables, which is very difficult for the global optimization approach. It is also difficult to treat the toll selection problem in the main text with the bilevel programming method, especially when the social optimum is not known a priori in some variants of toll-setting or network-design problems [5].

II. MESSAGE-PASSING ALGORITHMS FOR FLOW CONTROL IN UNDIRECTED NETWORKS

In this section, we provide the details of the MP algorithm for flow control in undirected networks. In a simple undirected graph $G(V, E)$, nodes $i$ and $j$ can be connected by at most one edge $(i, j)$, where the order of nodes $i$ and $j$ does not matter (in contrast, edges $(i, j)$ and $(j, i)$ are two different edges in a directed graph). In this case, edge $(i, j)$ can either transmit resources from node $j$ to node $i$ or from node $i$ to node $j$. We denote by $x_{ij}$ ($= -x_{ji}$) the flow from node $j$ to node $i$; if $x_{ij} < 0$ (or $x_{ji} = -x_{ij} > 0$), the resources are being transmitted from node $i$ to node $j$. We also assume that the underlying graph does not have any leaf nodes, which is achieved by recursively trimming leaf nodes and absorbing their resources into neighboring nodes.

A. MP For Lower-level Optimization

The equilibrium flow (in the lower-level optimization problem) is the minimizer of the problem

$$
\min_{x}\ C(x) = \sum_{(i,j)} \frac{1}{2} r_{ij} x_{ij}^2, \tag{46}
$$

$$
\text{s.t.} \quad R_i = \Lambda_i + \sum_{j \in N_i} x_{ij} = 0, \quad \forall i \neq D, \tag{47}
$$

where the reference node $D$ can be arbitrarily chosen.

The above optimization problem can be mapped onto its dual problem as

$$
\min_{\mu}\ C^{\mathrm{dual}}(\mu) = \sum_{(i,j)} \frac{1}{2 r_{ij}} (\mu_j - \mu_i)^2 - \sum_i \Lambda_i \mu_i \tag{48}
$$

$$
=: \frac{1}{2} \mu^{\top} L \mu - \Lambda^{\top} \mu, \tag{49}
$$

where $\mu_i$ is the Lagrange multiplier (or dual variable) associated with the flow conservation constraint $R_i = 0$, and $L$ is the Laplacian matrix with matrix elements

$$
L_{ij} := \Big( \sum_{k \in N_i} \frac{1}{r_{ik}} \Big) \delta_{ij} - \frac{1}{r_{ij}}. \tag{50}
$$

The solution of the dual problem can be obtained by solving the system of linear equations $L \mu^* = \Lambda$, and the equilibrium flow $x^*_{ij}$ is related to the optimal Lagrange multiplier $\mu^*$ through $x^*_{ij} = \frac{\mu^*_j - \mu^*_i}{r_{ij}}$. The drawbacks of such an approach are that (i) solving the system of linear equations usually needs a centralized solver; (ii) to compute the response of the equilibrium flow to changes of the control parameters $\{r_{ij}\}$ in bilevel optimization, one needs to evaluate the pseudo-inverse of the Laplacian matrix per iteration, which can be very computationally demanding for large networks. Instead, we propose to use MP for computing the equilibrium flows and tackling the related bilevel optimization problem, which yields a scalable and efficient decentralized algorithm.
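As a point of comparison with the MP approach, the dual route via $L \mu^* = \Lambda$ can be illustrated on a toy network. The sketch below fixes $\mu_D = 0$ at the reference node and uses a plain Jacobi iteration as a stand-in for a linear solver; the iteration scheme, the function names, and the dictionary-based graph representation are assumptions made for brevity, not the method used in the paper.

```python
from typing import Dict, Hashable, Tuple

Node = Hashable


def dual_potentials(r: Dict[Tuple[Node, Node], float], lam: Dict[Node, float],
                    ref: Node, sweeps: int = 200) -> Dict[Node, float]:
    """Approximate the solution of L mu = Lambda with mu[ref] fixed to zero."""
    nbrs: Dict[Node, Dict[Node, float]] = {}
    for (i, j), rij in r.items():          # each undirected edge is listed once
        nbrs.setdefault(i, {})[j] = 1.0 / rij
        nbrs.setdefault(j, {})[i] = 1.0 / rij
    mu = {i: 0.0 for i in nbrs}
    for _ in range(sweeps):
        new = {}
        for i, g in nbrs.items():
            if i == ref:
                new[i] = 0.0
                continue
            # Row i of L mu = Lambda solved for mu_i (one Jacobi step)
            new[i] = (lam.get(i, 0.0) + sum(w * mu[j] for j, w in g.items())) / sum(g.values())
        mu = new
    return mu


def flows(r: Dict[Tuple[Node, Node], float], mu: Dict[Node, float]) -> Dict[Tuple[Node, Node], float]:
    """x*_ij = (mu_j - mu_i) / r_ij, the flow from node j to node i."""
    return {(i, j): (mu[j] - mu[i]) / rij for (i, j), rij in r.items()}
```

On a triangle with unit resistances, `r = {(1, 2): 1.0, (1, "D"): 1.0, (2, "D"): 1.0}`, and a single source `lam = {1: 1.0, 2: 0.0}`, the potentials approach $\mu_1 = 2/3$ and $\mu_2 = 1/3$, so two thirds of the resource reaches $D$ directly and one third passes through node 2.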

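For the lower-level equilibrium flow problem in Eq. (46), the MP algorithm amounts to computing the message functions

$$
C_{i \to j}(x_{ij}) = \min_{\{x_{ki}\} \mid R_i = 0} \left[ \frac{1}{2} r_{ij} x_{ij}^2 + \sum_{k \in N_i \setminus j} C_{k \to i}(x_{ki}) \right], \tag{51}
$$

where the definition of the message function $C_{i \to j}(x_{ij})$ differs from the one in Eq. (6) in that it includes the interaction term on edge $(i, j)$, which yields a more concise update rule in this problem. Similar to routing games, we approximate the message function by a quadratic form $C_{i \to j}(x_{ij}) = \frac{1}{2} \alpha_{i \to j} (x_{ij} - \hat{x}_{i \to j})^2 + \mathrm{const}$, such that the local optimization in Eq. (51) reduces to the computation of the real-number messages $m_{i \to j} \in \{\alpha_{i \to j}, \hat{x}_{i \to j}\}$ by passing the upstream messages $\{m_{k \to i}\}_{k \in N_i \setminus j}$. Here the message function $C_{i \to j}(x_{ij})$ is always smooth. The messages $\alpha_{i \to j}, \hat{x}_{i \to j}$ are computed as [1, 6, 7]

$$
\alpha_{i \to j} = \frac{1}{\sum_{k \in N_i \setminus j} \alpha^{-1}_{k \to i}} + r_{ij}, \tag{52}
$$

$$
\hat{x}_{i \to j} = \frac{\Lambda_i + \sum_{k \in N_i \setminus j} \hat{x}_{k \to i}}{1 + r_{ij} \sum_{k \in N_i \setminus j} \alpha^{-1}_{k \to i}}. \tag{53}
$$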

1. The Reference Node D

Similar to the MP algorithm in routing games, there are two methods to deal with possible boundary conditions of

the reference node D:

• Method I: Since node $D$ has no constraints on its adjacent flows, it will absorb all incoming flows, resulting in $C_{D \to j}(x_{Dj}) = 0$ and

$$
\alpha_{D \to j} = 0, \quad \hat{x}_{D \to j} = 0. \tag{54}
$$

In this treatment, the Lagrange multiplier of node $D$ can be set to $\mu_D = 0$ in the dual problem (as node $D$ is unconstrained), which corresponds to a grounded node in the electric network interpretation of the problem.

• Method II: Alternatively, one can set an explicit constraint on the flows $\{x_{Dj}\}$ to the reference node $D$

$$
R_D := \Lambda_D + \sum_{j \in N_D} x_{Dj} = 0, \tag{55}
$$

where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. Then the messages from the reference node $D$ are calculated in the same way as for other nodes.

Similar to the routing games, Method II results in an MP algorithm with a faster convergence rate, but it may fail to

converge for some graphs while Method I can still provide valid solutions.

Method II is used in the experiments for undirected ﬂow networks in the main text.

Figure 9. MP algorithm in undirected flow networks converges to the equilibrium flows $x^*$. Random regular graphs (RRG) of degree 3 are considered in (a)(b), while square lattices (size $L \times L$) are considered in (c)(d). Method I in Sec. II A 1 is used in (a)(c) to treat the reference node $D$, while Method II is used in (b)(d). In (b), each sweep comprises $40|E|$ local MP updates.

2. Computation of the Equilibrium Flows from Messages

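Upon convergence of the messages, the equilibrium flow $x^*_{ij}$ can be obtained by minimizing the edgewise full cost $C^{\mathrm{full}}_{ij}(x_{ij}) = C_{i\to j}(x_{ij}) + C_{j\to i}(x_{ij}) - \frac{1}{2} r_{ij} x_{ij}^2$, giving rise to

$$x^*_{ij} = \frac{\alpha_{j\to i}\hat{x}_{j\to i} - \alpha_{i\to j}\hat{x}_{i\to j}}{\alpha_{i\to j} + \alpha_{j\to i} - r_{ij}}. \tag{56}$$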
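A minimal sketch of the readout in Eq. (56) is given below; the function name `equilibrium_flow` and the numerical values are hypothetical and serve only to illustrate the formula. Swapping the roles of $i$ and $j$ flips the sign of the numerator while leaving the denominator unchanged, so the sketch also reproduces the antisymmetry $x^*_{ji} = -x^*_{ij}$.

```python
def equilibrium_flow(alpha_ij, xhat_ij, alpha_ji, xhat_ji, r_ij):
    """Flow on edge (i, j) from the two converged messages, following Eq. (56)."""
    denom = alpha_ij + alpha_ji - r_ij
    return (alpha_ji * xhat_ji - alpha_ij * xhat_ij) / denom


# Hypothetical converged messages on a single edge.
x_ij = equilibrium_flow(1.5, 0.2, 2.0, 1.0, 0.5)
# Same edge read in the opposite direction.
x_ji = equilibrium_flow(2.0, 1.0, 1.5, 0.2, 0.5)
```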

3. Results of the MP Algorithm on Undirected Flow Networks

In Fig. 9, we demonstrate the performance of the MP algorithms in undirected flow networks. The MP algorithms converge in different networks, including square lattices with many short loops. The number of iterations needed to reach a given precision seems to depend on the topology of the network, e.g., square lattices appear to converge more slowly than random regular graphs. The method used to treat the boundary reference node $D$ also affects the number of iterations needed: in general, Method II is observed to make MP converge faster than Method I. We conjecture that the influence of the single-node boundary condition in Eq. (54) takes more iteration steps to diffuse through the messages to the bulk of the network.

B. MP For Bilevel Optimization

The bilevel optimization problem on undirected networks aims to tune the flows of targeted edges $\mathcal{T}$ such that they exceed or drop below certain limits, depending on the application. We consider the former case, where the goal is to control the flows on $\mathcal{T}$ such that $\rho_{ij}(x_{ij}) = \frac{|x_{ij}| - |x^0_{ij}|}{|x^0_{ij}|} - \theta \geq 0$, $\forall (i,j)\in\mathcal{T}$ (with $x^0_{ij}$ being the flow before tuning). The task in the upper level is to minimize the objective

$$\min_{r}\; O(x^*(r)) = \sum_{(i,j)\in\mathcal{T}} -\rho_{ij}\big(x^*_{ij}(r)\big)\,\Theta\Big(-\rho_{ij}\big(x^*_{ij}(r)\big)\Big). \tag{57}$$
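To make Eq. (57) concrete, the following sketch evaluates the objective for a list of targeted flows. It assumes that $\Theta$ denotes the Heaviside step function, so that only edges violating $\rho_{ij}\geq 0$ contribute; the function names and the numbers are hypothetical.

```python
def rho(x, x0, theta):
    """Relative flow increase on a targeted edge minus the threshold theta."""
    return (abs(x) - abs(x0)) / abs(x0) - theta


def objective(flows, flows0, theta):
    """Upper-level objective of Eq. (57), assuming Theta is the Heaviside step."""
    total = 0.0
    for x, x0 in zip(flows, flows0):
        value = rho(x, x0, theta)
        if value < 0:  # Theta(-rho) is non-zero only when rho < 0
            total += -value
    return total


# Two targeted edges: the first meets the target, the second falls short.
O = objective(flows=[1.2, -2.1], flows0=[1.0, -2.0], theta=0.1)
```

Here the first edge has $\rho = 0.1$ and contributes nothing, while the second has $\rho = -0.05$ and contributes $0.05$.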


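As mentioned in the main text, the impact of the variation of the control parameters $r_{ij}$ on the upper-level objective $O$ is mediated through the messages $m_{i\to j}\in\{\alpha_{i\to j},\hat{x}_{i\to j}\}$ along the pathways from the targeted edges to edge $(i,j)$. For a targeted edge $(p,q)\in\mathcal{T}$, the boundary condition of the gradient is given by $\frac{\partial O}{\partial m_{p\to q}} = \frac{\partial O}{\partial x^*_{pq}}\frac{\partial x^*_{pq}}{\partial m_{p\to q}}$, where the components admit the following expressions

$$\frac{\partial O}{\partial x^*_{pq}} = -\Theta\big(-\rho_{pq}(x^*_{pq})\big)\,\frac{\operatorname{sgn}(x^*_{pq})}{|x^0_{pq}|}, \tag{58}$$

$$\frac{\partial x^*_{pq}}{\partial \alpha_{p\to q}} = \frac{-\hat{x}_{p\to q}}{\alpha_{p\to q} + \alpha_{q\to p} - r_{pq}} - \frac{x^*_{pq}}{\alpha_{p\to q} + \alpha_{q\to p} - r_{pq}}, \tag{59}$$

$$\frac{\partial x^*_{pq}}{\partial \hat{x}_{p\to q}} = \frac{-\alpha_{p\to q}}{\alpha_{p\to q} + \alpha_{q\to p} - r_{pq}}. \tag{60}$$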
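The boundary terms in Eqs. (58)-(60) can be checked numerically. The sketch below is illustrative only: it differentiates the flow formula of Eq. (56) by central finite differences and compares the result with the closed forms, again assuming that $\Theta$ is the Heaviside step function. All function names and numerical values are hypothetical.

```python
def flow(a_pq, x_pq, a_qp, x_qp, r_pq):
    """Equilibrium flow on edge (p, q), Eq. (56)."""
    return (a_qp * x_qp - a_pq * x_pq) / (a_pq + a_qp - r_pq)


def dflow_dalpha(a_pq, x_pq, a_qp, x_qp, r_pq):
    """Derivative of the flow w.r.t. alpha_{p->q}, Eq. (59)."""
    denom = a_pq + a_qp - r_pq
    return -x_pq / denom - flow(a_pq, x_pq, a_qp, x_qp, r_pq) / denom


def dflow_dxhat(a_pq, x_pq, a_qp, x_qp, r_pq):
    """Derivative of the flow w.r.t. xhat_{p->q}, Eq. (60)."""
    return -a_pq / (a_pq + a_qp - r_pq)


def dO_dflow(x, x0, theta):
    """Derivative of the objective w.r.t. a targeted flow, Eq. (58)."""
    rho = (abs(x) - abs(x0)) / abs(x0) - theta
    step = 1.0 if rho < 0 else 0.0   # Heaviside step of -rho (assumed)
    sign = (x > 0) - (x < 0)
    return -step * sign / abs(x0)


args = (1.5, 0.2, 2.0, 1.0, 0.5)  # (a_pq, x_pq, a_qp, x_qp, r_pq)
eps = 1e-6
fd_alpha = (flow(args[0] + eps, *args[1:]) - flow(args[0] - eps, *args[1:])) / (2 * eps)
fd_xhat = (flow(args[0], args[1] + eps, *args[2:])
           - flow(args[0], args[1] - eps, *args[2:])) / (2 * eps)

# Boundary condition dO/dm_{p->q} obtained by the chain rule.
x_star = flow(*args)
dO_dalpha = dO_dflow(x_star, x0=1.0, theta=0.1) * dflow_dalpha(*args)
dO_dxhat = dO_dflow(x_star, x0=1.0, theta=0.1) * dflow_dxhat(*args)
```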

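The gradient with respect to the control parameter $r_{pq}$ on the targeted edge is computed as

$$\frac{\partial O}{\partial r_{pq}} = \frac{\partial O}{\partial x^*_{pq}}\frac{\partial x^*_{pq}}{\partial r_{pq}} + \sum_{m\in\{\alpha,\hat{x}\}}\left(\frac{\partial O}{\partial m_{p\to q}}\frac{\partial m_{p\to q}}{\partial r_{pq}} + \frac{\partial O}{\partial m_{q\to p}}\frac{\partial m_{q\to p}}{\partial r_{pq}}\right). \tag{61}$$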

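For a non-targeted edge $(k,i)\notin\mathcal{T}$, we first need to evaluate $\frac{\partial O}{\partial m_{k\to i}}$, which can be obtained by summing the gradients on its downstream edges $\{i\to l \mid l\in\mathcal{N}_i\setminus k\}$, computed as

$$\frac{\partial O}{\partial m_{k\to i}} = \sum_{l\in\mathcal{N}_i\setminus k}\;\sum_{m_{i\to l}\in\{\alpha_{i\to l},\hat{x}_{i\to l}\}} \frac{\partial O}{\partial m_{i\to l}}\frac{\partial m_{i\to l}}{\partial m_{k\to i}}, \tag{62}$$

where the gradient propagation of messages $\frac{\partial m_{i\to l}}{\partial m_{k\to i}}$ admits the following forms

$$\frac{\partial \alpha_{i\to l}}{\partial \alpha_{k\to i}} = \frac{\alpha_{k\to i}^{-2}}{\left(\sum_{n\in\mathcal{N}_i\setminus l}\alpha_{n\to i}^{-1}\right)^2}, \tag{63}$$

$$\frac{\partial \alpha_{i\to l}}{\partial \hat{x}_{k\to i}} = 0, \tag{64}$$

$$\frac{\partial \hat{x}_{i\to l}}{\partial \alpha_{k\to i}} = \frac{\hat{x}_{i\to l}\, r_{il}\, \alpha_{k\to i}^{-2}}{1 + r_{il}\sum_{n\in\mathcal{N}_i\setminus l}\alpha_{n\to i}^{-1}}, \tag{65}$$

$$\frac{\partial \hat{x}_{i\to l}}{\partial \hat{x}_{k\to i}} = \frac{1}{1 + r_{il}\sum_{n\in\mathcal{N}_i\setminus l}\alpha_{n\to i}^{-1}}. \tag{66}$$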
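The propagation rules in Eqs. (63)-(66) are the Jacobian of the message update in Eqs. (52) and (53). As a hedged sanity check, the sketch below implements both and compares the closed forms against central finite differences. The function names, the dictionary layout keyed by `(output, input)`, and the numbers are assumptions for illustration.

```python
def update_message(lambda_i, r_il, upstream):
    """Eqs. (52)-(53); upstream holds (alpha_ni, xhat_ni) for n in N_i excluding l."""
    inv_sum = sum(1.0 / a for a, _ in upstream)
    alpha = 1.0 / inv_sum + r_il
    xhat = (lambda_i + sum(x for _, x in upstream)) / (1.0 + r_il * inv_sum)
    return alpha, xhat


def message_jacobian(lambda_i, r_il, upstream, k):
    """Eqs. (63)-(66): derivatives of (alpha_il, xhat_il) w.r.t. upstream message k."""
    alpha_k = upstream[k][0]
    inv_sum = sum(1.0 / a for a, _ in upstream)
    _, xhat = update_message(lambda_i, r_il, upstream)
    denom = 1.0 + r_il * inv_sum
    return {
        ("alpha", "alpha"): alpha_k ** -2 / inv_sum ** 2,        # Eq. (63)
        ("alpha", "xhat"): 0.0,                                  # Eq. (64)
        ("xhat", "alpha"): xhat * r_il * alpha_k ** -2 / denom,  # Eq. (65)
        ("xhat", "xhat"): 1.0 / denom,                           # Eq. (66)
    }


def finite_difference(lambda_i, r_il, upstream, k, which, eps=1e-6):
    """Central difference w.r.t. component `which` (0: alpha, 1: xhat) of message k."""
    plus = [list(m) for m in upstream]
    minus = [list(m) for m in upstream]
    plus[k][which] += eps
    minus[k][which] -= eps
    a_p, x_p = update_message(lambda_i, r_il, plus)
    a_m, x_m = update_message(lambda_i, r_il, minus)
    return (a_p - a_m) / (2 * eps), (x_p - x_m) / (2 * eps)


upstream = [(2.0, 0.5), (4.0, -0.3)]  # hypothetical upstream messages
jac = message_jacobian(1.0, 0.5, upstream, k=0)
fd_alpha = finite_difference(1.0, 0.5, upstream, 0, which=0)
fd_xhat = finite_difference(1.0, 0.5, upstream, 0, which=1)
```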

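The resulting gradient with respect to the control parameter $r_{ki}$ (note that edge $(k,i)\notin\mathcal{T}$) can be computed as

$$\frac{\partial O}{\partial r_{ki}} = \sum_{m\in\{\alpha,\hat{x}\}}\left(\frac{\partial O}{\partial m_{k\to i}}\frac{\partial m_{k\to i}}{\partial r_{ki}} + \frac{\partial O}{\partial m_{i\to k}}\frac{\partial m_{i\to k}}{\partial r_{ki}}\right). \tag{67}$$

In Eqs. (61) and (67), the terms $\frac{\partial m_{k\to i}}{\partial r_{ki}}$ are computed as

$$\frac{\partial \alpha_{k\to i}}{\partial r_{ki}} = 1, \tag{68}$$

$$\frac{\partial \hat{x}_{k\to i}}{\partial r_{ki}} = \hat{x}_{k\to i}\,\frac{-\sum_{n\in\mathcal{N}_k\setminus i}\alpha_{n\to k}^{-1}}{1 + r_{ki}\sum_{n\in\mathcal{N}_k\setminus i}\alpha_{n\to k}^{-1}}, \tag{69}$$

which closes the equations for the gradient computations.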
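As a brief illustration, Eqs. (68) and (69) follow from differentiating Eqs. (52) and (53) with respect to $r_{ki}$ at fixed upstream messages, and Eq. (67) then combines them with the back-propagated gradients. The sketch below checks the first part by finite differences and assembles the second; the names `message_grad_r` and `edge_gradient` and all numbers are hypothetical.

```python
def update_message(lambda_k, r_ki, upstream):
    """Eqs. (52)-(53); upstream holds (alpha_nk, xhat_nk) for n in N_k excluding i."""
    inv_sum = sum(1.0 / a for a, _ in upstream)
    alpha = 1.0 / inv_sum + r_ki
    xhat = (lambda_k + sum(x for _, x in upstream)) / (1.0 + r_ki * inv_sum)
    return alpha, xhat


def message_grad_r(lambda_k, r_ki, upstream):
    """Eqs. (68)-(69): derivatives of (alpha_ki, xhat_ki) w.r.t. r_ki."""
    inv_sum = sum(1.0 / a for a, _ in upstream)
    _, xhat = update_message(lambda_k, r_ki, upstream)
    return 1.0, xhat * (-inv_sum) / (1.0 + r_ki * inv_sum)


def edge_gradient(dO_dm_ki, dO_dm_ik, dm_ki_dr, dm_ik_dr):
    """Eq. (67); every argument is an (alpha component, xhat component) pair."""
    forward = sum(g * d for g, d in zip(dO_dm_ki, dm_ki_dr))
    backward = sum(g * d for g, d in zip(dO_dm_ik, dm_ik_dr))
    return forward + backward


upstream = [(2.0, 0.5), (4.0, -0.3)]  # hypothetical upstream messages
d_alpha, d_xhat = message_grad_r(1.0, 0.5, upstream)

eps = 1e-6
a_p, x_p = update_message(1.0, 0.5 + eps, upstream)
a_m, x_m = update_message(1.0, 0.5 - eps, upstream)
fd_alpha = (a_p - a_m) / (2 * eps)
fd_xhat = (x_p - x_m) / (2 * eps)

# Hypothetical back-propagated gradients for the two directed messages.
grad = edge_gradient((1.0, 2.0), (0.5, -1.0), (1.0, 0.1), (1.0, 0.2))
```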

C. The Global Optimization Approach

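In this section, we provide details of the global optimization approach on undirected flow networks used in the main text. As mentioned before, e.g., in Sec. II A, we have $x^*_{ij} = \frac{\mu^*_j - \mu^*_i}{r_{ij}}$, where $\mu^* = L(r)^\dagger\Lambda$ and $L(r)^\dagger$ is the pseudo-inverse of the Laplacian matrix $L(r)$ defined in Eq. (50). To compute the gradient $\frac{\partial O}{\partial r_{ij}}$, we need to evaluate the response of $\mu^*$ to the variation of the control parameters $r$, i.e.,

$$\frac{\partial O}{\partial r_{ij}} = \sum_{(p,q)\in\mathcal{T}} \frac{\partial O}{\partial x^*_{pq}}\left[\frac{1}{r_{pq}}\left(\frac{\partial \mu^*_q}{\partial r_{ij}} - \frac{\partial \mu^*_p}{\partial r_{ij}}\right) - \frac{1}{r_{pq}^2}\left(\mu^*_q - \mu^*_p\right)\delta_{(i,j),(p,q)}\right]. \tag{70}$$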

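Furthermore, computing $\frac{\partial \mu^*}{\partial r_{ij}}$ requires $\frac{\partial L(r)^\dagger}{\partial r_{ij}}$. Assuming that the underlying graph does not fragment into multiple components when adapting $r$, so that $L(r)$ has a constant rank, we have [8]

$$\frac{\partial L(r)^\dagger}{\partial r_{ij}} = -L^\dagger\frac{\partial L}{\partial r_{ij}}L^\dagger + L^\dagger (L^\dagger)^\top\frac{\partial L^\top}{\partial r_{ij}}\left(I - LL^\dagger\right) + \left(I - L^\dagger L\right)\frac{\partial L^\top}{\partial r_{ij}}(L^\dagger)^\top L^\dagger. \tag{71}$$

Using the property of the pseudo-inverse of the Laplacian $L^\dagger L = LL^\dagger = I - \frac{1}{N}\mathbf{1}\mathbf{1}^\top$ and $\frac{\partial L}{\partial r_{ij}}\cdot\mathbf{1} = 0$ (with $\mathbf{1}$ as the all-one vector), the last two terms vanish and we have

$$\frac{\partial L(r)^\dagger}{\partial r_{ij}} = -L^\dagger\frac{\partial L}{\partial r_{ij}}L^\dagger, \tag{72}$$

which closes the equations for calculating the gradients.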
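A small numerical check of Eq. (72) is sketched below using NumPy. Since Eq. (50) is not reproduced in this section, the sketch assumes the standard weighted Laplacian with edge conductances $1/r_{ij}$, which is consistent with $x^*_{ij} = (\mu^*_j - \mu^*_i)/r_{ij}$ and with the flow constraint in Eq. (55). The graph, the resistances and the helper names are hypothetical, and the cutoff `rcond=1e-10` is a choice made here to discard the zero mode of the Laplacian reliably.

```python
import numpy as np


def laplacian(n, edges, r):
    """Weighted graph Laplacian with conductance 1/r_ij on each edge (assumed form)."""
    L = np.zeros((n, n))
    for (i, j), r_ij in zip(edges, r):
        w = 1.0 / r_ij
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L


def pinv_sym(L):
    """Pseudo-inverse of a symmetric matrix, dropping near-zero eigenvalues."""
    return np.linalg.pinv(L, rcond=1e-10, hermitian=True)


def dpinv_dr(n, edges, r, e):
    """Eq. (72): derivative of the pseudo-inverse w.r.t. the resistance of edge e."""
    i, j = edges[e]
    v = np.zeros(n)
    v[i], v[j] = 1.0, -1.0
    dL = -np.outer(v, v) / r[e] ** 2  # derivative of L w.r.t. r_ij
    Lp = pinv_sym(laplacian(n, edges, r))
    return -Lp @ dL @ Lp


n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # connected graph with loops
r = np.array([1.0, 2.0, 0.5, 1.5, 1.0])
Lam = np.array([1.0, 0.5, -0.5, -1.0])            # net resources, summing to zero

mu = pinv_sym(laplacian(n, edges, r)) @ Lam

# Compare Eq. (72) with a central finite difference in r of edge index 1.
analytic = dpinv_dr(n, edges, r, e=1)
eps = 1e-6
r_plus, r_minus = r.copy(), r.copy()
r_plus[1] += eps
r_minus[1] -= eps
numeric = (pinv_sym(laplacian(n, edges, r_plus))
           - pinv_sym(laplacian(n, edges, r_minus))) / (2 * eps)
max_err = np.max(np.abs(analytic - numeric))
```

The derivative of $\mu^*$ then follows as `dpinv_dr(...) @ Lam`, which is the quantity entering Eq. (70).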

[1] K. Y. M. Wong and D. Saad, Inference and optimization of real edges on sparse graphs: A statistical physics perspective, Phys. Rev. E 76, 011115 (2007).

[2] B. Li, D. Saad, and A. Y. Lokhov, Reducing urban traffic congestion due to localized routing decisions, Phys. Rev. Research 2, 032059 (2020).

[3] B. Colson, P. Marcotte, and G. Savard, An overview of bilevel optimization, Annals of Operations Research 153, 235 (2007).

[4] J. D. Garcia, G. Bodin, davide f, I. Fiske, M. Besançon, and N. Laws, joaquimg/bileveljump.jl: v0.4.1, https://doi.org/10.5281/zenodo.4556393 (2021).

[5] A. Migdalas, Bilevel programming in traffic planning: Models, methods and challenge, Journal of Global Optimization 7, 381 (1995).

[6] C. H. Yeung and K. Y. M. Wong, Optimal location of sources in transportation networks, Journal of Statistical Mechanics: Theory and Experiment 2010, P04017 (2010).

[7] P. Rebeschini and S. Tatikonda, A new approach to Laplacian solvers and flow problems, Journal of Machine Learning Research 20, 1 (2019).

[8] G. H. Golub and V. Pereyra, The differentiation of pseudo-inverses and nonlinear least squares problems whose variables separate, SIAM Journal on Numerical Analysis 10, 413 (1973).