Bilevel Optimization in Flow Networks -- A Message-passing Approach

Authors:
  • Harbin Institute of Technology (Shenzhen)

Abstract

Optimizing embedded systems, where the optimization of one depends on the state of another, is a formidable computational and algorithmic challenge that is ubiquitous in real-world systems. We study flow networks, where bilevel optimization is relevant to traffic planning, network control and design, and where flows are governed by an optimization requirement subject to the network parameters. We employ message-passing algorithms in flow networks with sparsely coupled structures to adapt the network parameters that govern the network flows, in order to optimize a global objective. We demonstrate the effectiveness and efficiency of the approach on randomly generated graphs.
arXiv:2108.00960v1 [math.OC] 2 Aug 2021
Bo Li¹ and David Saad¹
¹Non-linearity and Complexity Research Group, Aston University, Birmingham, B4 7ET, United Kingdom
Many problems in science and engineering involve hier-
archical optimization or decision-making, whereby some
of the variables cannot be freely chosen but are governed
by another optimization problem [1]. As a motivating
example, consider the task of designing a network (e.g.,
a road or communication network) that maximizes the
throughput of commodities or information flow. While
the designer controls the network parameters (upper-level
optimization), traffic flows are determined by the
network users who maximize their own benefit (lower-level
optimization) [2]. Therefore, the designer needs to
adapt the network carefully, taking into account the
reaction of network users. Similarly, many physical systems
obey an extremization principle for given
controllable system parameters, e.g., minimal free energy
in thermal equilibrium [3] given interaction strengths, and
electric flows in resistor networks that minimize the energy
dissipation [4, 5]. Likewise, entropy maximization and
parameter optimization are used across disciplines in
inference and learning tasks [6, 7]. Adapting system parameters
to extremize a given objective requires bilevel
optimization, which considers both system parameters
and the inherent optimization of the physical or human-made
system variables.
These examples of bilevel optimization are intrinsi-
cally difficult to solve [8]. In fact, even the simple in-
stance where both levels are linear programming tasks
was shown to be NP-hard [9, 10]. Generic methods for
bilevel optimization include (i) expressing the lower-level
optimization problem as nonlinear constraints and solving
the bilevel problem as a global optimization problem [11, 12], and (ii)
gradient-descent methods, which compute the descent direction
of the upper-level objective while keeping the
lower-level state variables valid [13, 14]. The former introduces
complicated nonlinear constraints, making the reduced
single-level problem difficult to solve in general, while
the latter is generally challenging due to the difficulty
in computing the descent direction [8]. Moreover, such
generic methods do not exploit existing system structure
to simplify the task. In this Letter, we consider flow optimization
problems on sparse networks; by virtue of the
sparsely coupled structures, the message-passing (MP)
approach, an inherently distributed algorithm, appears
to be effective and efficient in both single-level and bilevel
optimization, as demonstrated below for applications in
routing and flow control.
Routing Game – We focus on a network planning prob-
lem in the routing game setting, widely used in modeling
route choices of drivers [15]. Users on the road network
make their route choices in a selfish and rational manner,
where the corresponding Nash equilibrium is generally
not the most beneficial for the system utility as a whole,
measured by the total travel time of all users [2, 16].
The operator’s task is to set the appropriate tolls or re-
wards on network edges in order to reduce the total travel
time while taking into account the reactions of users to
the tolls. The toll-setting problem has attracted significant
interest in the fields of traffic engineering and operations
research, where theoretical results are limited to
simple networks or cannot effectively accommodate toll
constraints [17–19]. Recently, the idea of reducing traffic
congestion by economic incentives to influence drivers’
behaviors has regained interest [20–22], partly due to the
deployment of smart devices and data availability [23–
25]. Here, we focus on the algorithmic aspect of toll
optimization.
The road network is represented by a directed graph
$G(V, E)$, where $V$ is the set of nodes (junctions) and $E$
the set of directed edges (unidirectional roadways), having
one connected component. Users routing from an
origin node $i_0$ to a destination node $D$ would select a
path $P = ((i_0, i_1), (i_1, i_2), \ldots, (i_{n-2}, i_{n-1}), (i_{n-1}, D))$ by
minimizing their total travel time $\sum_{e \in P} \ell_e(x_e)$, or alternative
cost, where the edge flow $x_e$ represents the number
of users choosing edge $e$ and $\ell_e(x_e)$ is the corresponding
latency function. It is assumed that $\ell_e$ is monotonically
increasing with the edge flow $x_e$. We consider the limit
of a large number of users, each of which controls an
infinitesimal fraction of the overall traffic, such that the
edge flow $x_e$ is a continuous variable. This is termed the
non-atomic game setting [16]. The social cost is defined
as the total travel time of all users, $H = \sum_{e \in E} x_e \ell_e(x_e)$.
Figure 1. (a) Top: a directed road network section with a
junction node $i$ and four roadways. Bottom: the corresponding
factor graph representation, where the flows adjacent to
junction node $i$ satisfy the flow conservation constraint; node
$i$ is called a factor node and marked by a square. (b) Bilevel
MP for the toll-planning problem. Blue arrows indicate the
direction of messages. The lower and upper levels solve the
problem for the Nash equilibrium and social optimum, respectively.
The equilibrium flow $x_e^*$ is determined in the lower
level, while the toll $\tau_e$ is set in the upper level.
As the equilibrium reached by the selfish decisions of
users does not generally achieve the lowest social cost,
we seek to place tolls $\{\tau_e\}$ on edges to influence users'
route choices. Gauging the monetary penalty at the same
scale as latency, users will choose a path $P$ that minimizes
the combined total journey cost in latency and
tolls, $\sum_{e \in P} \left[ \ell_e(x_e) + \tau_e \right]$. If tolls can be placed freely
on all edges, marginal cost pricing is known to induce the
socially optimal flow [18]. However, it is usually infeasible
to set an unbounded toll on every road, which renders
marginal cost pricing inapplicable in practice. We
therefore consider restricted tolls $0 \le \tau_e \le \tau_e^{\max}$; an
edge $e$ is not chargeable when $\tau_e^{\max} = 0$. For simplicity,
we do not consider the income from tolls to contribute
to the social cost [26]. In total, $\Lambda_i$ users are traveling
from node $i$ to a universal destination $D$; the case
with multiple destinations is discussed in the supplemental
material (SM) [27]. The resulting edge flows satisfy
the non-negativity constraints $x_e \ge 0$ and the flow conservation
constraints
$$ R_i = \Lambda_i + \sum_{e \in \partial_i^{\mathrm{in}}} x_e - \sum_{e \in \partial_i^{\mathrm{out}}} x_e = 0, \tag{1} $$
where $\partial_i^{\mathrm{in}}$ and $\partial_i^{\mathrm{out}}$ are the sets of incoming and outgoing
edges adjacent to node $i$. It has been established that the
edge flows in user equilibrium can be obtained by minimizing
a potential function [28, 29] $\Phi = \sum_{e \in E} \phi_e(x_e) := \sum_{e \in E} \int_0^{x_e} \ell_e(y) \, dy$,
subject to the constraints of Eq. (1).
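The gap between the user equilibrium and the social optimum can be seen on a two-route toy network (Pigou's example), which is not one of the networks studied in this Letter. In the sketch below, one unit of traffic chooses between a route with constant latency $\ell_1 = 1$ and a route with latency $\ell_2(x) = x$. The function names and the ternary-search minimizer are illustrative assumptions. Minimizing the potential gives the equilibrium, minimizing $H$ gives the social optimum, and a toll equal to the marginal cost $x \ell_2'(x)$ at the optimum makes the two coincide.

```python
def minimize_convex(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


def potential(x2, toll=0.0):
    """Phi for l_1 = 1 and l_2(y) = y, with an optional toll on route 2."""
    x1 = 1.0 - x2  # flow conservation for one unit of demand
    return x1 + 0.5 * x2 ** 2 + toll * x2


def social_cost(x2):
    """H = sum_e x_e * l_e(x_e); tolls do not enter the social cost."""
    x1 = 1.0 - x2
    return x1 * 1.0 + x2 * x2


def equilibrium_flow(toll=0.0):
    """Flow on route 2 at the user equilibrium for a given toll."""
    return minimize_convex(lambda x2: potential(x2, toll), 0.0, 1.0)


x_nash = equilibrium_flow()                      # untolled equilibrium
x_opt = minimize_convex(social_cost, 0.0, 1.0)   # social optimum
marginal_toll = x_opt * 1.0                      # x * l_2'(x), with l_2'(x) = 1
x_tolled = equilibrium_flow(marginal_toll)       # equilibrium under the toll

print(x_nash, social_cost(x_nash))
print(x_opt, social_cost(x_opt))
print(marginal_toll, x_tolled)
```

Analytically, the untolled equilibrium sends all traffic to route 2 at social cost 1, whereas the optimum splits the traffic evenly at cost 3/4.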
The lower-level optimization is a nonlinear min-cost
flow problem, where edge flows are coupled through the
conservation constraints in Eq. (1), represented as factor
nodes in Fig. 1(a). We employ the MP approach developed
in [30] to tackle the nonlinear optimization problem.
It turns the global optimization of the potential
(i.e., $\min_{x} \Phi(x)$) into a local computation of the following
message functions
$$ \Phi_{i \to e}(x_e) = \min_{\{x_{e'} \ge 0\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right], \tag{2} $$
where $\partial i = \partial_i^{\mathrm{in}} \cup \partial_i^{\mathrm{out}}$ and $\Phi_{i \to e}(x_e)$ relates to the optimal
potential function contributed by the flows adjacent
to node $i$ where the flow on edge $e$ is set to $x_e$, taking into
account flow conservation at node $i$. In Eq. (2), denoting
$e' = (k, i)$, we can write $\Phi_{e' \to i}(x_{e'}) = \Phi_{k \to e'}(x_{e'})$; therefore
only factor-to-variable messages are needed. The
message $\Phi_{k \to e'}(x_{e'})$ can be obtained recursively by an
expression similar to Eq. (2), but using the incoming messages
from its upstream edges $\{l \to k \mid (l, k) \in \partial k \setminus i\}$. The
computation of messages involves only a few variables
when the network is sparse, and can therefore be performed
efficiently. Upon computing the messages iteratively until
convergence, we can determine the equilibrium flow
$x_e^*$ on edge $e = (i, j)$ by minimizing the edgewise full
energy dictated by the nonlinear cost $\phi_e(x_e)$ and the messages
from both ends of edge $e$, defined as $\Phi_e^{\mathrm{full}}(x_e) =
\Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e)$.
This algorithm can be demanding when different values
of $x_e$ are needed to determine the profile of the message
$\Phi_{i \to e}(x_e)$. To reduce the computational cost, we approximate
the message near the working point $\tilde{x}_{i \to e}$ as
$$ \Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e} \varepsilon_e + \frac{1}{2} \alpha_{i \to e} \varepsilon_e^2, \tag{3} $$
where $\beta_{i \to e}$ and $\alpha_{i \to e}$ are the first and second derivatives
of $\Phi_{i \to e}$ evaluated at $\tilde{x}_{i \to e}$, assuming the derivatives
exist. The coefficients $\beta_{i \to e}$ and $\alpha_{i \to e}$ are updated
by solving Eq. (2) using $\{\tilde{x}_{k \to e'}, \beta_{k \to e'}, \alpha_{k \to e'} \mid e' =
(k, i) \in \partial i \setminus e\}$, while the working point $\tilde{x}_{i \to e}$ is updated
by gradually pushing it towards the minimizer $x_e^*$ of the full energy
$\Phi_e^{\mathrm{full}}(x_e)$. The resulting algorithm only requires
maintaining a few numbers rather than the full profile of
$\Phi_{i \to e}$, making it tractable. It has been shown to work remarkably
well in many network flow problems [31], although
the messages may oscillate and fail to converge in problems
with non-smooth characteristics [30], including
the routing games analyzed here. We find
that the non-negativity constraints on flows may result
in a non-smooth message $\Phi_{i \to e}(x_e)$ (i.e., its first derivative
is discontinuous) with at most one break point, which
makes the approximation of Eq. (3) inadequate. A simple
solution is to approximate $\Phi_{i \to e}(x_e)$ by a continuous
and piecewise-quadratic function with at most one break
point, where each branch $m$ is a quadratic function governed
by three numbers $\{\tilde{x}^{(m)}_{i \to e}, \beta^{(m)}_{i \to e}, \alpha^{(m)}_{i \to e}\}$. Taking into
account the non-smooth structures, the MP algorithms
converge well even in very loopy networks and provide
the correct flows for routing games with single-level optimization [27].
For bilevel optimization, we notice that the cost
function of the upper layer $H(x)$ has a structure similar
to that of the lower layer. Therefore, one
can apply a similar MP procedure, $H_{i \to e}(x_e) =
\min_{\{x_{e'}\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ H_{e' \to i}(x_{e'}) + x_{e'} \ell_{e'}(x_{e'}) \right]$. The
message $H_{i \to e}(x_e)$ can also be approximated by a
piecewise-quadratic function with at most one break
point, where each branch $m$ has the form $H^{(m)}_{i \to e}(\tilde{x}_{i \to e} +
\varepsilon_e) \approx H^{(m)}_{i \to e}(\tilde{x}_{i \to e}) + \gamma^{(m)}_{i \to e} \varepsilon_e + \frac{1}{2} \delta^{(m)}_{i \to e} \varepsilon_e^2$. As the equilibrium
state is determined in the lower level, the same
working points $\{\tilde{x}_{i \to e}\}$ of the lower-level MP are also used
for the upper level. The landscape of the edgewise full
cost $H_e^{\mathrm{full}}(x_e) = H_{i \to e}(x_e) + H_{j \to e}(x_e) + x_e \ell_e(x_e)$ provides
the essential information for setting the toll. Specifically,
the toll is updated by $\min_{\tau_e} H_e^{\mathrm{full}}(x_e^*(\tau_e))$, where the toll-dependent
equilibrium flow $x_e^*$ is provided by the lower-level
messages. The basic structure of such bilevel MP is
illustrated in Fig. 1(b), while the details of the message
updates are provided in the SM [27].
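The edgewise toll update can be pictured as a one-dimensional search over the admissible interval $0 \le \tau_e \le \tau_e^{\max}$. The sketch below is only an illustration under strong assumptions: the callables `x_star` and `h_full` are hypothetical stand-ins for the toll-dependent equilibrium flow and the edgewise full cost that the actual algorithm reads off the lower- and upper-level messages, and a plain grid search replaces the update rule detailed in the SM [27]. The toy functions at the bottom are made up for the demonstration.

```python
def update_toll(x_star, h_full, tau_max, n_grid=1001):
    """Grid search for argmin over tau in [0, tau_max] of h_full(x_star(tau)).

    x_star: callable tau -> equilibrium flow on the edge (assumed given).
    h_full: callable x -> edgewise full cost (assumed given).
    """
    best_tau = 0.0
    best_cost = h_full(x_star(0.0))
    for k in range(1, n_grid):
        tau = tau_max * k / (n_grid - 1)
        cost = h_full(x_star(tau))
        if cost < best_cost:  # strict, so ties keep the smaller toll
            best_tau, best_cost = tau, cost
    return best_tau


# Toy stand-ins: the flow decreases linearly with the toll, and the
# cost is that of a two-route network with latencies 1 and x.
def toy_x_star(tau):
    return max(0.0, 1.0 - tau)


def toy_h_full(x):
    return (1.0 - x) + x * x


tau_free = update_toll(toy_x_star, toy_h_full, tau_max=1.0)
tau_capped = update_toll(toy_x_star, toy_h_full, tau_max=0.2)
tau_uncharged = update_toll(toy_x_star, toy_h_full, tau_max=0.0)
print(tau_free, tau_capped, tau_uncharged)
```

A grid search is used instead of a local descent because the composite cost need not be convex in the toll; for the toy functions the unconstrained minimizer is $\tau = 1/2$, so a cap below that value is binding.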
We demonstrate the effectiveness of the proposed algorithm
on random regular graphs (RRG) in Fig. 2(a),
where we use an affine latency model $\ell_e(x_e) = t_e(1 +
s x_e / c_e)$, with $t_e$ and $c_e$ as the free traveling time and edge
capacity and $s$ as a sensitivity measure of latency to congestion;
this model is commonly used in routing games and
traffic modeling [16]. Details of the parameter setting,
experiments on other network topologies and the cases
of multiple destinations are discussed in the SM [27].
Though the bilevel message-passing does not generally
converge to a unique set of optimal tolls due to the non-convex
nature of the problem, we found that the social
costs are reduced when tolls are updated during MP. The
scaling relation in Fig. 2(a) empirically indicates that the
number of updates needed to achieve a given cost is about $O(|E|^2)$.
Moreover, the MP algorithm can be implemented
in a fully distributed manner, unlike the generic global
optimization approach [27]. In practice, it may be infeasible
to charge every edge separately. We therefore consider
the problem of choosing a subset of chargeable edges and
then optimizing only the tolls on these edges, which is
an even more difficult task than the toll-setting problem [32].
An empirical solution is to choose the edges
where the equilibrium flows $\{x_e^N\}$ are much larger than
the socially optimal flows $\{x_e^S\}$, which appears to work
well as seen in Fig. 2(b). However, the social optimum
may be unknown a priori in some variants of toll-setting
or network-design problems [33]. We therefore also use a criterion
based on the potential reduction in the edgewise full
cost $H_e^{\mathrm{full}}(x_e^*)$ due to tolls, which also selects
the chargeable links effectively, as seen in Fig. 2(b).
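The first selection heuristic amounts to ranking edges by $x_e^N - x_e^S$ and charging the top fraction $f$. A minimal sketch, assuming both flow profiles are already available as dictionaries keyed by edge; the function name, the rounding of the number of selected edges and the flow values are illustrative choices that the text does not specify:

```python
def select_chargeable_edges(x_nash, x_social, fraction):
    """Pick the given fraction of edges with the largest x_e^N - x_e^S."""
    n_select = int(round(fraction * len(x_nash)))
    ranked = sorted(
        x_nash,
        key=lambda e: x_nash[e] - x_social[e],
        reverse=True,  # most over-used edges first
    )
    return set(ranked[:n_select])


# Hypothetical flows on a four-edge network, for illustration only.
x_nash = {(0, 1): 0.9, (0, 2): 0.1, (1, 3): 0.7, (2, 3): 0.3}
x_social = {(0, 1): 0.6, (0, 2): 0.4, (1, 3): 0.6, (2, 3): 0.4}

chargeable = select_chargeable_edges(x_nash, x_social, fraction=0.5)
print(sorted(chargeable))
```

The second criterion in the text could reuse the same ranking with the estimated reduction in $H_e^{\mathrm{full}}(x_e^*)$ as the sort key, which avoids the need for the socially optimal flows.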
Figure 2. Effect of tolls on the fractional social cost reduction,
defined as $(H(x^*(\tau)) - H^S)/(H^N - H^S)$, where $H^S$ and
$H^N$ represent the social costs at the social optimum and the
Nash equilibrium without tolls. Random regular graphs of
degree 3 and affine latency functions are used. Each data
point is the average of 10 different network realizations. (a)
Fractional cost reduction during the bilevel MP updates for
different system sizes, where each sweep consists of $40|E|$ local
MP steps and 100 edgewise toll updates in a random sequential
schedule. A fixed number of sweeps without toll updates
are performed to warm up the system. Inset: panel (a) with
the x-axis as MP steps rescaled by $|E|^2$. (b) Fractional cost
reduction as a function of the fraction $f$ of chargeable edges on
a random regular graph with $N = 200$. A random selection of
edges to be charged is compared with selections based on the
equilibrium flow difference $x_e^N - x_e^S$ and the edgewise full cost
reduction $H_e^{\mathrm{full}}(x_e^*)$.
Flow Control – When objective functions at both layers
are extensive, as in the routing game, the MP algorithm
based on edgewise updates of tolls using localized information
turns out to be effective. However, in some cases,
it may be necessary to consider the influence of the control
variable updates on other locations of the network.
To showcase this, we consider the problem of tuning network
flows to achieve a certain functionality. In this example,
resources need to be transported from source nodes
to destination nodes along edges in an undirected network
$G(V, E)$, where the equilibrium edge flows $\{x_{ij}\}$ minimize
the transportation cost $C = \sum_{(i,j) \in E} \frac{1}{2} r_{ij} x_{ij}^2$, subject to
flow conservation constraints similar to Eq. (1). The major
difference between this model and the routing games is
that the network is undirected, such that edge $(i, j)$ can
accommodate either the flow from node $j$ to $i$ or that from node
$i$ to $j$. The objective is to control the network parameters
$\{r_{ij}\}$ in order to reduce or increase the flows on
some edges. The task of reducing certain edge flows has
applications in power grid congestion mitigation in the
direct-current (DC) approximation [34], where $r_{ij}$ is related
to the reactance of edge $(i, j)$, controllable through devices
in a flexible alternating current transmission system
(FACTS) [35]. On the other hand, the task of increasing
certain edge flows has been used to model the tunability
of network functions, which is applicable in mechanical
and biological networks [36] as well as learning machines
in metamaterials [37].
As an example, we consider the task of flow control
such that the relative increments of the magnitude of
flows on the targeted set of edges $\mathcal{T}$ exceed a certain limit
$\theta$ [36], i.e., $\rho_{ij} = \frac{|x_{ij}| - |x^0_{ij}|}{|x^0_{ij}|} - \theta \ge 0, \forall (i, j) \in \mathcal{T}$ (with $x^0_{ij}$
being the flow prior to tuning). It can be achieved by
minimizing the hinge loss (upper-level objective) $\mathcal{O} =
\sum_{(i,j) \in \mathcal{T}} -\rho_{ij} \Theta(-\rho_{ij})$, where $\Theta(\cdot)$ is the Heaviside step
function. The task of congestion mitigation of flows in
power grids can be studied similarly. We adopt the usual
MP algorithm to tackle the lower-level optimization
problem, resulting in the local message functions
$$ C_{i \to j}(x_{ij}) = \min_{\{x_{ki}\} | R_i = 0} \left[ \frac{1}{2} r_{ij} x_{ij}^2 + \sum_{k \in \mathcal{N}_i \setminus j} C_{k \to i}(x_{ki}) \right], \tag{4} $$
where $\mathcal{N}_i$ is the set of neighboring nodes adjacent to node
$i$. The definition of the message function $C_{i \to j}(x_{ij})$ differs
from the one of Eq. (2) in that it includes the interaction
term on edge $(i, j)$, which yields a more concise
update rule in this problem. Similar to Eq. (3), we
approximate the message function by a quadratic form
$C_{i \to j}(x_{ij}) = \frac{1}{2} \alpha_{i \to j} (x_{ij} - \hat{x}_{i \to j})^2 + \mathrm{const}$, such that the local
optimization in Eq. (4) reduces to the computation of
the real-valued messages $m_{i \to j} \in \{\alpha_{i \to j}, \hat{x}_{i \to j}\}$ by passing
the upstream messages $\{m_{k \to i}\}_{k \in \mathcal{N}_i \setminus j}$, as illustrated
in the left panel of Fig. 3(a) [27]. Upon convergence, the
equilibrium flow $x^*_{ij}$ can be obtained by minimizing the
edgewise full cost $C^{\mathrm{full}}_{ij}(x_{ij}) = C_{i \to j}(x_{ij}) + C_{j \to i}(x_{ij}) -
\frac{1}{2} r_{ij} x_{ij}^2$.
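To make the reduction to real-valued messages concrete, the sketch below works out Eq. (4) for quadratic upstream messages under an assumed sign convention, $\Lambda_i + \sum_k x_{ki} - x_{ij} = 0$, with all upstream flows directed into node $i$. With $A = \sum_k 1/\alpha_{k \to i}$ and $c = \Lambda_i + \sum_k \hat{x}_{k \to i}$, eliminating the constraint with a Lagrange multiplier gives $\alpha_{i \to j} = r_{ij} + 1/A$ and $\hat{x}_{i \to j} = c/(1 + r_{ij} A)$. This closed form is derived here for illustration and is not quoted from the SM [27]; the function names and numerical values are hypothetical. The second function evaluates Eq. (4) by direct minimization as a consistency check.

```python
def update_message(r_ij, lam_i, upstream):
    """Closed-form Eq. (4) for quadratic upstream messages.

    upstream: list of (alpha_ki, x_hat_ki) pairs, one per neighbor k != j.
    Assumed sign convention: lam_i + sum_k x_ki - x_ij = 0.
    """
    inv_alpha_sum = sum(1.0 / alpha for alpha, _ in upstream)
    preferred_inflow = lam_i + sum(x_hat for _, x_hat in upstream)
    alpha_ij = r_ij + 1.0 / inv_alpha_sum
    x_hat_ij = preferred_inflow / (1.0 + r_ij * inv_alpha_sum)
    return alpha_ij, x_hat_ij


def brute_force_message(x_ij, r_ij, lam_i, upstream):
    """Evaluate Eq. (4) directly for exactly two upstream messages."""
    (a1, h1), (a2, h2) = upstream

    def cost(x1):
        x2 = x_ij - lam_i - x1  # flow conservation fixes the second inflow
        return 0.5 * a1 * (x1 - h1) ** 2 + 0.5 * a2 * (x2 - h2) ** 2

    lo, hi = -100.0, 100.0
    for _ in range(300):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) <= cost(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * r_ij * x_ij ** 2 + cost(0.5 * (lo + hi))


upstream = [(2.0, 1.0), (4.0, -0.5)]
alpha, x_hat = update_message(1.0, 0.5, upstream)

# The message should equal 0.5 * alpha * (x - x_hat)^2 up to a constant.
x_test = 2.0
direct = (brute_force_message(x_test, 1.0, 0.5, upstream)
          - brute_force_message(x_hat, 1.0, 0.5, upstream))
quadratic = 0.5 * alpha * (x_test - x_hat) ** 2
print(alpha, x_hat, direct, quadratic)
```

A node without upstream neighbors has $A = 0$, so its outgoing flow is fixed by conservation and the quadratic form degenerates; this sketch does not handle that case.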
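The variation of the control parameters $\{r_{ij}\}$ will affect
the messages $\{m_{i \to j}\}$, which in turn affects the
equilibrium flows $x^*$ and therefore the upper-level objective
$\mathcal{O}(x^*)$. Specifically, one considers the effect of the
change of $r_{ij}$ on the targeted edge flows $\{x^*_{pq}\}_{(p,q) \in \mathcal{T}}$,
derived by computing the gradient of the objective function
with respect to the messages, $\frac{\partial \mathcal{O}}{\partial m_{i \to j}}$. The targeted
edges provide the boundary conditions in the form of
$\frac{\partial \mathcal{O}}{\partial m_{p \to q}} = \frac{\partial \mathcal{O}}{\partial x^*_{pq}} \frac{\partial x^*_{pq}}{\partial m_{p \to q}}, \forall (p, q) \in \mathcal{T}$. As the messages
from node $i$ to $j$ are functions of the upstream messages,
i.e., $m_{i \to j} = m_{i \to j}(\{m_{k \to i}\}_{k \in \mathcal{N}_i \setminus j})$, the gradients
on edge $i \to j$ are passed backward to its upstream edges
$\{k \to i\}_{k \in \mathcal{N}_i \setminus j}$ through the chain rule, as illustrated in
the right panel of Fig. 3(a). The full gradient on a non-targeted
edge $k \to i$ can be obtained by summing the
gradients on its downstream edges, computed as
$$ \frac{\partial \mathcal{O}}{\partial m_{k \to i}} = \sum_{l \in \mathcal{N}_i \setminus k} \sum_{m_{i \to l} \in \{\alpha_{i \to l}, \hat{x}_{i \to l}\}} \frac{\partial \mathcal{O}}{\partial m_{i \to l}} \frac{\partial m_{i \to l}}{\partial m_{k \to i}}. \tag{5} $$
The gradient messages $\{\frac{\partial \mathcal{O}}{\partial m_{k \to i}}\}$ are passed in a random
and asynchronous manner, resulting in a decentralized
algorithm.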
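The backward accumulation of Eq. (5) can be sketched for a single node $i$ with one upstream neighbor $k$ and two downstream neighbors $l_1$, $l_2$. The forward rule `update_message` and its Jacobian are derived for this illustration from Eq. (4) under an assumed sign convention ($\Lambda_i + \sum_k x_{ki} - x_{ij} = 0$); they are not quoted from the SM [27], and all names and numbers are hypothetical.

```python
def update_message(r, lam, upstream):
    """Forward rule: (alpha, x_hat) of the outgoing message (assumed form)."""
    A = sum(1.0 / a for a, _ in upstream)
    c = lam + sum(h for _, h in upstream)
    return r + 1.0 / A, c / (1.0 + r * A)


def message_jacobian(r, lam, upstream, idx):
    """Derivatives of the outgoing (alpha, x_hat) w.r.t. upstream[idx].

    Rows: outputs (alpha, x_hat); columns: inputs (alpha, x_hat).
    """
    A = sum(1.0 / a for a, _ in upstream)
    c = lam + sum(h for _, h in upstream)
    a_in = upstream[idx][0]
    denom = 1.0 + r * A
    return (
        (1.0 / (A * a_in) ** 2, 0.0),
        (c * r / (denom * a_in) ** 2, 1.0 / denom),
    )


def backward_gradient(downstream):
    """Eq. (5): sum over downstream edges l and message components.

    downstream: list of (grad_out, jac) pairs, where grad_out holds
    dO/d(alpha_il) and dO/d(x_hat_il), and jac is the 2x2 Jacobian above.
    """
    g_alpha = 0.0
    g_xhat = 0.0
    for (go_alpha, go_xhat), jac in downstream:
        g_alpha += go_alpha * jac[0][0] + go_xhat * jac[1][0]
        g_xhat += go_alpha * jac[0][1] + go_xhat * jac[1][1]
    return g_alpha, g_xhat


# Incoming messages at node i and (made-up) gradients on its outgoing edges.
m_k, m_l1, m_l2 = (2.0, 1.0), (4.0, -0.5), (3.0, 0.2)
lam, r_l1, r_l2 = 0.5, 1.0, 1.2
grad_l1, grad_l2 = (0.3, -1.0), (-0.2, 0.7)

downstream = [
    (grad_l1, message_jacobian(r_l1, lam, [m_k, m_l2], 0)),  # edge i -> l1
    (grad_l2, message_jacobian(r_l2, lam, [m_k, m_l1], 0)),  # edge i -> l2
]
grad_k = backward_gradient(downstream)
print(grad_k)
```

Each call needs only the messages and gradients stored on the edges adjacent to node $i$, which is what permits the random, asynchronous schedule described above.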
Figure 3. Bilevel optimization in the flow control application.
A random regular graph (with $N = 200$, degree 3) and
a square lattice of size $15 \times 15$ are used in the experiments.
The source node, destination node and the targeted edges are
randomly selected. The control parameters are bounded as
$r_{ij} \in [0.9, 1.1]$. (a) Left: MP for solving the lower-level equilibrium
flow problem. Right: computation of the gradients of
the upper-level objective function $\mathcal{O}$, which are communicated
in the reversed direction. (b) Comparison of the gradients at
initial $r$ computed by the MP approach (obtained by fixing
$r$ and passing messages $\{m_{i \to j}\}$ and gradients $\{\frac{\partial \mathcal{O}}{\partial m_{i \to j}}\}$) and
the GGD approach, with $|\mathcal{T}| = 5$, $\theta = 0.1$. Inset: mean square
error (MSE) of the gradients by the MP approach during the
message and gradient passing, in comparison to the GGD approach.
The lower-level MP is performed until convergence
before passing the gradients. Each sweep consists of $4|E|$ local
MP steps in a random sequential schedule. (c) MP for minimizing
the upper-level objective function $\mathcal{O}$ with $\theta = 0.1$,
where one randomly selected control parameter is updated
following the descent direction every $4|E|/10$ steps. (d) Fraction
of successfully tuned cases (satisfying $\mathcal{O} = 0$) $P_{\mathrm{success}}$ out
of 100 different problem realizations with $|\mathcal{T}| = 5$, as a function
of the threshold $\theta$. Each realization has a different pair
of source and destination nodes.
The gradient with respect to the control parameter on
the non-targeted edge $(k, i)$ can be obtained straightforwardly as
$$ \frac{\partial \mathcal{O}}{\partial r_{ki}} = \sum_{m \in \{\alpha, \hat{x}\}} \left[ \frac{\partial \mathcal{O}}{\partial m_{k \to i}} \frac{\partial m_{k \to i}}{\partial r_{ki}} + \frac{\partial \mathcal{O}}{\partial m_{i \to k}} \frac{\partial m_{i \to k}}{\partial r_{ki}} \right], \tag{6} $$
which serves to update the control parameter in a gradient
descent manner, $r_{ki} \leftarrow r_{ki} - s \frac{\partial \mathcal{O}}{\partial r_{ki}}$, with a certain step
size $s$. In this particular flow model, the gradient $\frac{\partial \mathcal{O}}{\partial r_{ki}}$ can
also be calculated exactly, leading to a global gradient
descent (GGD) algorithm. However, the GGD approach
requires computing the inverse of the Laplacian matrix
in every iteration, which can be time-consuming for large
networks. In contrast, the gradients are computed
in a local and distributed manner in the MP approach.
Similar ideas of gradient propagation of MP have been
proposed in [38, 39] in the context of approximate inference,
where they are usually implemented in the reverse order
of MP updates and in a centralized manner, unlike the
decentralized approach presented here.
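The upper-level update loop can be illustrated on a two-route toy network, which is not one of the networks used in the experiments. For two parallel routes carrying one unit of flow, minimizing $\frac{1}{2} r_1 x_1^2 + \frac{1}{2} r_2 x_2^2$ gives $x_1 = r_2/(r_1 + r_2)$. The sketch below targets route 1, uses the hinge loss defined above, and takes projected gradient steps within $[0.9, 1.1]$. A finite-difference gradient is an assumed stand-in for the MP or exact gradient, and all function names are hypothetical.

```python
def equilibrium_flows(r1, r2, total=1.0):
    """Minimizer of 0.5*r1*x1^2 + 0.5*r2*x2^2 subject to x1 + x2 = total."""
    x1 = total * r2 / (r1 + r2)
    return x1, total - x1


def hinge_loss(r1, r2, x1_ref, theta):
    """O = -rho * Theta(-rho) for the single targeted route 1."""
    x1, _ = equilibrium_flows(r1, r2)
    rho = (abs(x1) - abs(x1_ref)) / abs(x1_ref) - theta
    return -rho if rho < 0 else 0.0


def clip(value, lo=0.9, hi=1.1):
    """Project a control parameter back into its allowed range."""
    return min(hi, max(lo, value))


def tune(theta, step=0.05, h=1e-6, max_iter=200):
    """Projected gradient descent on (r1, r2), starting from r = (1, 1)."""
    r = [1.0, 1.0]
    x1_ref, _ = equilibrium_flows(r[0], r[1])  # flow prior to tuning
    for _ in range(max_iter):
        if hinge_loss(r[0], r[1], x1_ref, theta) == 0.0:
            break
        grad = []
        for k in range(2):
            r_plus, r_minus = list(r), list(r)
            r_plus[k] += h
            r_minus[k] -= h
            diff = (hinge_loss(r_plus[0], r_plus[1], x1_ref, theta)
                    - hinge_loss(r_minus[0], r_minus[1], x1_ref, theta))
            grad.append(diff / (2.0 * h))  # central finite difference
        r = [clip(r[k] - step * grad[k]) for k in range(2)]
    return r, hinge_loss(r[0], r[1], x1_ref, theta)


r_easy, loss_easy = tune(theta=0.05)
r_hard, loss_hard = tune(theta=0.2)
print(r_easy, loss_easy)
print(r_hard, loss_hard)
```

With the bounds $[0.9, 1.1]$ the largest attainable relative increase in this toy is 10%, so the threshold $\theta = 0.05$ can be met while $\theta = 0.2$ cannot; this mirrors, in miniature, why the success fraction depends on $\theta$.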
The gradient computed by the MP algorithm provides
an excellent estimate of the exact gradient for different
types of networks, as illustrated in Fig. 3(b). In
bilevel optimization of flow networks, we do not wait until
the convergence of the gradient passing, but update
the control parameters during the MP iterations in order
to make the algorithm more efficient. This provides approximate
gradient information, which is already effective
enough for the optimization of the global objective,
as shown in Fig. 3(c). The MP approach yields similar
success rates in managing the network flows for different
thresholds compared to the GGD approach, as shown
in Fig. 3(d), demonstrating the effectiveness of the MP
approach for bilevel optimization.
In summary, we propose MP algorithms for solving
bilevel optimization in flow networks, focusing on appli-
cations in the routing game and flow control problems.
In routing games, the objective functions in both levels
admit a similar structure, which leads to two sets of sim-
ilar messages being passed in both levels. Updates of the
control variables based on localized information appear
to be effective for toll optimization in this case. However,
the long-range impact of control variable changes should
be considered in some applications. This is accommo-
dated by a separate distributed gradient passing process,
which is shown to be effective and computationally effi-
cient for optimization in flow control problems. Leverag-
ing the sparse network structures, the MP approach offers
efficient and intrinsically distributed algorithms in con-
trast to global optimization methods such as nonlinear
programming, although the latter is generic and applica-
ble to many other problems, but is not always scalable.
The MP approach could provide effective algorithms for
bilevel optimization problems that are intractable or dif-
ficult to solve by global optimization approaches. For
instance, the MP algorithms can be easily extended to
atomic routing games where network flows are discrete
variables [40] and are difficult to solve via nonlinear pro-
gramming. This could potentially be done through the
techniques of [41–43]. We believe that these MP methods
provide a valuable element in the toolbox for solving dif-
ficult bilevel optimization problems, especially in systems
with sparsely-coupled structures.
We thank K. Y. Michael Wong, Chi Ho Yeung and
Tat Shing Choi for helpful discussions. B.L. and D.S.
acknowledge support from the Leverhulme Trust (RPG-2018-092)
and the European Union’s Horizon 2020 research and
innovation programme under the Marie Skłodowska-Curie
Grant Agreement No. 835913. D.S. acknowledges
support from the EPSRC programme grant TRANSNET
(EP/R035342/1).
[1] A. Sinha, P. Malo, and K. Deb, A review on bilevel opti-
mization: From classical to evolutionary approaches and
applications, IEEE Transactions on Evolutionary Com-
putation 22, 276 (2018).
[2] J. G. Wardrop, Some theoretical aspects of road traffic
research, Proceedings of the Institution of Civil Engineers
1, 325 (1952).
[3] M. Plischke and B. Bergersen, Equilibrium Statistical
Physics, 3rd ed. (World Scientific, Singapore,
2006).
[4] W. Thomson and P. G. Tait, Treatise on Natural Phi-
losophy, 2nd ed., Cambridge Library Collection - Math-
ematics, Vol. 1 (Cambridge University Press, 2009).
[5] P. G. Doyle and J. L. Snell, Random Walks and Electric
Networks (Mathematical Association of America, 1984).
[6] E. T. Jaynes, Information theory and statistical mechan-
ics, Phys. Rev. 106, 620 (1957).
[7] K. Murphy, Machine Learning: A Probabilistic Perspec-
tive, Adaptive Computation and Machine Learning series
(MIT Press, Cambridge, Massachusetts, 2012).
[8] B. Colson, P. Marcotte, and G. Savard, An overview of
bilevel optimization, Annals of Operations Research 153,
235 (2007).
[9] R. G. Jeroslow, The polynomial hierarchy and a simple
model for competitive analysis, Mathematical Program-
ming 32, 146 (1985).
[10] P. Hansen, B. Jaumard, and G. Savard, New branch-
and-bound rules for linear bilevel programming, SIAM
Journal on Scientific and Statistical Computing 13, 1194
(1992).
[11] J. F. Bard and J. E. Falk, An explicit solution to the
multi-level programming problem, Computers & Opera-
tions Research 9, 77 (1982).
[12] J. F. Bard, Convex two-level optimization, Mathematical
Programming 40, 15 (1988).
[13] C. D. Kolstad and L. S. Lasdon, Derivative evaluation
and computational experience with large bilevel mathe-
matical programs, Journal of Optimization Theory and
Applications 65, 485 (1990).
[14] G. Savard and J. Gauvin, The steepest descent direction
for the nonlinear bilevel programming problem, Opera-
tions Research Letters 15, 265 (1994).
[15] M. Patriksson, The traffic assignment problem : models
and methods (Dover Publications, Inc, New York, 2015).
[16] T. Roughgarden, Selfish routing and the price of anarchy
(MIT Press, Cambridge, Massachusetts, 2005).
[17] M. J. Beckmann, C. B. McGuire, and C. B. Winsten,
Studies in the Economics of Transportation (Yale Uni-
versity Press, New Haven, 1956).
[18] M. Smith, The marginal cost taxation of a transportation
network, Transportation Research Part B: Methodologi-
cal 13, 237 (1979).
[19] R. Cole, Y. Dodis, and T. Roughgarden, How much can
taxes help selfish routing?, Journal of Computer and System
Sciences 72, 444 (2006).
[20] S. Çolak, A. Lima, and M. C. González, Understanding
congested travel in urban areas, Nature Communications
7, 10793 (2016).
[21] Electronic road pricing,
https://web.archive.org/web/20110605101108/http://www.lta.gov.sg/motoring_matters/index_motoring_erp.htm
(2019).
[22] N. Barak, Israel tries battling traffic
jams with cash handouts, ISRAEL21c,
https://www.israel21c.org/israel-tries-battling-traffic-jams-with-cash-handouts/
(2019).
[23] J. Alonso-Mora, S. Samaranayake, A. Wallar, E. Fraz-
zoli, and D. Rus, On-demand high-capacity ride-sharing
via dynamic trip-vehicle assignment, Proceedings of the
National Academy of Sciences 114, 462 (2017).
[24] S. Lim, H. Balakrishnan, D. Gifford, S. Madden, and
D. Rus, Stochastic motion planning and applications to
traffic, The International Journal of Robotics Research
30, 699 (2011).
[25] B. Li, D. Saad, and A. Y. Lokhov, Reducing urban traffic
congestion due to localized routing decisions, Phys. Rev.
Research 2, 032059 (2020).
[26] G. Karakostas and S. G. Kolliopoulos, The efficiency
of optimal taxes, in Combinatorial and Algorithmic As-
pects of Networking, edited by A. López-Ortiz and A. M.
Hamel (Springer Berlin Heidelberg, Berlin, Heidelberg,
2005) pp. 3–12.
[27] See Supplemental Material for details, which includes
Refs [44–47].
[28] D. Monderer and L. S. Shapley, Potential games, Games
and Economic Behavior 14, 124 (1996).
[29] H. Bar-Gera, Origin-based algorithm for the traffic
assignment problem, Transportation Science 36, 398
(2002).
[30] K. Y. M. Wong and D. Saad, Inference and optimiza-
tion of real edges on sparse graphs: A statistical physics
perspective, Phys. Rev. E 76, 011115 (2007).
[31] K. Y. M. Wong, D. Saad, and C. H. Yeung, Distributed
optimization in transportation and logistics networks,
IEICE Transactions on Communications E99.B,
2237 (2016).
[32] M. Hoefer, L. Olbrich, and A. Skopalik, Taxing subnet-
works, in Internet and Network Economics, edited by
C. Papadimitriou and S. Zhang (Springer Berlin Heidel-
berg, Berlin, Heidelberg, 2008) pp. 286–294.
[33] A. Migdalas, Bilevel programming in traffic planning:
Models, methods and challenge, Journal of Global Op-
timization 7, 381 (1995).
[34] A. Wood, B. Wollenberg, and G. Sheblé, Power Gen-
eration, Operation, and Control (Wiley, Hoboken, New
Jersey, 2013).
[35] X.-P. Zhang, C. Rehtanz, and B. Pal, Flexible AC
Transmission Systems: Modelling and Control (Springer,
Berlin, Heidelberg, 2006).
[36] J. W. Rocks, H. Ronellenfitsch, A. J. Liu, S. R. Nagel,
and E. Katifori, Limits of multifunctionality in tunable
networks, Proceedings of the National Academy of Sci-
ences 116, 2506 (2019).
[37] M. Stern, D. Hexner, J. W. Rocks, and A. J. Liu, Su-
pervised learning in physical networks: From machine
learning to learning machines, Phys. Rev. X 11, 021045
(2021).
[38] F. Eaton and Z. Ghahramani, Choosing a variable to
clamp, in Proceedings of the Twelth International Confer-
ence on Artificial Intelligence and Statistics, Proceedings
of Machine Learning Research, Vol. 5, edited by D. van
Dyk and M. Welling (PMLR, Hilton Clearwater Beach
Resort, Clearwater Beach, Florida USA, 2009) pp. 145–
152.
[39] J. Domke, Learning graphical model parameters with ap-
proximate marginal inference, IEEE Transactions on Pat-
tern Analysis and Machine Intelligence 35, 2454 (2013).
[40] R. W. Rosenthal, The network equilibrium problem in
integers, Networks 3, 53 (1973).
[41] C. H. Yeung and D. Saad, Competition for shortest paths
on sparse graphs, Phys. Rev. Lett. 108, 208701 (2012).
[42] C. H. Yeung, D. Saad, and K. Y. M. Wong, From the
physics of interacting polymers to optimizing routes on
the London Underground, Proceedings of the National
Academy of Sciences 110, 13717 (2013).
[43] H. F. Po, C. H. Yeung, and D. Saad, Futility of being
selfish in optimized traffic, Phys. Rev. E 103, 022306
(2021).
[44] J. D. Garcia, G. Bodin, davide f, I. Fiske, M. Besançon,
and N. Laws, joaquimg/bileveljump.jl: v0.4.1,
https://doi.org/10.5281/zenodo.4556393 (2021).
[45] C. H. Yeung and K. Y. M. Wong, Optimal location of
sources in transportation networks, Journal of Statisti-
cal Mechanics: Theory and Experiment 2010, P04017
(2010).
[46] P. Rebeschini and S. Tatikonda, A new approach to
Laplacian solvers and flow problems, Journal of Machine
Learning Research 20, 1 (2019).
[47] G. H. Golub and V. Pereyra, The differentiation of
pseudo-inverses and nonlinear least squares problems
whose variables separate, SIAM Journal on Numerical
Analysis 10, 413 (1973).
Bilevel Optimization in Flow Networks – A Message-passing Approach
– Supplemental Material
Bo Li¹ and David Saad¹
¹Non-linearity and Complexity Research Group, Aston University, Birmingham, B4 7ET, United Kingdom
I. MESSAGE-PASSING ALGORITHMS FOR ROUTING GAME
In this section, we provide details of the message-passing (MP) algorithm for the routing game in road networks,
modeled by a directed graph $G(V, E)$.
A. Problem Setting and Notation
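A directed edge $e$ in the directed graph is represented by an ordered tuple $e = (i, j)$, where node $i$ is the head and
node $j$ is the tail of edge $e$, i.e., $i = h(e)$, $j = t(e)$. Note that there can be at most two directed edges connecting node
$i$ and node $j$, i.e., $e = (i, j)$ and $e' = (j, i)$.
We write the set of incoming edges to node $i$ as $\partial_i^{\mathrm{in}} = \{e \mid e \in E, t(e) = i\}$, the set of outgoing edges from node $i$ as
$\partial_i^{\mathrm{out}} = \{e \mid e \in E, h(e) = i\}$ and the set of edges adjacent to node $i$ as $\partial i = \partial_i^{\mathrm{in}} \cup \partial_i^{\mathrm{out}}$. For convenience, we define the
incidence operator $B: E \to V$, with matrix elements
$$ B_{i,e} = \begin{cases} 1, & \text{if } e \in \partial_i^{\mathrm{in}} \\ -1, & \text{if } e \in \partial_i^{\mathrm{out}} \\ 0, & \text{otherwise.} \end{cases} \tag{1} $$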
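Consider the scenario where all users travel to a universal destination $D$. The edge flows resulting from users' path choices satisfy the flow conservation constraints,
\[
R_i := \Lambda_i + \sum_{e \in \partial_i^{\mathrm{in}}} x_e - \sum_{e \in \partial_i^{\mathrm{out}}} x_e
= \Lambda_i + \sum_{e \in \partial i} B_{i,e}\, x_e = 0, \quad \forall i \neq D,
\tag{2}
\]
and the non-negativity constraints
\[
x_e \geq 0, \quad \forall e.
\tag{3}
\]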
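As a concrete check of Eqs. (1)-(3), the following minimal Python sketch builds the operator $B$ for a hypothetical three-node network and evaluates the residuals $R_i$. The node labels, the edge list, and the flow values are illustrative assumptions and are not taken from the experiments.

```python
# Edges are (head, tail) pairs. Following the convention above, an edge is
# outgoing from its head h(e) and incoming to its tail t(e).
edges = [(0, 1), (1, 2), (0, 2)]      # hypothetical network
nodes = [0, 1, 2]
D = 2                                 # universal destination


def incidence(nodes, edges):
    """B[i][e] = +1 if e is incoming to i, -1 if outgoing, 0 otherwise (Eq. 1)."""
    B = {i: [0] * len(edges) for i in nodes}
    for idx, (head, tail) in enumerate(edges):
        B[tail][idx] = 1
        B[head][idx] = -1
    return B


def residuals(B, x, Lam, D):
    """R_i = Lambda_i + sum_e B[i][e] * x[e] for every node except D (Eq. 2)."""
    return {i: Lam[i] + sum(b * xe for b, xe in zip(B[i], x))
            for i in B if i != D}


B = incidence(nodes, edges)
Lam = {0: 1.0, 1: 0.5}                # resources injected at nodes 0 and 1
x = [0.5, 1.0, 0.5]                   # candidate non-negative edge flows
R = residuals(B, x, Lam, D)           # all residuals vanish for these flows
```

The candidate flows are feasible because every residual is zero and every flow is non-negative; the destination carries no constraint in this formulation.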
Due to the flow conservation constraint, any resource on a leaf node $i$ with only one outgoing edge (i.e., $|\partial_i^{\mathrm{out}}| = 1$, $|\partial_i^{\mathrm{in}}| = 0$) must be transmitted to its only neighboring node $j$. Similarly, if a leaf node $i$ with only one incoming edge (i.e., $|\partial_i^{\mathrm{in}}| = 1$, $|\partial_i^{\mathrm{out}}| = 0$) is the destination node, then traffic must first arrive at its only neighboring node $j$, and then go through the edge $(j, i)$ to the destination. In the former case, one can remove the leaf node $i$ and add $\Lambda_i$ resources to its neighboring node $j$. In the latter, one can simply set node $j$ as the destination. By preprocessing the network using the above reduction, we can reduce the network to have no leaf nodes.
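The reduction described above can be sketched as a short preprocessing routine. This is only an illustration under stated assumptions: the function name `reduce_leaves`, the set-of-pairs edge representation, and the choice to drop the resource entry of a node that becomes the destination are all hypothetical.

```python
def reduce_leaves(edges, Lam, D):
    """Repeatedly strip leaf nodes.

    edges: iterable of (head, tail) pairs, edge outgoing from head.
    Lam:   dict of resources for non-destination nodes.
    D:     destination node.
    """
    edges, Lam = set(edges), dict(Lam)
    changed = True
    while changed:
        changed = False
        for i in {n for e in edges for n in e}:
            out_e = [e for e in edges if e[0] == i]
            in_e = [e for e in edges if e[1] == i]
            if i != D and len(out_e) == 1 and not in_e:
                # Source leaf: its resource must leave via the only outgoing edge.
                j = out_e[0][1]
                lam_i = Lam.pop(i, 0.0)
                if j != D:
                    Lam[j] = Lam.get(j, 0.0) + lam_i
                edges.remove(out_e[0])
                changed = True
                break
            if i == D and len(in_e) == 1 and not out_e:
                # Destination leaf: the only neighbour becomes the destination.
                D = in_e[0][0]
                edges.remove(in_e[0])
                Lam.pop(D, None)
                changed = True
                break
    return edges, Lam, D
```

For example, with edges `{(0, 1), (1, 2), (1, 3), (2, 3)}`, resources `{0: 1.0, 1: 1.0, 2: 0.0}` and destination `3`, node `0` is removed and its resource is added to node `1`.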
Denoting $\ell_e(x_e)$ as the latency function on edge $e$, the Wardrop equilibrium can be obtained by minimizing the following potential function
\[
\Phi(x) = \sum_{e \in E} \int_0^{x_e} \ell_e(y)\, \mathrm{d}y =: \sum_{e \in E} \phi_e(x_e),
\tag{4}
\]
subject to the flow conservation constraints (2) and non-negativity constraints (3). In the presence of tolls $\{\tau_e\}$ on edges, one replaces the latency function $\ell_e(x_e)$ in Eq. (4) by $\ell_e^{\tau}(x_e) = \ell_e(x_e) + \tau_e$, assuming the same gauge between latency and toll can be used for all users (more precisely, $\ell_e^{\tau}(x_e) = \ell_e(x_e) + \chi \tau_e$ with $\chi$ being a coefficient converting money to time, which is set to one in some appropriate unit).
On the other hand, the social cost is defined as
\[
H(x) = \sum_{e \in E} x_e\, \ell_e(x_e) =: \sum_{e \in E} \sigma_e(x_e),
\tag{5}
\]
where the corresponding minimizer is the social optimum. Tolls are not assumed to contribute to the social cost $H(x)$.
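To illustrate the difference between minimizing the potential of Eq. (4) and the social cost of Eq. (5), consider a toy example that is assumed here purely for illustration: one unit of traffic is split between two parallel routes, each treated as a single edge, with latencies $\ell_1(x) = x$ and $\ell_2(x) = 1$. The sketch below locates the minimizers by a plain grid search; the function names and the toll value are hypothetical.

```python
def potential(x1, toll=0.0):
    """Eq. (4) for the toy example, with an optional toll on route 1."""
    x2 = 1.0 - x1
    # integral of (y + toll) from 0 to x1, plus integral of 1 from 0 to x2
    return 0.5 * x1 ** 2 + toll * x1 + x2


def social_cost(x1):
    """Eq. (5) for the toy example: sum of x_e * l_e(x_e), tolls excluded."""
    x2 = 1.0 - x1
    return x1 * x1 + x2 * 1.0


grid = [k / 1000 for k in range(1001)]                      # flows on route 1
x_eq = min(grid, key=potential)                             # Wardrop equilibrium
x_opt = min(grid, key=social_cost)                          # social optimum
x_toll = min(grid, key=lambda x1: potential(x1, toll=0.5))  # tolled equilibrium
```

In this example the untolled equilibrium sends all traffic through route 1, whereas a toll of 0.5 on route 1 shifts the equilibrium onto the social optimum, which is the mechanism exploited by the bilevel formulation.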
B. MP Equations for Smooth Message Functions
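The MP equation for minimizing the potential $\Phi(x)$ reads
\[
\Phi_{i \to e}(x_e) = \min_{\{x_{e'} \geq 0\} \,|\, R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right],
\tag{6}
\]
where the message function $\Phi_{i \to e}(x_e)$ is called the cavity energy in the jargon of statistical physics. Denoting $e = (i, j)$ and $e' = (k, i)$, we can write $\Phi_{e' \to i}(x_{e'}) = \Phi_{k \to e'}(x_{e'})$.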
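In this framework, one needs to keep track of the profile of the message functions $\Phi_{i \to e}(x_e)$, which is only practical if they are restricted to a certain family of functions and are easy to optimize. One can approximate the message function $\Phi_{i \to e}(x_e)$ by its series expansion around the working point $\tilde{x}_{i \to e}$ [1]
\[
\Phi_{i \to e}(x_e) = \Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e)
\approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e}\big|_{\tilde{x}_{i \to e}} \cdot \varepsilon_e + \frac{1}{2} \alpha_{i \to e}\big|_{\tilde{x}_{i \to e}} \cdot \varepsilon_e^2,
\tag{7}
\]
where $\beta_{i \to e}$ and $\alpha_{i \to e}$ are the first and second derivatives of $\Phi_{i \to e}$ evaluated at the working point $\tilde{x}_{i \to e}$, assuming the message function $\Phi_{i \to e}(x_e)$ is smooth in the vicinity of $\tilde{x}_{i \to e}$. The MP equations have been derived in [1] for undirected flow networks. Here, we extend them to directed graphs with non-negativity flow constraints.
Similarly, the interaction term on an upstream edge $e'$ is also approximated as $\phi_{e'}(x_{e'}) \approx \phi_{e'}(\tilde{x}_{k \to e'}) + \phi'_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'} + \frac{1}{2} \phi''_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'}^2$.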
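To solve the local optimization problem in Eq. (6) over the variables on edges $\{k \to e' \mid e' \in \partial i \setminus e\}$, we introduce the Lagrangian
\[
L_{i \to e} = \sum_{e' \in \partial i \setminus e} \left[ \frac{1}{2} \alpha_{k \to e'} \varepsilon_{e'}^2 + \beta_{k \to e'} \varepsilon_{e'} + \frac{1}{2} \phi''_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'}^2 + \phi'_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'} \right]
+ \mu_{i \to e} R_i + \sum_{e' \in \partial i \setminus e} \lambda_{e'} \left( \tilde{x}_{k \to e'} + \varepsilon_{e'} \right),
\tag{8}
\]
where $\mu_{i \to e}$ and $\lambda_{e'}$ are the Lagrange multipliers for the flow conservation constraint $R_i = 0$ and the flow non-negativity constraint $x_{e'} \geq 0$, respectively. Solving the extremum equation $\partial L_{i \to e} / \partial \varepsilon_{e'} = 0$ gives
\[
\varepsilon^*_{e'}(\mu_{i \to e}) = \max\left( -\frac{\mu_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ -\tilde{x}_{k \to e'} \right),
\tag{9}
\]
and the corresponding optimal cavity flow is
\[
x^*_{k \to e'}(\mu_{i \to e}) = \tilde{x}_{k \to e'} + \varepsilon^*_{e'}(\mu_{i \to e}) = \max\left( \tilde{x}_{k \to e'} - \frac{\mu_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ 0 \right).
\tag{10}
\]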
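The Lagrange multiplier (or the dual variable) $\mu_{i \to e}$ needs to satisfy
\[
R_{i \to e}(\mu_{i \to e}; x_e) := \sum_{e' \in \partial i \setminus e} B_{i,e'}\, x^*_{k \to e'}(\mu_{i \to e}) + B_{i,e}\, x_e + \Lambda_i = 0.
\tag{11}
\]
The function $R_{i \to e}(\mu; x_e)$ is a non-increasing piecewise-linear function of $\mu$. To determine the value of $\mu^*_{i \to e}$ at the optimum, we need to find the root of $R_{i \to e}(\mu; x_e)$, which can be done in a finite number of steps by following the breakpoints of the piecewise-linear function $R_{i \to e}(\mu; x_e)$. Upon obtaining the optimal dual variable $\mu^*_{i \to e}$, the messages $\beta_{i \to e}$ and $\alpha_{i \to e}$ are calculated by
\[
\beta_{i \to e} = \frac{\partial \Phi^*_{i \to e}(x_e)}{\partial x_e} = \frac{\partial L^*_{i \to e}}{\partial x_e} = B_{i,e}\, \mu^*_{i \to e},
\tag{12}
\]
\[
\begin{aligned}
\alpha_{i \to e} &= \frac{\partial^2 \Phi^*_{i \to e}(x_e)}{\partial x_e^2} = B_{i,e} \frac{\partial \mu^*_{i \to e}}{\partial x_e} = B_{i,e} \left[ \frac{\partial x_e}{\partial \mu} \bigg|_{\mu = \mu^*_{i \to e}} \right]^{-1} \\
&= \left[ -\frac{\partial}{\partial \mu} \sum_{e' \in \partial i \setminus e} B_{i,e'}\, x^*_{k \to e'}(\mu) \bigg|_{\mu = \mu^*_{i \to e}} \right]^{-1} \\
&= \left[ \sum_{e' \in \partial i \setminus e} \frac{1}{\alpha_{k \to e'} + \phi''_{e'}}\, \Theta\!\left( \left( \alpha_{k \to e'} + \phi''_{e'} \right) \tilde{x}_{k \to e'} - \left( \mu^*_{i \to e} B_{i,e'} + \phi'_{e'} + \beta_{k \to e'} \right) \right) \right]^{-1},
\end{aligned}
\tag{13}
\]
where $\Theta(\cdot)$ is the Heaviside step function. The shadow price interpretation of the Lagrange multiplier has been used in Eq. (12) and the inverse function theorem has been used in Eq. (13). In the implementation of the algorithm, we take $x_e = \tilde{x}_{i \to e}$ in solving Eq. (11).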
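The local update of Eqs. (10)-(13) can be sketched as follows. This is a minimal illustration, not the implementation used for the experiments: the dictionary layout of an upstream edge (keys `B`, `alpha`, `beta`, `dphi`, `ddphi`, `x_tilde`) and the function names are assumptions, and the root of $R_{i \to e}(\mu; x_e)$ is located by sampling the function at its breakpoints and interpolating linearly between them.

```python
def cavity_flow(mu, u):
    """Eq. (10): optimal flow on upstream edge u for a trial multiplier mu."""
    slope = mu * u["B"] + u["dphi"] + u["beta"]
    return max(u["x_tilde"] - slope / (u["alpha"] + u["ddphi"]), 0.0)


def net_resource(mu, upstream, B_e, x_e, Lam):
    """Eq. (11): residual of flow conservation at node i as a function of mu."""
    return sum(u["B"] * cavity_flow(mu, u) for u in upstream) + B_e * x_e + Lam


def solve_mu(upstream, B_e, x_e, Lam):
    """Root of the non-increasing piecewise-linear R(mu), via its breakpoints."""
    # Each upstream flow switches between zero and positive at one value of mu.
    bps = sorted(u["B"] * ((u["alpha"] + u["ddphi"]) * u["x_tilde"]
                           - u["dphi"] - u["beta"]) for u in upstream)
    # Pad by one unit on each side so both outermost linear pieces are sampled.
    pts = [bps[0] - 1.0] + bps + [bps[-1] + 1.0]
    vals = [net_resource(p, upstream, B_e, x_e, Lam) for p in pts]
    if vals[0] < 0.0 or vals[-1] > 0.0:
        # Root lies beyond the outermost breakpoint: extend that linear piece.
        k = 0 if vals[0] < 0.0 else len(pts) - 2
        slope = (vals[k + 1] - vals[k]) / (pts[k + 1] - pts[k])
        if slope == 0.0:
            raise ValueError("R(mu) has no root")
        end = 0 if vals[0] < 0.0 else -1
        return pts[end] - vals[end] / slope
    for k in range(len(pts) - 1):
        if vals[k] >= 0.0 >= vals[k + 1] and vals[k] > vals[k + 1]:
            return pts[k] + vals[k] * (pts[k + 1] - pts[k]) / (vals[k] - vals[k + 1])
    return pts[0]  # R vanishes on the whole sampled range (degenerate)


def update_message(upstream, B_e, x_e, Lam, tol=1e-12):
    """Eqs. (12)-(13): slope beta and curvature alpha of the outgoing message."""
    mu = solve_mu(upstream, B_e, x_e, Lam)
    inv_alpha = sum(1.0 / (u["alpha"] + u["ddphi"])
                    for u in upstream if cavity_flow(mu, u) > tol)
    if inv_alpha == 0.0:
        raise ValueError("degenerate multiplier: no active upstream edge")
    return B_e * mu, 1.0 / inv_alpha


# Hypothetical node with one incoming upstream edge and an outgoing target edge.
up = [dict(B=1, alpha=1.0, beta=0.0, dphi=0.0, ddphi=1.0, x_tilde=1.0)]
beta, alpha = update_message(up, B_e=-1, x_e=1.25, Lam=1.0)
```

In this example the upstream flow must equal $x_e - \Lambda_i = 0.25$, which fixes $\mu^* = 1.5$. When no upstream edge remains active at the root, the sketch raises an error instead of returning a curvature, since Eq. (13) is then undefined.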
1. Destination node D
There are two ways to treat the destination node D:
• Method I: Since the destination node $D$ has no constraint, it will absorb all incoming flows (like a grounded node in an electric circuit). So it has no preference for network flows of the incident edges, such that $\Phi_{D \to e}(x_e) = 0$ and
\[
\alpha_{D \to e} = 0, \quad \beta_{D \to e} = 0.
\tag{14}
\]
• Method II: Alternatively, one can set an explicit constraint on the flows to the destination node $D$
\[
R_D := \Lambda_D + \sum_{e \in \partial D} B_{D,e}\, x_e = 0,
\tag{15}
\]
where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. Then the messages from the destination node $D$ are calculated in the same way as other nodes given by Eqs. (12) and (13).
Method I is used for the experiments of routing games in the main text.
2. Working Points
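We also need a scheme to update the working points $\{\tilde{x}_{i \to e}\}$ at which the messages $\{\alpha_{i \to e}, \beta_{i \to e}\}$ are defined. Here, we suggest updating the working point $\tilde{x}_{i \to e}$ such that it gets closer to the equilibrium flow $x^*_e$ [1]
\[
\begin{aligned}
x^*_e &= \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) \right] \\
&= \arg\min_{x_e \geq 0} \bigg[ \frac{1}{2} \Big( \alpha_{i \to e} + \frac{1}{2} \phi''_e(\tilde{x}_{i \to e}) \Big) \big( x_e - \tilde{x}_{i \to e} \big)^2 + \Big( \beta_{i \to e} + \frac{1}{2} \phi'_e(\tilde{x}_{i \to e}) \Big) \big( x_e - \tilde{x}_{i \to e} \big) \\
&\qquad\qquad + \frac{1}{2} \Big( \alpha_{j \to e} + \frac{1}{2} \phi''_e(\tilde{x}_{j \to e}) \Big) \big( x_e - \tilde{x}_{j \to e} \big)^2 + \Big( \beta_{j \to e} + \frac{1}{2} \phi'_e(\tilde{x}_{j \to e}) \Big) \big( x_e - \tilde{x}_{j \to e} \big) \bigg] \\
&= \max\left( \frac{ \big( \alpha_{i \to e} + \frac{1}{2} \phi''_{i \to e} \big) \tilde{x}_{i \to e} + \big( \alpha_{j \to e} + \frac{1}{2} \phi''_{j \to e} \big) \tilde{x}_{j \to e} - \big( \beta_{i \to e} + \beta_{j \to e} + \frac{1}{2} \phi'_{i \to e} + \frac{1}{2} \phi'_{j \to e} \big) }{ \alpha_{i \to e} + \alpha_{j \to e} + \frac{1}{2} \phi''_{i \to e} + \frac{1}{2} \phi''_{j \to e} },\ 0 \right),
\end{aligned}
\tag{16}
\]
where $\phi'_{i \to e}$ and $\phi''_{i \to e}$ are shorthand for $\phi'_e(\tilde{x}_{i \to e})$ and $\phi''_e(\tilde{x}_{i \to e})$, i.e., the interaction term $\phi_e$ is split equally between the expansions around the two working points.
Furthermore, a learning rate $s$ is applied to update the working point
\[
\tilde{x}^{\mathrm{new}}_{i \to e} \leftarrow s\, x^*_e + (1 - s)\, \tilde{x}^{\mathrm{old}}_{i \to e},
\tag{17}
\]
such that $\tilde{x}_{i \to e}$ does not jump too drastically; otherwise the messages $\alpha_{i \to e}$ and $\beta_{i \to e}$ will approximate the curvature and slope of the message function $\Phi_{i \to e}(x_e)$ less precisely.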
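The working-point update of Eqs. (16) and (17) can be written compactly. In the sketch below, each side of the edge is a dictionary holding the message parameters and the derivatives of $\phi_e$ at that side's working point, with half of the interaction term assigned to each side; this data layout, the function names, and the numerical values are assumptions made for illustration.

```python
def equilibrium_flow(side_i, side_j):
    """Eq. (16): closed-form minimizer of the two quadratic expansions."""
    num = -(side_i["beta"] + side_j["beta"])
    den = 0.0
    for s in (side_i, side_j):
        # Half of the interaction term phi_e is expanded around each side.
        curv = s["alpha"] + 0.5 * s["ddphi"]
        num += curv * s["x_tilde"] - 0.5 * s["dphi"]
        den += curv
    return max(num / den, 0.0)


def damped_update(x_old, x_star, s):
    """Eq. (17): move the working point a fraction s of the way to x_star."""
    return s * x_star + (1.0 - s) * x_old


# Hypothetical messages for an edge with latency l(x) = x, so that
# phi'(x) = x and phi''(x) = 1 at the two working points 1.0 and 2.0.
side_i = dict(alpha=1.0, beta=0.0, x_tilde=1.0, dphi=1.0, ddphi=1.0)
side_j = dict(alpha=1.0, beta=0.5, x_tilde=2.0, dphi=2.0, ddphi=1.0)
x_star = equilibrium_flow(side_i, side_j)
x_new = damped_update(side_i["x_tilde"], x_star, s=0.5)
```

A smaller learning rate `s` keeps the working point closer to its previous value, which trades speed for a more accurate local quadratic approximation.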
C. Non-Smooth Message Functions
1. Qualitative Picture
The MP algorithms in Sec. I B work well if the smoothness assumption of the message function $\Phi_{i \to e}(x_e)$ holds. However, it is not always the case in the routing game problem, where the non-smoothness is induced by the non-negativity constraints of Eq. (3). Direct implementation of the MP algorithms in Sec. I B leads to oscillations of the messages when the traffic patterns are sparse. In fact, similar non-convergence phenomena have been noticed in the system with a non-smooth energy function [1].
To better understand this phenomenon, we examine $R_{i \to e}(\mu; x_e)$ as a function of the Lagrange multiplier $\mu$ in Eq. (11), of which the root $\mu^*$ (satisfying $R_{i \to e}(\mu^*; x_e) = 0$) will determine $\beta_{i \to e}$ and $\alpha_{i \to e}$ in Eqs. (12) and (13).
Figure 1. (a) The net resource $R_{i \to e}(\mu; x)$ defined in Eq. (11) is a non-increasing piecewise-linear function of the Lagrange multiplier $\mu$. Cases (i) and (ii) correspond to edges $e$ incoming to node $i$ ($B_{i,e} = 1$), while case (iii) corresponds to edge $e$ outgoing from node $i$ ($B_{i,e} = -1$). (b) The roots of $R_{i \to e}(\mu; x_e)$ in the vicinity of $x_e = x$ and $x_e = y$. It is assumed that edge $e$ is an outgoing edge of node $i$ (with $B_{i,e} = -1$) such that finding the root of $R_{i \to e}(\mu; x_e)$ is equivalent to solving $R_{i \to e}(\mu; 0) = x_e$. For infinitesimal $\epsilon$, if the flow $x_e$ changes from $x + \epsilon$ to $x - \epsilon$, the solution of the Lagrange multiplier changes continuously from $\mu^*_{x + \epsilon}$ to $\mu^*_{x - \epsilon}$. On the other hand, there is a plateau at $R_{i \to e}(\mu; 0) = y$, so that when the flow $x_e$ changes from $y + \epsilon$ to $y - \epsilon$, the solution of the Lagrange multiplier changes discontinuously from $\mu^*_{y + \epsilon}$ to $\mu^*_{y - \epsilon}$.
Figure 2. (a) Smooth message function $\Phi_{i \to e}(x_e)$, corresponding to $x_e = x$ in Fig. 1(b). (b) Non-smooth message function $\Phi_{i \to e}(x_e)$ with one breakpoint, corresponding to $x_e = y$ in Fig. 1(b), where the first and second derivatives of $\Phi_{i \to e}(x_e)$ are discontinuous near $x_e = y$.
The function $R_{i \to e}(\mu; x_e)$ is a non-increasing piecewise-linear function as illustrated in Fig. 1(a). Assuming edge $e$ is an outgoing edge of node $i$ (with $B_{i,e} = -1$), finding the root of $R_{i \to e}(\mu; x_e)$ is equivalent to solving $R_{i \to e}(\mu; 0) = x_e$. Consider the configuration in Fig. 1(b), where the solution of $R_{i \to e}(\mu; 0) = y$ occurs at a plateau, such that the solution (denoted as $\mu^*_y$) is degenerate; when the flow $x_e$ changes infinitesimally from $y + \epsilon$ to $y - \epsilon$, the solution of the Lagrange multiplier changes discontinuously from $\mu^*_{y + \epsilon}$ to $\mu^*_{y - \epsilon}$. In this case, the slope $\beta_{i \to e}$ of the cavity energy $\Phi_{i \to e}(x_e)$ changes discontinuously from $x_e = y + \epsilon$ to $x_e = y - \epsilon$, while the curvature $\alpha_{i \to e}$ is ill-defined at $x_e = y$. The profiles of the message function $\Phi_{i \to e}(x_e)$ in the smooth and non-smooth cases are illustrated in Fig. 2. If the normal messages $\{\beta_{i \to e}, \alpha_{i \to e}\}$ are used when the message function $\Phi_{i \to e}(x_e)$ is non-smooth, the solution will jump between the two branches, resulting in the non-convergent behavior of the MP algorithms observed in Ref. [1].
2. Criteria for Non-smooth Message Function
As mentioned above, the message function $\Phi_{i \to e}(x_e)$ is non-smooth if the solution of $\mu$ in Eq. (11) is degenerate. This occurs if the optimal flows $x^*_{k \to e'}(\mu_{i \to e})$ of all descendant edges $e' \in \partial i \setminus e$ are inactive, i.e., lying in the zero branch of the function in Eq. (10); when $B_{i,e} x_e + \Lambda_i = 0$, the flow conservation equation $R_{i \to e}(\mu; x_e) = \sum_{e' \in \partial i \setminus e} B_{i,e'}\, x^*_{k \to e'}(\mu) = 0$ has degenerate solutions. In this case, all the resources $\Lambda_i$ are transmitted along edge $e$, while the flows on all other edges $\partial i \setminus e$ adjacent to node $i$ are idle. When $\Lambda_i > 0$, edge $i \to e$ is a leaf in the subgraph with edges holding non-zero flows, therefore we call edge $i \to e$ a primary effective leaf in such cases. Since $\Lambda_i \geq 0$ and $x_e \geq 0$, only the out-going edge $i \to e$ from node $i$ (with $B_{i,e} = -1$) can be a primary effective leaf.
Figure 3. Illustration of the effective leaf edges. Arrows with dashed lines correspond to edges with zero optimal cavity flow $x^*(\mu^*)$ in the MP calculation (expression given in Eq. (10)) of one of its downstream edges. Edge $i \to e$ is a primary effective leaf (assuming $\Lambda_i > 0$), as all the upstream optimal flows are zero. Edge $j \to e'$ is a general effective leaf, as its upstream edges are either effective leaves or attain zero optimal cavity flows. Edge $k \to e''$ is a non-effective leaf, as its upstream edge $l \to e'''$ is a non-effective leaf and it has a non-zero optimal flow $x^*_{l \to e'''}(\mu^*_{l \to e'''}) > 0$.
The leaf state can also propagate from primary effective leaves to downstream edges. We define edge $i \to e$ to be a general effective leaf if and only if $\forall e' \in \partial i \setminus e$, either (i) the optimal flow $x^*_{k \to e'}(\mu^*_{i \to e}) = 0$ in Eq. (10) or (ii) edge $k \to e'$ is a general effective leaf. A primary leaf is by default a general effective leaf. If an edge $i \to e$ is an effective leaf, we denote $f_{i \to e} = 1$, otherwise $f_{i \to e} = 0$. An example of effective leaf configurations is shown in Fig. 3.
It can be proved by contradiction that only out-going edges $i \to e$ with $B_{i,e} = -1$ can be general effective leaves under the condition $\Lambda_i \geq 0$. The set of effective leaves in the upstream of edge $i \to e$ is $EL_{i \to e} = \{e' \mid e' \in \partial i \setminus e,\ f_{k \to e'} = 1\}$, while the set of non-effective leaves is $NEL_{i \to e} = \{e' \mid e' \in \partial i \setminus e,\ f_{k \to e'} = 0\}$.
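Since there is at most one plateau in the function $R_{i \to e}(\mu; x)$, the cavity message $\Phi_{i \to e}(x_e)$ has at most one breakpoint. For an effective leaf edge $i \to e$, we always use the breakpoint of the message function (denoted as $\tilde{x}^b_{i \to e}$) as the working point, such that $\Phi_{i \to e}(x_e)$ has the following expression
\[
\Phi_{i \to e}(x_e) =
\begin{cases}
\frac{1}{2} \alpha^L_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big)^2 + \beta^L_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big) + E_{i \to e}\big( \tilde{x}^b_{i \to e} \big), & x_e < \tilde{x}^b_{i \to e}, \\
\frac{1}{2} \alpha^R_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big)^2 + \beta^R_{i \to e} \big( x_e - \tilde{x}^b_{i \to e} \big) + E_{i \to e}\big( \tilde{x}^b_{i \to e} \big), & x_e > \tilde{x}^b_{i \to e}.
\end{cases}
\tag{18}
\]
For a primary effective leaf edge $i \to e$, the breakpoint is $\tilde{x}^b_{i \to e} = \Lambda_i$. For a general effective leaf edge $i \to e$, the breakpoint is most likely (but not always) located at the value of the effective resource defined as
\[
\Lambda^{\mathrm{eff}}_{i \to e} := \Lambda_i + \sum_{e' \in EL_{i \to e}} B_{i,e'}\, \tilde{x}^b_{k \to e'}.
\tag{19}
\]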
D. MP Equations for Non-smooth Message Functions
The MP equations for non-smooth message functions can be obtained with the information on the effective leaf status of upstream edges, where one replaces the quadratic expansion $\Phi_{i \to e}(\tilde{x}_{i \to e} + \varepsilon_e) \approx \Phi_{i \to e}(\tilde{x}_{i \to e}) + \beta_{i \to e} \varepsilon_e + \frac{1}{2} \alpha_{i \to e} (\varepsilon_e)^2$ by the piecewise quadratic counterpart in Eq. (18) when edge $i \to e$ is determined to be an effective leaf, and the double-sided message parameters $\{\alpha^L_{i \to e}, \beta^L_{i \to e}, \alpha^R_{i \to e}, \beta^R_{i \to e}\}$ are maintained and passed to its downstream edges.
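For updating the messages, the computation of $\min_{\{x_{e'} \geq 0\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{k \to e'}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$ can be tedious if there are multiple effective leaf edges in $\{k \to e' \mid e' \in \partial i \setminus e\}$, where one needs to solve a quadratic optimization for every case in which one branch of each non-smooth message function is selected (there are $2^{|EL_{i \to e}|}$ such cases in total). To simplify this process, we propose to first fix the flows $x_{e'}$ of the effective leaves $EL_{i \to e}$ to be their breakpoints $\tilde{x}^b_{k \to e'}$ and then optimize the non-effective leaf edges $NEL_{i \to e}$
\[
\min_{\{\varepsilon_{e'} | e' \in NEL_{i \to e}\}} \sum_{e' \in NEL_{i \to e}} \left[ \Phi_{k \to e'}\big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \phi_{e'}\big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) \right],
\tag{20}
\]
\[
\begin{aligned}
\text{s.t.} \quad 0 &= \sum_{e' \in NEL_{i \to e}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \sum_{e' \in EL_{i \to e}} B_{i,e'}\, \tilde{x}^b_{k \to e'} + B_{i,e}\, x_e + \Lambda_i \\
&= \sum_{e' \in NEL_{i \to e}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \Lambda^{\mathrm{eff}}_{i \to e} + B_{i,e}\, x_e,
\end{aligned}
\tag{21}
\]
\[
0 \leq \tilde{x}_{k \to e'} + \varepsilon_{e'}.
\tag{22}
\]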
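We then perturb the optimal solution by perturbing some of the flows $x_{e'}$ of upstream edges $e' \in \partial i \setminus e$ by an infinitesimal amount $\mathrm{d}x$ as $x_{e'} = x^*_{k \to e'} + \eta_{e'} \mathrm{d}x$ for non-effective leaves and $x_{e'} = \tilde{x}^b_{k \to e'} + \eta_{e'} \mathrm{d}x$ for effective leaves with $\eta_{e'} = 0, \pm 1$. For a non-smooth message function, $\eta_{e'} = -1$ and $\eta_{e'} = 1$ correspond to the left and the right branch of $\Phi_{k \to e'}(x_{e'})$, respectively. To obey the flow conservation constraint $R_i = 0$, the perturbation coefficients $\eta_{e'}$ must satisfy $\sum_{e' \in \partial i \setminus e} B_{i,e'}\, \eta_{e'} = 0$.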
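If the perturbation configuration $\{\eta^*_{e'}\}$ leading to the lowest energy of $\sum_{e' \in \partial i \setminus e} \left[ \Phi_{e' \to i}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$ reduces the outcome of Eq. (20), we need to consider adding the effective leaves $k \to e'$ with $\eta^*_{e'} \neq 0$ as active optimization variables in addition to the non-effective leaves. Specifically, we define $\eta^{\mathrm{active}} = \{e' \mid e' \in EL_{i \to e},\ \eta^*_{e'} \neq 0\}$, and proceed to solve
\[
\min_{\{\varepsilon_{e'} | e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}\}} \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} \left[ \Phi_{k \to e'}\big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \phi_{e'}\big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) \right],
\tag{23}
\]
\[
\text{s.t.} \quad 0 = \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} B_{i,e'} \big( \tilde{x}_{k \to e'} + \varepsilon_{e'} \big) + \sum_{e' \in EL_{i \to e} \setminus \eta^{\mathrm{active}}} B_{i,e'}\, \tilde{x}^b_{k \to e'} + B_{i,e}\, x_e + \Lambda_i,
\tag{24}
\]
\[
0 \leq \tilde{x}_{k \to e'} + \varepsilon_{e'},
\tag{25}
\]
where we use $\Phi_{k \to e'} = \frac{1}{2} \alpha^L_{k \to e'} \varepsilon_{e'}^2 + \beta^L_{k \to e'} \varepsilon_{e'}$ if $\eta^*_{e'} = -1$ and use $\Phi_{k \to e'} = \frac{1}{2} \alpha^R_{k \to e'} \varepsilon_{e'}^2 + \beta^R_{k \to e'} \varepsilon_{e'}$ if $\eta^*_{e'} = 1$.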
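The primal and dual variables in the optimum satisfy
\[
x^*_{k \to e'}(\mu^*) = \tilde{x}_{k \to e'} + \varepsilon^*_{e'}(\mu^*) = \max\left( \tilde{x}_{k \to e'} - \frac{\mu^* B_{i,e'} + \phi'_{e'} + \beta_{k \to e'}}{\alpha_{k \to e'} + \phi''_{e'}},\ 0 \right),
\tag{26}
\]
\[
R_{i \to e}(\mu^*; x_e) = \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} B_{i,e'}\, x^*_{k \to e'}(\mu^*) + \sum_{e' \in EL_{i \to e} \setminus \eta^{\mathrm{active}}} B_{i,e'}\, \tilde{x}^b_{k \to e'} + B_{i,e}\, x_e + \Lambda_i = 0.
\tag{27}
\]
If the solution $\mu^*$ in Eq. (27) is non-degenerate, we have
\[
\beta_{i \to e} = B_{i,e}\, \mu^*,
\tag{28}
\]
\[
\alpha_{i \to e} = \left[ \sum_{e' \in NEL_{i \to e} \cup \eta^{\mathrm{active}}} \frac{1}{\alpha_{k \to e'} + \phi''_{e'}} \times \Theta\!\left( \left( \alpha_{k \to e'} + \phi''_{e'} \right) \tilde{x}_{k \to e'} - \left( \mu^* B_{i,e'} + \phi'_{e'} + \beta_{k \to e'} \right) \right) \right]^{-1}.
\tag{29}
\]
On the other hand, if the solution $\mu^*$ is degenerate, we need to consider $x_e = \tilde{x}_{i \to e} - \mathrm{d}x$ to solve for $\beta^L_{i \to e}, \alpha^L_{i \to e}$, and consider $x_e = \tilde{x}_{i \to e} + \mathrm{d}x$ to solve for $\beta^R_{i \to e}, \alpha^R_{i \to e}$.
It can also be shown that $\beta^R_{i \to e} > \beta^L_{i \to e}$ and the non-smooth message function $\Phi_{i \to e}(x_e)$ is convex.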
1. Update of the Working Points
If the message function $\Phi_{i \to e}(x_e)$ is non-smooth, we would like to bring the working point $\tilde{x}_{i \to e}$ to the vicinity of the breakpoint of the two branches. To determine whether an edge $i \to e$ is an effective leaf, we check the following two criteria: (i) each edge $k \to e'$ in the upstream edge set $\partial i \setminus e$ satisfies either $f_{k \to e'} = 1$ or $\tilde{x}_{k \to e'} = 0$; (ii) the difference between the current working point and the effective resource, $|\tilde{x}_{i \to e} - \Lambda^{\mathrm{eff}}_{i \to e}|$, is smaller than some threshold ($\Lambda^{\mathrm{eff}}_{i \to e}$ is defined in Eq. (19)). If both criteria (i) and (ii) are met, then we use the effective resource as the working point, $\tilde{x}_{i \to e} = \Lambda^{\mathrm{eff}}_{i \to e}$, and perform the optimization $\min_{\{x_{e'} \geq 0\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi_{k \to e'}(x_{e'}) + \phi_{e'}(x_{e'}) \right]$; if it results in degenerate solutions of the Lagrange multiplier $\mu$ for the flow conservation constraint, then edge $i \to e$ is determined as an effective leaf and the double-sided messages $\{\beta^L_{i \to e}, \alpha^L_{i \to e}, \beta^R_{i \to e}, \alpha^R_{i \to e}\}$ are computed. Otherwise, edge $i \to e$ is a non-effective leaf and the normal messages $\{\beta_{i \to e}, \alpha_{i \to e}\}$ are recorded.
If criteria (i) and (ii) are not both met, we use the current value of the working point $\tilde{x}_{i \to e}$ to solve for the messages. Similarly, if the optimization leads to degenerate solutions of $\mu$, then edge $i \to e$ is determined as an effective leaf. Otherwise, edge $i \to e$ is a non-effective leaf.
Similar to the case of smooth message functions in Sec. I B, the working point is updated as
\[
\tilde{x}^{\mathrm{new}}_{i \to e} \leftarrow s\, x^*_e + (1 - s)\, \tilde{x}^{\mathrm{old}}_{i \to e},
\tag{30}
\]
where $x^*_e = \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) \right]$.
E. Results of MP Algorithm for Routing Game
In this section, we report results of the MP algorithms described above. Taking into account the possible non-smooth structure of message functions $\Phi_{i \to e}(x_e)$, the MP algorithm converges well for various types of graphs and resource distributions. We demonstrate the effectiveness of the algorithm in Figures 4 and 5, where random regular graphs and small-world networks are considered. The small-world networks are obtained by rewiring square lattices with randomly chosen shortcut edges [2]. In Fig. 4, the flows adjacent to the destination node $D$ are unconstrained (Method I in Sec. I B 1). The MP algorithms converge to the correct equilibrium flows $x^*$, and the empirical complexity for computing the equilibrium flows up to a certain error $|x^{\mathrm{MP}} - x^*|$ is roughly $O(|E|^2)$.
In Fig. 5, we use Method II in Sec. I B 1, i.e., we put an explicit constraint on the flows adjacent to the destination node $D$ as
\[
R_D = \Lambda_D + \sum_{e \in \partial D} B_{D,e}\, x_e = 0,
\tag{31}
\]
where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. In this approach, the MP algorithms converge much faster; the empirical complexity for computing the equilibrium flows up to a certain error $|x^{\mathrm{MP}} - x^*|$ is roughly $O(|E|)$. However, there exist some networks where MP with Method II does not converge, while MP with Method I converges successfully. For the experiments in the main text, we use Method I to treat the destination node for its better convergence properties.
F. Extension to the Case of Multiple Destinations
The case of multiple destinations can be studied similarly. The traffic flows can be classified into different classes according to their destinations. Let $N_d$ denote the number of destinations, and $x^a_e$ denote the flow on edge $e$ targeted at the $a$-th destination (or the $a$-th class). The lower-level optimization problem (for solving equilibrium flows) is defined as
\[
\Phi(x) = \sum_{e \in E} \int_0^{\sum_a x^a_e} \ell_e(y)\, \mathrm{d}y = \sum_{e \in E} \phi_e\!\left( \sum_{a=1}^{N_d} x^a_e \right),
\tag{32}
\]
\[
\text{s.t.} \quad R^a_i := \Lambda^a_i + \sum_{e \in \partial i} B_{i,e}\, x^a_e = 0, \quad \forall i, a,
\tag{33}
\]
\[
x^a_e \geq 0, \quad \forall e, a.
\tag{34}
\]
Figure 4. Convergence of the MP algorithm for routing games in networks to the equilibrium flows $x^*$. Random regular graphs of degree 3 are considered in (a)(c), while small-world networks obtained by rewiring square lattices with randomly chosen shortcut edges (rewiring probability $p_{rw} = 0.05$) are considered in (b)(d), respectively. The flows adjacent to the destination node $D$ are unconstrained (Method I in Sec. I B 1), reminiscent of a grounded node in electric circuits. In (a)(b), the MP process for specific problem realizations is illustrated. Each sweep comprises $40|E|$ local MP updates. Panels (c)(d) are obtained by averaging over 10 problem realizations, and rescaling the MP runtime by $|E|^2$.
Figure 5. Same setting as in Fig. 4, except that the flows adjacent to the destination node $D$ explicitly obey the constraint $R_D = \Lambda_D + \sum_{e \in \partial D} B_{D,e}\, x_e = 0$, where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$ (Method II in Sec. I B 1). Each sweep comprises $40|E|$ local MP updates.
To accommodate the nonlinear interactions of flows of different classes $\{x^a_e\}$ in $\phi_e(\sum_a x^a_e)$, we adopt a coordinate-descent-like approach in the MP algorithm as follows. In the treatment of the flows of class $a$, we fix the flows of other classes $\{x^b_e\}_{b \neq a}$ to their working points and compute the messages of class $a$ as
\[
\begin{aligned}
\Phi^a_{i \to e}(x^a_e) &= \min_{\{x^a_{e'} \geq 0\} | R^a_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi^a_{k \to e'}(x^a_{e'}) + \phi_{e'}\!\left( \sum_{b \neq a} \tilde{x}^b_{k \to e'} + x^a_{e'} \right) \right] \\
&= \min_{\{x^a_{e'} \geq 0\} | R^a_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \Phi^a_{k \to e'}\big( \tilde{x}^a_{k \to e'} + \varepsilon^a_{e'} \big) + \phi_{e'}\!\left( \sum_{b} \tilde{x}^b_{k \to e'} + \varepsilon^a_{e'} \right) \right].
\end{aligned}
\tag{35}
\]
We further adopt the approximations $\Phi^a_{k \to e'}(\tilde{x}^a_{k \to e'} + \varepsilon^a_{e'}) \approx \Phi^a_{k \to e'}(\tilde{x}^a_{k \to e'}) + \beta^a_{k \to e'} \varepsilon^a_{e'} + \frac{1}{2} \alpha^a_{k \to e'} (\varepsilon^a_{e'})^2$ (augmented by a piecewise quadratic function if it is non-smooth) and $\phi_{e'}(\sum_b \tilde{x}^b_{k \to e'} + \varepsilon^a_{e'}) \approx \phi_{e'}(\sum_b \tilde{x}^b_{k \to e'}) + \phi'_{e'}(\sum_b \tilde{x}^b_{k \to e'})\, \varepsilon^a_{e'} + \frac{1}{2} \phi''_{e'}(\sum_b \tilde{x}^b_{k \to e'})\, (\varepsilon^a_{e'})^2$, and solve for the coefficients $\{\alpha^a_{i \to e}, \beta^a_{i \to e}\}$ as in the single-destination scenario. The resulting MP algorithm has the same structure as the single-class one described above; its efficacy is shown in Fig. 6.
Figure 6. Message-passing algorithm for routing games with $N_d$ destinations converges to the equilibrium flows $x^*$. (a) Method I in Sec. I B 1 is used to treat the destination nodes. (b) Method II in Sec. I B 1 is used to treat the destination nodes. Each sweep comprises $40|E|$ local MP updates.
G. Bilevel Optimization in Routing Games
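The toll optimization problem (for a single destination) of the upper-level planner is defined as
\[
\min_{\tau} H(x^*(\tau)) = \sum_{e \in E} x^*_e(\tau)\, \ell_e\big( x^*_e(\tau) \big),
\tag{36}
\]
\[
\text{s.t.} \quad \text{constraints of } \tau,
\tag{37}
\]
\[
x^*(\tau) = \arg\min_{x} \sum_{e \in E} \int_0^{x_e} \left[ \ell_e(y) + \tau_e \right] \mathrm{d}y,
\tag{38}
\]
\[
\text{s.t.} \quad x_e \geq 0,\ R_i = 0, \quad \forall e, i.
\tag{39}
\]
The MP algorithm for computing the equilibrium flows $x^*(\tau)$ for a given $\tau$ is the same as the one introduced in Sec. I, by replacing $\phi_e(x_e) = \int_0^{x_e} \ell_e(y)\, \mathrm{d}y$ by $\phi^{\tau}_e(x_e) = \int_0^{x_e} \left[ \ell_e(y) + \tau_e \right] \mathrm{d}y$.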
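The computation of the social optimum $H(x)$ has a similar form to that of the equilibrium flows. We therefore use a parallel MP procedure for the social cost
\[
\begin{aligned}
H_{i \to e}(x_e) &= \min_{\{x_{e'} \geq 0\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ H_{k \to e'}(x_{e'}) + \sigma_{e'}(x_{e'}) \right] \\
&= \min_{\{x_{e'} \geq 0\} | R_i = 0} \sum_{e' \in \partial i \setminus e} \left[ \frac{1}{2} \gamma_{k \to e'} \varepsilon_{e'}^2 + \delta_{k \to e'} \varepsilon_{e'} + \frac{1}{2} \sigma''_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'}^2 + \sigma'_{e'}(\tilde{x}_{k \to e'})\, \varepsilon_{e'} \right],
\end{aligned}
\tag{40}
\]
where $H_{k \to e'}(x_{e'})$ assumes a quadratic approximation and needs to be augmented by a piecewise quadratic function if it is non-smooth. This results in an upper-level MP algorithm with the same structure as the one for computing the equilibrium flows. The difference is that, since the flow is not directly driven by the central planner, the working point $\tilde{x}_{i \to e}$ is not updated at the upper-level MP. Instead, the central planner updates the toll $\tau_e$ such that selfish users are attracted to the solution with a lower social cost.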
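When the toll $\tau_e$ on edge $e$ is adapted, the marginal Nash-equilibrium flow $x^N_e$ changes accordingly
\[
x^N_e(\tau_e) = \arg\min_{x_e \geq 0} \left[ \Phi_{i \to e}(x_e) + \Phi_{j \to e}(x_e) + \phi_e(x_e) + \tau_e x_e \right].
\tag{41}
\]
For smooth message functions $\Phi_{i \to e}(x_e)$ and $\Phi_{j \to e}(x_e)$
\[
x^N_e(\tau_e) = \max\left( \frac{ \big( \alpha_{i \to e} + \frac{1}{2} \phi''_{i \to e} \big) \tilde{x}_{i \to e} + \big( \alpha_{j \to e} + \frac{1}{2} \phi''_{j \to e} \big) \tilde{x}_{j \to e} - \big( \beta_{i \to e} + \beta_{j \to e} + \tau_e + \frac{1}{2} \phi'_{i \to e} + \frac{1}{2} \phi'_{j \to e} \big) }{ \alpha_{i \to e} + \alpha_{j \to e} + \frac{1}{2} \phi''_{i \to e} + \frac{1}{2} \phi''_{j \to e} },\ 0 \right),
\tag{42}
\]
which is a piecewise linear function of $\tau_e$ with two branches. For non-smooth cavity functions, $x^N_e(\tau_e)$ can also be obtained straightforwardly, which is a piecewise linear function of $\tau_e$ with multiple branches.
Figure 7. The effect of tolls on the reduction in fractional social cost in routing games on random regular graphs with $N_d$ destinations, where tolls are restricted as $0 \leq \tau_e \leq 1$. Each sweep comprises $40 N_d |E|$ local MP updates and 100 edgewise toll updates in a random sequential schedule. (a) $N = 100$. (b) $N = 200$.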
The goal of the adaptation of the toll $\tau_e$ is to decrease the social cost $H(x)$, which amounts to decreasing the full social cost on edge $e$
\[
\tau^*_e = \arg\min_{\tau_e} H^{\mathrm{full}}_e\big( x^N_e(\tau_e) \big),
\tag{43}
\]
\[
H^{\mathrm{full}}_e(x_e) := H_{i \to e}(x_e) + H_{j \to e}(x_e) + \sigma_e(x_e),
\tag{44}
\]
where the optimization in Eq. (43) needs to obey necessary constraints on tolls (e.g., the restriction $0 \leq \tau_e \leq \tau^{\max}_e$ is considered in the main text).
As $H^{\mathrm{full}}_e(x^N_e(\tau_e))$ is a convex function of $x^N_e$, it is sufficient to adapt $\tau_e$ such that $x^N_e(\tau_e)$ gets as close to the marginal socially optimal flow $x^G_e$ as possible, where $x^G_e$ is given by
\[
x^G_e = \arg\min_{x_e \geq 0} \left[ H_{i \to e}(x_e) + H_{j \to e}(x_e) + \sigma_e(x_e) \right].
\tag{45}
\]
The search for the optimal toll $\tau^*_e$ can be done efficiently by utilizing the property that $x^N_e(\tau_e)$ is a piecewise linear function of $\tau_e$.
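For the smooth two-branch case of Eq. (42), the toll search reduces to a clipping operation. The sketch below writes $x^N_e(\tau_e) = \max((A - \tau_e)/C, 0)$, where `A` and `C` are shorthand introduced here for the toll-free numerator and the denominator of Eq. (42); these names, the function names, and the numerical values are assumptions for illustration, and the non-smooth multi-branch case is not covered.

```python
def nash_flow(A, C, toll):
    """Smooth case of Eq. (42): x_N(tau) = max((A - tau) / C, 0)."""
    return max((A - toll) / C, 0.0)


def best_toll(A, C, x_G, toll_max):
    """Toll in [0, toll_max] whose Nash flow is closest to the target x_G."""
    # x_N is non-increasing in the toll, so the unconstrained solution of
    # (A - tau) / C = x_G is clipped to the admissible interval.
    return min(max(A - C * x_G, 0.0), toll_max)


# Hypothetical message-derived coefficients and target flow.
tau = best_toll(A=2.5, C=3.0, x_G=0.5, toll_max=1.0)
x_N = nash_flow(2.5, 3.0, tau)        # coincides with x_G when tau is interior
```

When the target flow cannot be reached within the allowed toll range, the clipped toll still gives the closest attainable flow, which is sufficient because the full edge cost is convex in the flow.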
The resulting bilevel MP algorithm is described in the main text, where the lower-level messages $\{\alpha^{(m)}_{i \to e}, \beta^{(m)}_{i \to e}, \tilde{x}_{i \to e}\}$ ($m \in \{L, R\}$) and upper-level messages $\{\gamma^{(m)}_{i \to e}, \delta^{(m)}_{i \to e}\}$ are passed along edges to compute the equilibrium flows $x^N_e$ and related quantities. These messages facilitate the computation of $H^{\mathrm{full}}_e(x_e)$ in the upper level, which is used to update the toll variables $\tau_e$. In practice, the update of tolls is less frequent than the update of other messages. In the experiments shown in the main text, for every $\frac{2}{5} N_d |E|$ MP iterations, we randomly select an edge $e$ and update its toll.
1. Extension to Multiple Destinations
The toll optimization problems with multiple destinations can be tackled by the proposed bilevel MP algorithm
using the approximations in Sec. I F. The results are shown in Fig. 7, which demonstrate the effectiveness of the
algorithm in reducing the social cost by adapting tolls.
H. The Bilevel Programming Approach
Here we demonstrate the results of the bilevel programming approach to the toll optimization problem. It is achieved by expressing the solution of Eq. (38) as constraints imposed by the Karush–Kuhn–Tucker (KKT) conditions, and solving the bilevel optimization as a global nonlinear programming problem [3]. Such an approach is intrinsically difficult as (i) the constraints imposed by the KKT conditions can be nonlinear and non-convex; (ii) the complementary slackness conditions are combinatorial, which requires a treatment with mixed integer programming (e.g., through branch and bound).
Figure 8. Run time of bilevel programming on the toll optimization problem in random regular graphs. For each network size,
20 different problem realizations are considered; red triangles represent the cases where bilevel programming fails to find the
solution in single trials, green dots represent successful trials.
It is difficult to directly compare the MP algorithms to the bilevel programming approach, as the convergence rate for either approach is difficult to establish. Besides, the bilevel programming approach is a centralized optimization method, which has a different space complexity per iteration. Nevertheless, we present the CPU run times (CPU used: i5-3317U) of bilevel programming (package used: bileveljump.jl [4]) on the toll optimization problem in Fig. 8. There exist cases where bilevel programming fails to find the solution in a single trial, and the run times vary significantly among different problem realizations.
The bilevel programming approach is more generic and flexible than the MP approach, but, unlike the MP approach, it does not offer a decentralized algorithm. Besides, the MP algorithms can be extended to scenarios with discrete variables, which is very difficult for the global optimization approach. It is also difficult to treat the toll selection problem in the main text with the bilevel programming method, especially when the social optimum is not known a priori in some variants of toll-setting or network-design problems [5].
II. MESSAGE-PASSING ALGORITHMS FOR FLOW CONTROL IN UNDIRECTED NETWORKS
In this section, we provide the details of the MP algorithm for flow control in undirected networks. In a simple undirected graph $G(V, E)$, nodes $i$ and $j$ can be connected by at most one edge $(i, j)$, where the order of nodes $i$ and $j$ does not matter (in contrast, edges $(i, j)$ and $(j, i)$ are two different edges in a directed graph). In this case, edge $(i, j)$ can either transmit resources from node $j$ to node $i$ or from node $i$ to node $j$. We denote by $x_{ij}$ ($= -x_{ji}$) the flow from node $j$ to node $i$; if $x_{ij} < 0$ (or $x_{ji} = -x_{ij} > 0$), the resources are being transmitted from node $i$ to node $j$. We also assume that the underlying graph does not have any leaf nodes, by recursively trimming leaf nodes and absorbing their resources into neighboring nodes.
A. MP For Lower-level Optimization
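The equilibrium flow (in the lower-level optimization problem) is the minimizer of the problem
\[
\min_{x} C(x) = \sum_{(i,j)} \frac{1}{2} r_{ij}\, x_{ij}^2,
\tag{46}
\]
\[
\text{s.t.} \quad R_i = \Lambda_i + \sum_{j \in \mathcal{N}_i} x_{ij} = 0, \quad \forall i \neq D,
\tag{47}
\]
where the reference node $D$ can be arbitrarily chosen.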
The above optimization problem can be mapped onto its dual problem as
\[
\min_{\mu} C^{\mathrm{dual}}(\mu) = \sum_{(i,j)} \frac{1}{2 r_{ij}} (\mu_j - \mu_i)^2 - \sum_i \Lambda_i \mu_i
\tag{48}
\]
\[
=: \frac{1}{2} \mu^{\top} L \mu - \Lambda^{\top} \mu,
\tag{49}
\]
where $\mu_i$ is the Lagrange multiplier (or dual variable) associated with the flow conservation constraint $R_i = 0$, and $L$ is the Laplacian matrix with matrix elements
\[
L_{ij} := \left( \sum_{k \in \mathcal{N}_i} \frac{1}{r_{ik}} \right) \delta_{ij} - \frac{1}{r_{ij}}.
\tag{50}
\]
The solution of the dual problem can be obtained by solving the system of linear equations $L \mu = \Lambda$, and the equilibrium flow $x^*_{ij}$ is related to the optimal Lagrange multiplier $\mu^*$ through $x^*_{ij} = (\mu^*_j - \mu^*_i) / r_{ij}$. The drawbacks of such an approach are that (i) solving the system of linear equations usually needs a centralized solver; (ii) to compute the response of the equilibrium flow to changes of the control parameters $\{r_{ij}\}$ in bilevel optimization, one needs to evaluate the pseudo-inverse of the Laplacian matrix per iteration, which can be very computationally demanding for large networks. Instead, we propose to use MP for computing the equilibrium flows and tackling the related bilevel optimization problem, which is a scalable and efficient decentralized algorithm.
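For reference, the dual route $L \mu = \Lambda$ can be checked on a small example. The sketch below grounds the reference node ($\mu_D = 0$) and solves the remaining equations with plain Gauss-Seidel sweeps; the solver choice is made here only to keep the example dependency-free (any linear solver would do), and the triangle network, the function names, and the resource values are assumptions.

```python
def solve_dual(edges, Lam, D, sweeps=200):
    """Solve L mu = Lambda with mu_D = 0 by Gauss-Seidel sweeps.

    edges: {(i, j): r_ij} for an undirected graph; Lam: {node: Lambda_i}, i != D.
    """
    nbrs = {}
    for (i, j), r in edges.items():
        nbrs.setdefault(i, []).append((j, r))
        nbrs.setdefault(j, []).append((i, r))
    mu = {i: 0.0 for i in nbrs}
    for _ in range(sweeps):
        for i in nbrs:
            if i == D:
                continue
            # Row i: (sum_k 1/r_ik) mu_i - sum_j mu_j / r_ij = Lambda_i
            mu[i] = ((Lam[i] + sum(mu[j] / r for j, r in nbrs[i]))
                     / sum(1.0 / r for _, r in nbrs[i]))
    return mu


def equilibrium_flows(edges, mu):
    """x_ij = (mu_j - mu_i) / r_ij, the flow from node j to node i."""
    return {(i, j): (mu[j] - mu[i]) / r for (i, j), r in edges.items()}


# Hypothetical triangle with unit resistances; node 2 is the reference node.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
mu = solve_dual(edges, Lam={0: 1.0, 1: 0.0}, D=2)
x = equilibrium_flows(edges, mu)
```

Here two thirds of the unit resource at node 0 travel directly to the reference node and one third travels via node 1, as expected for equal resistances.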
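For the lower-level equilibrium flow problem in Eq. (46), the MP algorithm amounts to computing the message functions
\[
C_{i \to j}(x_{ij}) = \min_{\{x_{ki}\} | R_i = 0} \left[ \frac{1}{2} r_{ij}\, x_{ij}^2 + \sum_{k \in \mathcal{N}_i \setminus j} C_{k \to i}(x_{ki}) \right],
\tag{51}
\]
where the definition of the message function $C_{i \to j}(x_{ij})$ differs from the one in Eq. (6) in that it includes the interaction term on edge $(i, j)$, which yields a more concise update rule in this problem. Similar to routing games, we approximate the message function by a quadratic form $C_{i \to j}(x_{ij}) = \frac{1}{2} \alpha_{i \to j} (x_{ij} - \hat{x}_{i \to j})^2 + \mathrm{const}$, such that the local optimization in Eq. (51) reduces to the computation of the real-number messages $m_{i \to j} \in \{\alpha_{i \to j}, \hat{x}_{i \to j}\}$ by passing the upstream messages $\{m_{k \to i}\}_{k \in \mathcal{N}_i \setminus j}$. Here the message function $C_{i \to j}(x_{ij})$ is always smooth. The messages $\alpha_{i \to j}, \hat{x}_{i \to j}$ are computed as [1, 6, 7]
\[
\alpha_{i \to j} = \frac{1}{\sum_{k \in \mathcal{N}_i \setminus j} \alpha^{-1}_{k \to i}} + r_{ij},
\tag{52}
\]
\[
\hat{x}_{i \to j} = \frac{\Lambda_i + \sum_{k \in \mathcal{N}_i \setminus j} \hat{x}_{k \to i}}{1 + r_{ij} \sum_{k \in \mathcal{N}_i \setminus j} \alpha^{-1}_{k \to i}}.
\tag{53}
\]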
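A single message update according to Eqs. (52) and (53) is a direct transcription, sketched below. The function name and argument layout are assumptions; the sketch also assumes that all upstream curvatures $\alpha_{k \to i}$ are strictly positive, so a vanishing upstream curvature would need separate handling.

```python
def mp_update(Lam_i, r_ij, upstream):
    """Eqs. (52)-(53) for the message from node i to node j.

    upstream: list of (alpha_ki, xhat_ki) over neighbours k of i other than j;
    xhat follows the sign convention of Eq. (53). Assumes every alpha_ki > 0.
    """
    inv_sum = sum(1.0 / a for a, _ in upstream)
    alpha = 1.0 / inv_sum + r_ij
    xhat = (Lam_i + sum(xh for _, xh in upstream)) / (1.0 + r_ij * inv_sum)
    return alpha, xhat


# Hypothetical node with resource 2, unit resistance on the outgoing edge,
# and two upstream messages.
alpha, xhat = mp_update(2.0, 1.0, [(2.0, 1.0), (2.0, 3.0)])
```

Each update only needs the messages arriving at node `i` from its other neighbours, which is what makes the scheme decentralized.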
1. The Reference Node D
Similar to the MP algorithm in routing games, there are two methods to deal with possible boundary conditions of the reference node $D$:
• Method I: Since node $D$ has no constraints on its adjacent flows, it will absorb all incoming flows, resulting in $C_{D \to j}(x_{Dj}) = 0$ and
\[
\alpha_{D \to j} = 0, \quad \hat{x}_{D \to j} = 0.
\tag{54}
\]
In this treatment, the Lagrange multiplier of node $D$ can be set to $\mu_D = 0$ in the dual problem (as node $D$ is unconstrained), which corresponds to a grounded node in the electric network interpretation of the problem.
• Method II: Alternatively, one can set an explicit constraint on the flows $\{x_{Dj}\}$ to the reference node $D$
\[
R_D := \Lambda_D + \sum_{j \in \mathcal{N}_D} x_{Dj} = 0,
\tag{55}
\]
where $\Lambda_D = -\sum_{i \neq D} \Lambda_i$. Then the messages from the reference node $D$ are calculated in the same way as for other nodes.
Similar to the routing games, Method II results in an MP algorithm with a faster convergence rate, but it may fail to converge for some graphs while Method I can still provide valid solutions.
Method II is used in the experiments for undirected flow networks in the main text.
Figure 9. MP algorithm in undirected flow networks converges to the equilibrium flows $x^*$. Random regular graphs (RRG) of degree 3 are considered in (a)(b), while square lattices (size $L \times L$) are considered in (c)(d). Method I in Sec. II A 1 is used in (a)(c) to treat the reference node $D$, while Method II is used in (b)(d). In (b), each sweep comprises $40|E|$ local MP updates.
2. Computation of the Equilibrium Flows from Messages
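Upon convergence of the messages, the equilibrium flow $x^*_{ij}$ can be obtained by minimizing the edgewise full cost $C^{\mathrm{full}}_{ij}(x_{ij}) = C_{i \to j}(x_{ij}) + C_{j \to i}(x_{ij}) - \frac{1}{2} r_{ij}\, x_{ij}^2$, giving rise to
\[
x^*_{ij} = \frac{\alpha_{j \to i}\, \hat{x}_{j \to i} - \alpha_{i \to j}\, \hat{x}_{i \to j}}{\alpha_{i \to j} + \alpha_{j \to i} - r_{ij}}.
\tag{56}
\]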
3. Results of the MP Algorithm on Undirected Flow Networks
In Fig. 9, we demonstrate the performance of the MP algorithms in undirected flow networks. The MP algorithms converge in different networks, including square lattices with many short loops. The number of iterations needed to obtain a given precision seems to depend on the topology of the network, e.g., square lattices appear to converge more slowly than random regular graphs. The method used to treat the boundary reference node $D$ also affects the number of iterations needed, where it is observed that in general Method II makes MP converge faster than Method I. We conjecture that the influence of the single-node boundary conditions in Eq. (54) takes more iteration steps to diffuse to the bulk of the network.
B. MP For Bilevel Optimization
The bilevel optimization problem on undirected networks aims to tune the flows of targeted edges Tsuch that they
exceed or drop below certain limits, depending on the application. We consider the former case, where the goal is to
control the flows on Tsuch that ρij (xij ) = |xij|−|x0
ij |
|x0
ij|θ0,(i, j )∈ T (with x0
ij being the flow before tuning). The
task in the upper level is to minimize the objective
\[
\min_{r} O(x^*(r)) = \sum_{(i,j) \in \mathcal{T}} \rho_{ij}\big( x^*_{ij}(r) \big)\, \Theta\Big( \rho_{ij}\big( x^*_{ij}(r) \big) \Big).
\tag{57}
\]
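As mentioned in the main text, the impact of the variation of the control parameters $r_{ij}$ on the upper-level objective
$\mathcal{O}$ is mediated through the messages $m_{ij} \in \{\alpha_{ij}, \hat{x}_{ij}\}$ along the pathways from the targeted edges to edge $(i,j)$.
For a targeted edge $(p,q) \in \mathcal{T}$, the boundary condition of the gradient is given by $\frac{\partial \mathcal{O}}{\partial m_{pq}} = \frac{\partial \mathcal{O}}{\partial x^*_{pq}} \frac{\partial x^*_{pq}}{\partial m_{pq}}$, where the
components admit the following expressions
$$ \frac{\partial \mathcal{O}}{\partial x^*_{pq}} = -\Theta\big(-\rho_{pq}(x^*_{pq})\big)\, \frac{\mathrm{sgn}(x^*_{pq})}{|x^0_{pq}|}, \tag{58} $$
$$ \frac{\partial x^*_{pq}}{\partial \alpha_{pq}} = -\frac{\hat{x}_{pq}}{\alpha_{pq} + \alpha_{qp} - r_{pq}} - \frac{x^*_{pq}}{\alpha_{pq} + \alpha_{qp} - r_{pq}}, \tag{59} $$
$$ \frac{\partial x^*_{pq}}{\partial \hat{x}_{pq}} = -\frac{\alpha_{pq}}{\alpha_{pq} + \alpha_{qp} - r_{pq}}. \tag{60} $$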
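The boundary derivatives in Eqs. (59) and (60) can be checked against finite differences of Eq. (56). The sketch below does this for one targeted edge; the helper names and numerical values are illustrative assumptions.

```python
def flow(alpha_pq, xhat_pq, alpha_qp, xhat_qp, r_pq):
    """Equilibrium flow of Eq. (56) on edge (p, q)."""
    return (alpha_qp * xhat_qp - alpha_pq * xhat_pq) / (alpha_pq + alpha_qp - r_pq)


def flow_gradients(alpha_pq, xhat_pq, alpha_qp, xhat_qp, r_pq):
    """Analytic derivatives of the flow w.r.t. the message (alpha_pq, xhat_pq)."""
    denom = alpha_pq + alpha_qp - r_pq
    x = flow(alpha_pq, xhat_pq, alpha_qp, xhat_qp, r_pq)
    d_alpha = -xhat_pq / denom - x / denom  # Eq. (59)
    d_xhat = -alpha_pq / denom              # Eq. (60)
    return d_alpha, d_xhat


def finite_difference(f, args, index, h=1e-6):
    """Central finite difference of f with respect to args[index]."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2.0 * h)


# Order: alpha_pq, xhat_pq, alpha_qp, xhat_qp, r_pq (illustrative values).
args = (2.0, 0.5, 3.0, -1.0, 1.0)
d_alpha, d_xhat = flow_gradients(*args)
```

Comparing `d_alpha` with `finite_difference(flow, args, 0)` and `d_xhat` with `finite_difference(flow, args, 1)` should give agreement up to the finite-difference error.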
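The gradient with respect to the control parameter $r_{pq}$ on the targeted edge is computed as
$$ \frac{\partial \mathcal{O}}{\partial r_{pq}} = \frac{\partial \mathcal{O}}{\partial x^*_{pq}} \frac{\partial x^*_{pq}}{\partial r_{pq}} + \sum_{m \in \{\alpha, \hat{x}\}} \left( \frac{\partial \mathcal{O}}{\partial m_{pq}} \frac{\partial m_{pq}}{\partial r_{pq}} + \frac{\partial \mathcal{O}}{\partial m_{qp}} \frac{\partial m_{qp}}{\partial r_{pq}} \right), \tag{61} $$
where the explicit derivative at fixed messages follows from Eq. (56) as $\frac{\partial x^*_{pq}}{\partial r_{pq}} = \frac{x^*_{pq}}{\alpha_{pq} + \alpha_{qp} - r_{pq}}$.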
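For a non-targeted edge $(k,i) \notin \mathcal{T}$, we first need to evaluate $\frac{\partial \mathcal{O}}{\partial m_{ki}}$, which can be obtained by summing the gradients
on its downstream edges $\{i \to l \mid l \in \mathcal{N}_i \setminus k\}$, computed as
$$ \frac{\partial \mathcal{O}}{\partial m_{ki}} = \sum_{l \in \mathcal{N}_i \setminus k} \; \sum_{m_{il} \in \{\alpha_{il}, \hat{x}_{il}\}} \frac{\partial \mathcal{O}}{\partial m_{il}} \frac{\partial m_{il}}{\partial m_{ki}}, \tag{62} $$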
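where the gradient propagation of messages $\frac{\partial m_{il}}{\partial m_{ki}}$ admits the following forms
$$ \frac{\partial \alpha_{il}}{\partial \alpha_{ki}} = \frac{\alpha_{ki}^{-2}}{\left( \sum_{n \in \mathcal{N}_i \setminus l} \alpha_{ni}^{-1} \right)^2}, \tag{63} $$
$$ \frac{\partial \alpha_{il}}{\partial \hat{x}_{ki}} = 0, \tag{64} $$
$$ \frac{\partial \hat{x}_{il}}{\partial \alpha_{ki}} = \frac{\hat{x}_{il}\, r_{il}\, \alpha_{ki}^{-2}}{1 + r_{il} \sum_{n \in \mathcal{N}_i \setminus l} \alpha_{ni}^{-1}}, \tag{65} $$
$$ \frac{\partial \hat{x}_{il}}{\partial \hat{x}_{ki}} = \frac{1}{1 + r_{il} \sum_{n \in \mathcal{N}_i \setminus l} \alpha_{ni}^{-1}}. \tag{66} $$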
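Equations (63)-(66) form the Jacobian of a single message update, which the sketch below checks by finite differences. The update rule used here, $\alpha_{il} = r_{il} + \big(\sum_{n} \alpha_{ni}^{-1}\big)^{-1}$ and $\hat{x}_{il} = \big(\Lambda_i + \sum_{n} \hat{x}_{ni}\big) / \big(1 + r_{il} \sum_{n} \alpha_{ni}^{-1}\big)$ with $n \in \mathcal{N}_i \setminus l$, is an assumption chosen to be consistent with the derivatives listed in this section; the update equations of the original work are given in a part not reproduced here and may differ, e.g., in the sign convention for $\Lambda_i$. All names are illustrative.

```python
def update_message(r_il, lam_i, alphas, xhats):
    """Assumed update of message i -> l from the incoming messages n -> i (n != l)."""
    s = sum(1.0 / a for a in alphas)
    alpha_il = r_il + 1.0 / s
    xhat_il = (lam_i + sum(xhats)) / (1.0 + r_il * s)
    return alpha_il, xhat_il


def message_jacobian(r_il, lam_i, alphas, xhats, k):
    """Derivatives of (alpha_il, xhat_il) w.r.t. the k-th incoming message."""
    s = sum(1.0 / a for a in alphas)
    _, xhat_il = update_message(r_il, lam_i, alphas, xhats)
    inv_sq = alphas[k] ** -2
    return {
        "dalpha_dalpha": inv_sq / s ** 2,                           # Eq. (63)
        "dalpha_dxhat": 0.0,                                        # Eq. (64)
        "dxhat_dalpha": xhat_il * r_il * inv_sq / (1.0 + r_il * s),  # Eq. (65)
        "dxhat_dxhat": 1.0 / (1.0 + r_il * s),                      # Eq. (66)
    }
```

Perturbing one incoming `alphas[k]` or `xhats[k]` by a small amount and re-running `update_message` should reproduce the four entries returned by `message_jacobian`.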
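The resulting gradient with respect to the control parameter $r_{ki}$ (note that edge $(k,i) \notin \mathcal{T}$) can be computed as
$$ \frac{\partial \mathcal{O}}{\partial r_{ki}} = \sum_{m \in \{\alpha, \hat{x}\}} \left( \frac{\partial \mathcal{O}}{\partial m_{ki}} \frac{\partial m_{ki}}{\partial r_{ki}} + \frac{\partial \mathcal{O}}{\partial m_{ik}} \frac{\partial m_{ik}}{\partial r_{ki}} \right). \tag{67} $$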
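In Eqs. (61) and (67), the terms $\frac{\partial m_{ki}}{\partial r_{ki}}$ are computed as
$$ \frac{\partial \alpha_{ki}}{\partial r_{ki}} = 1, \tag{68} $$
$$ \frac{\partial \hat{x}_{ki}}{\partial r_{ki}} = -\hat{x}_{ki}\, \frac{\sum_{n \in \mathcal{N}_k \setminus i} \alpha_{nk}^{-1}}{1 + r_{ki} \sum_{n \in \mathcal{N}_k \setminus i} \alpha_{nk}^{-1}}, \tag{69} $$
which closes the equations for the gradient computations.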
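Equations (68) and (69) can be checked numerically in the same spirit. The sketch below relies on an assumed form of the message from $k$ to $i$, namely $\alpha_{ki} = r_{ki} + \big(\sum_{n} \alpha_{nk}^{-1}\big)^{-1}$ and $\hat{x}_{ki} = \big(\Lambda_k + \sum_{n} \hat{x}_{nk}\big) / \big(1 + r_{ki} \sum_{n} \alpha_{nk}^{-1}\big)$ with $n \in \mathcal{N}_k \setminus i$. This form is consistent with the derivatives in this section but is not quoted from the source, and the function names are hypothetical.

```python
def message(r_ki, lam_k, alphas, xhats):
    """Assumed message k -> i built from the incoming messages n -> k (n != i)."""
    s = sum(1.0 / a for a in alphas)
    return r_ki + 1.0 / s, (lam_k + sum(xhats)) / (1.0 + r_ki * s)


def message_gradient_wrt_r(r_ki, lam_k, alphas, xhats):
    """Derivatives of (alpha_ki, xhat_ki) w.r.t. the edge parameter r_ki."""
    s = sum(1.0 / a for a in alphas)
    _, xhat_ki = message(r_ki, lam_k, alphas, xhats)
    dalpha_dr = 1.0                             # Eq. (68)
    dxhat_dr = -xhat_ki * s / (1.0 + r_ki * s)  # Eq. (69)
    return dalpha_dr, dxhat_dr
```

Note that only the message sent along the edge $(k,i)$ itself depends explicitly on $r_{ki}$; the dependence of all other messages is picked up through the chain rule in Eq. (62).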
C. The Global Optimization Approach
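In this section, we provide details of the global optimization approach on undirected flow networks used in the main
text. As mentioned before, e.g., in Sec. II A, we have $x^*_{ij} = \frac{\mu^*_j - \mu^*_i}{r_{ij}}$, where $\mu^* = L^\dagger(r)\Lambda$ and $L^\dagger(r)$ is the pseudo-inverse
of the Laplacian matrix $L(r)$ defined in Eq. (50). To compute the gradient $\frac{\partial \mathcal{O}}{\partial r_{ij}}$, we need to evaluate the response of
$\mu^*$ to the variation of the control parameters $r$, i.e.,
$$ \frac{\partial \mathcal{O}}{\partial r_{ij}} = \sum_{(p,q) \in \mathcal{T}} \frac{\partial \mathcal{O}}{\partial x^*_{pq}} \left[ \frac{1}{r_{pq}} \left( \frac{\partial \mu^*_q}{\partial r_{ij}} - \frac{\partial \mu^*_p}{\partial r_{ij}} \right) - \frac{1}{r_{pq}^2} \left( \mu^*_q - \mu^*_p \right) \delta_{(i,j),(p,q)} \right]. \tag{70} $$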
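Furthermore, computing $\frac{\partial \mu^*}{\partial r_{ij}}$ requires $\frac{\partial L^\dagger(r)}{\partial r_{ij}}$. Assuming that the underlying graph does not fragment into multiple
components when adapting $r$, so that $L(r)$ has constant rank, we have [8]
$$ \frac{\partial L^\dagger(r)}{\partial r_{ij}} = -L^\dagger \frac{\partial L}{\partial r_{ij}} L^\dagger + L^\dagger (L^\dagger)^\top \left( \frac{\partial L}{\partial r_{ij}} \right)^{\!\top} (I - L L^\dagger) + (I - L^\dagger L) \left( \frac{\partial L}{\partial r_{ij}} \right)^{\!\top} (L^\dagger)^\top L^\dagger. \tag{71} $$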
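Using the properties of the pseudo-inverse of the Laplacian, $L L^\dagger = L^\dagger L = I - \frac{1}{N} \mathbf{1}\mathbf{1}^\top$, and $\frac{\partial L}{\partial r_{ij}} \cdot \mathbf{1} = 0$ (with $\mathbf{1}$ as the
all-one vector), we have
$$ \frac{\partial L^\dagger(r)}{\partial r_{ij}} = -L^\dagger \frac{\partial L}{\partial r_{ij}} L^\dagger, \tag{72} $$
which closes the equations for calculating the gradients.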
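The global approach can be sketched in plain Python as follows. The snippet assumes the standard weighted graph Laplacian with conductances $1/r_{ij}$ (the Laplacian of Eq. (50) is not reproduced in this section), a connected graph, and $\sum_i \Lambda_i = 0$. Under these assumptions, applying $L^\dagger$ to a zero-sum vector amounts to solving the Laplacian system with one node grounded and then removing the mean of the solution, so no explicit pseudo-inverse is formed. All function names are illustrative.

```python
def laplacian(n, r):
    """Weighted Laplacian; r maps an edge (i, j) to r_ij (assumed form of L)."""
    L = [[0.0] * n for _ in range(n)]
    for (i, j), r_ij in r.items():
        g = 1.0 / r_ij
        L[i][i] += g
        L[j][j] += g
        L[i][j] -= g
        L[j][i] -= g
    return L


def apply_pinv(L, b):
    """Return L^+ b for a connected graph and a zero-sum vector b."""
    n = len(L)
    m = n - 1  # ground the last node and solve the reduced system
    A = [L[i][:m] + [b[i]] for i in range(m)]
    for c in range(m):  # Gaussian elimination with partial pivoting
        p = max(range(c, m), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        for k in range(c + 1, m):
            f = A[k][c] / A[c][c]
            for j in range(c, m + 1):
                A[k][j] -= f * A[c][j]
    x = [0.0] * n
    for c in range(m - 1, -1, -1):  # back substitution
        x[c] = (A[c][m] - sum(A[c][j] * x[j] for j in range(c + 1, m))) / A[c][c]
    mean = sum(x) / n
    return [v - mean for v in x]  # the zero-mean solution equals L^+ b


def potentials(n, r, lam):
    """mu* = L^+ Lambda."""
    return apply_pinv(laplacian(n, r), lam)


def flow(mu, r, edge):
    """x*_ij = (mu_j - mu_i) / r_ij."""
    i, j = edge
    return (mu[j] - mu[i]) / r[edge]


def dmu_dr(n, r, lam, edge):
    """d mu* / d r_ij = -L^+ (dL/dr_ij) mu*, following Eq. (72)."""
    i, j = edge
    L = laplacian(n, r)
    mu = apply_pinv(L, lam)
    b = [0.0] * n  # b = -(dL/dr_ij) mu*, nonzero only at nodes i and j
    b[i] = (mu[i] - mu[j]) / r[edge] ** 2
    b[j] = (mu[j] - mu[i]) / r[edge] ** 2
    return apply_pinv(L, b)


def dflow_dr(n, r, lam, target, edge):
    """Bracket of Eq. (70): derivative of x*_pq with respect to r_ij."""
    p, q = target
    mu = potentials(n, r, lam)
    dmu = dmu_dr(n, r, lam, edge)
    value = (dmu[q] - dmu[p]) / r[target]
    if target == edge:  # explicit dependence through 1 / r_pq
        value -= (mu[q] - mu[p]) / r[target] ** 2
    return value
```

Multiplying `dflow_dr` by the derivative in Eq. (58) and summing over the targeted edges yields the gradient of Eq. (70). The dense elimination used here is only meant for small illustrative graphs; a sparse Laplacian solver would be the natural choice at scale.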
[1] K. Y. M. Wong and D. Saad, Inference and optimization of real edges on sparse graphs: A statistical physics perspective,
Phys. Rev. E 76, 011115 (2007).
[2] B. Li, D. Saad, and A. Y. Lokhov, Reducing urban traffic congestion due to localized routing decisions, Phys. Rev. Research
2, 032059 (2020).
[3] B. Colson, P. Marcotte, and G. Savard, An overview of bilevel optimization, Annals of Operations Research 153, 235 (2007).
[4] J. D. Garcia, G. Bodin, davide f, I. Fiske, M. Besançon, and N. Laws, joaquimg/bileveljump.jl: v0.4.1,
https://doi.org/10.5281/zenodo.4556393 (2021).
[5] A. Migdalas, Bilevel programming in traffic planning: Models, methods and challenge, Journal of Global Optimization 7,
381 (1995).
[6] C. H. Yeung and K. Y. M. Wong, Optimal location of sources in transportation networks, Journal of Statistical Mechanics:
Theory and Experiment 2010, P04017 (2010).
[7] P. Rebeschini and S. Tatikonda, A new approach to Laplacian solvers and flow problems, Journal of Machine Learning
Research 20, 1 (2019).
[8] G. H. Golub and V. Pereyra, The differentiation of pseudo-inverses and nonlinear least squares problems whose variables
separate, SIAM Journal on Numerical Analysis 10, 413 (1973).