
City Metro Network Expansion with Reinforcement Learning

Authors:
Yu Wei
School of Electronic and Information
Engineering, Xi’an Jiaotong
University
weiyu123112@163.com
Minjia Mao
School of Mathematics and Statistics,
Xi’an Jiaotong University
maominjia@foxmail.com
Xi Zhao∗†
School of Management, Xi’an
Jiaotong University
Zhaoxi1@mail.xjtu.edu.cn
Jianhua Zou
School of Electronic and Information
Engineering, Xi’an Jiaotong
University
jhzou@sei.xjtu.edu.cn
Ping An
School of Management, Xi’an
Jiaotong University
sdzx119@163.com
ABSTRACT
City metro network expansion, a part of transportation network design, aims to design new lines on the basis of the existing metro network. Existing methods in the field of transportation network design either (i) can hardly formulate this problem efficiently, (ii) depend on expert guidance to produce solutions, or (iii) appeal to problem-specific heuristics which are difficult to design. To address these limitations, we propose a reinforcement learning based method for the city metro network expansion problem. In this method, we formulate the metro line expansion as a Markov decision process (MDP), which characterizes the problem as a process of sequential station selection. Then, we train an actor-critic model to design the next metro line on the basis of the existing metro network. The actor is an encoder-decoder network with an attention mechanism that generates the parameterized policy used to select the stations. The critic estimates the expected cumulative reward to assist the training of the actor by reducing training variance. The proposed method does not require expert guidance during design, since the learning procedure relies only on the reward calculation to tune the policy for better station selection. It also avoids the difficulty of designing heuristics, because the policy itself formalizes the station selection. Considering origin-destination (OD) trips and social equity, we expand the current metro network in Xi’an, China, based on the real mobility information of 24,770,715 mobile phone users in the whole city. The results demonstrate the advantages of our method compared with existing approaches.
Xi Zhao is the first corresponding author of this paper.
Also with the Key Lab of the Ministry of Education for Process Control & Efficiency Engineering, Xi’an, 710049, China.
Jianhua Zou is the second corresponding author of this paper.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD ’20, August 23–27, 2020, Virtual Event, CA, USA
©2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7998-4/20/08. . . $15.00
https://doi.org/10.1145/3394486.3403315
CCS CONCEPTS
• Computing methodologies → Planning and scheduling; Reinforcement learning; • Transportation network design → Metro network expansion.
KEYWORDS
Metro network expansion; Reinforcement learning; Actor-critic
model; Social equity
ACM Reference Format:
Yu Wei, Minjia Mao, Xi Zhao, Jianhua Zou, and Ping An. 2020. City Metro
Network Expansion with Reinforcement Learning. In Proceedings of the 26th
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD
’20), August 23–27, 2020, Virtual Event, CA, USA. ACM, New York, NY, USA,
11 pages. https://doi.org/10.1145/3394486.3403315
1 INTRODUCTION
The city metro network plays an important role in the public transportation system. As cities develop, new transportation demands lead to the expansion of the metro network, and the last few years have witnessed a tremendous expansion of metro networks [27]. Meanwhile, the expansion of the metro network in turn has a profound impact on the city: expanded lines may change the mobility trends of the city population. Because metro network expansion and city dynamics influence each other, it is more reasonable to expand the metro network by gradually designing new lines according to the latest city dynamics. In this study, we design the next metro line, consisting of stations and line routing, to expand the existing metro network. This process can be conducted repeatedly to achieve a multi-line metro network expansion [4, 20].
Metro network expansion is a part of transportation network design. Usually, the objective of transportation network design is mobility-based, such as maximizing satisfied OD trips [21]. As society progresses, sustainability has increasingly become a requirement of city development. The need for sustainability prompts governments to recognize other impacts of the transportation system, and thereby influences their transport policy [24]. Among these impacts, the importance of social equity [2] has been acknowledged, and several real-world transportation plans have taken social equity into account [1]. The metro network, an important city transportation system, has a great influence on social equity.
Applied Data Science Track Paper
KDD '20, August 23–27, 2020, Virtual Event, USA
Therefore, in this work, we consider both OD trips and social equity
to expand the city metro network.
There are various studies dealing with transportation network design problems [8, 15], and they can be mainly divided into two categories. One category is based on mathematical programming. Several studies [11, 12, 22, 30] formulate transportation network design problems as non-linear integer programming models, and adopt solvers to obtain solutions. However, their formulations call for an exponential number of shape constraints to ensure the rationality of the transportation network, which hinders solving the problem efficiently [5]. Besides the formulations, their solution techniques are intractable for a realistically sized problem. To make large-scale problems solvable, they predefine corridors based on expert knowledge to limit the search space, and only consider designing the transportation network within these corridors. Their results therefore depend on expert guidance, and the best solution may be left out. In summary, these studies can hardly formulate the transportation network design problem appropriately [5, 19], and their solutions depend heavily on expert guidance, which reduces their reliability.
The other category, an alternative to mathematical programming, is based on heuristics. Owais et al. [26] utilize a Genetic Algorithm to generate a bus route network from an existing transportation network. Dufourd et al. [6] design a tabu search heuristic for locating a rapid transit line. Yang et al. [32] propose an ant colony algorithm to design a bus network, with the origin and destination specified by experts. For different transportation network design problems, these studies design problem-specific heuristics to obtain the solutions. However, problem-specific heuristics can be difficult to design, especially in cases like a metro network with rigorous shape constraints. The commonly used operators in Genetic Algorithm [26] and Tabu Search [6], which focus on the connectivity of the route, are likely to lead to an infeasible metro line, which makes these methods inefficient for designing a metro network and provides no guarantee as to the quality of solutions. Therefore, methods that do not rely on hand-designed heuristics are needed.
Considering all of the above, existing methods in transportation network design can hardly be applied to the metro network expansion problem. Instead, we need an efficient formulation and a generic method to solve the metro network expansion problem without expert guidance.
In this paper, we propose an RL-based method to solve the city metro network expansion problem. We consider the metro line expansion as a process of sequential station selection, and then naturally formulate this process as a Markov decision process (MDP). To ensure reasonable connection patterns between stations of a metro line, we also design feasibility rules, based on the selected station sequence, for the next station selection. This formulation efficiently characterizes the expansion of the metro line, without the heavy constraints required by existing studies [30].
Following this formulation, we propose an actor-critic model [17] to generate the next metro line. The actor adopts an encoder-decoder network with an attention mechanism to represent the parameterized policy, which maps the current metro network state to a probability distribution for station selection. Specifically, in the actor, an encoder characterizes the up-to-date metro station information during the expansion process, an RNN decoder characterizes the sequence information of the selected stations, and an attention layer [29] integrates these two sets of information to produce a probability distribution over feasible candidate stations. In the critic, a neural network is used to estimate the expected cumulative reward. Requiring only the reward calculation, we employ a policy gradient algorithm [10] to train our network to find a high-priority metro line. Without expert guidance, the learning procedure drives the policy to keep track of superior solutions during the search and to find better ones. Its natural exploration mechanism makes our method suitable for a large-scale solution space. The parameterized policy formalizes the station selection, avoiding the difficulties of heuristics design.
Using the real city-scale human mobility information of 24,770,715 mobile phone users obtained from a citywide 3G cellular network, we expand the existing metro network in Xi’an, China. The results demonstrate the effectiveness of our method.
Our contributions are as follows:
• We incorporate social equity concerns together with mobility demands into metro network expansion. By proposing a weighted-sum reward construction, our RL method can take multiple factors into consideration.
• We formulate the metro line expansion problem as a Markov decision process, and design feasibility rules based on the selected station sequence to ensure reasonable connection patterns of the metro line, which is a more efficient formulation than integer programming models.
• We are the first to propose an RL-based method to solve the city metro network expansion problem. With the exploration mechanism of RL, our method can generate solutions without expert guidance.
• We use real city-scale human mobility information to expand a metro network. The experimental results demonstrate the effectiveness of our method.
2 RELATED WORK
2.1 Transportation Network Design
Several studies [8, 15] have reviewed the transportation network design literature, and the existing methods mainly fall into two categories: mathematical programming methods and heuristic methods. Mathematical programming methods formulate this problem as nonlinear integer programming models, and usually obtain the solutions using a solver. Gutiérrez-Jarpa et al. [11] first select a set of corridors with higher passenger traffic by using greedy generation heuristics, and then consider designing metro lines in these predefined corridors. Wei et al. [30] predefine corridors and introduce a bi-objective model to expand the metro network in Wuxi, China. However, the large number of constraints needed to ensure the rationality of the transportation network makes these solution methods ineffective in a large-scale search space, unless expert guidance is used to predefine the corridors.
As for the second category, search-based heuristic methods, such as Simulated Annealing [7] and Tabu Search [6], first generate initial solutions, and then modify them with the help of heuristics to obtain better solutions. Genetic Algorithm [26] randomly generates initial routes, and then designs operators to evolve the routes toward better solutions. However, the commonly used heuristic operators in transportation networks may lead to infeasible solutions for the metro expansion problem. It is hard to come up with appropriate heuristics for problems with heavy constraints like metro expansion.
2.2 Reinforcement Learning
The strength of RL lies in its powerful decision-making ability. RL has made great progress in complicated tasks such as playing Atari games [25], recommender systems [35], and combinatorial optimization [3]. Existing RL methods can be divided into three categories [10]: actor-only, critic-only, and actor-critic methods. With regard to metro network expansion, we find that it can be formulated as a sequential decision-making process. After reviewing these techniques, we employ an actor-critic method to expand the metro network.
3 PROBLEM DEFINITION
In this paper, we design the next metro line to expand the current metro network in a target city. The metro line is determined by stations and arcs connecting the stations, and is allowed to intersect with existing lines to form transfer stations. We define the metro expansion problem as follows.
For a target city, we divide it into $n \times n$ grids in a two-dimensional space, $\{g_i\}_{i=0}^{n^2-1}$. Each grid $g_i$ is a square with a width of $d_0$, and its center is a candidate station $i$. We define the expansion of the metro network on an undirected graph $G = (N, E)$, where $N = \{0, 1, \ldots, n^2-1\}$ contains all candidate stations and $E = \{(i, j) : i, j \in N\}$ contains all edges which directly connect stations $i$ and $j$. Within $G$, several nodes and the edges connecting these nodes form the existing metro network, and these nodes are the candidate transfer stations connecting existing lines and the newly built line. Each grid $g_i$ is associated with a compound index of development $D_i$, and any two candidate stations $i$ and $j$ are associated with a travel flow capture, which contains the total OD trips starting at one of the two stations and ending at the other. We denote the travel flow capture between stations $i$ and $j$ as $od_{i,j}$ $(= od_{j,i})$.
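The grid definition above can be made concrete with a small indexing sketch. The paper does not state how station indices map to grid cells or which distance is used, so the row-major ordering and the Euclidean distance between grid centers below are assumptions, and the function names are hypothetical.

```python
import math

def station_to_cell(i, n):
    """Map station index i in {0, ..., n*n - 1} to (row, col).

    Assumption: stations are numbered in row-major order.
    """
    if not 0 <= i < n * n:
        raise ValueError("station index out of range")
    return divmod(i, n)

def cell_to_station(row, col, n):
    """Inverse of station_to_cell under the same row-major assumption."""
    return row * n + col

def station_distance(i, j, n, d0):
    """Euclidean distance between the centers of grids g_i and g_j.

    d0 is the grid width; the result is in the same unit as d0.
    """
    ri, ci = station_to_cell(i, n)
    rj, cj = station_to_cell(j, n)
    return d0 * math.hypot(ri - rj, ci - cj)
```

With a 29 × 29 grid of 1000-meter cells, `station_distance(0, 30, 29, 1000.0)` gives the diagonal distance between two neighboring cells.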
We present the expanded metro line as an ordered station sequence $Z = (z_1, z_2, \ldots, z_T)$, $z_i \in N$, where adjacent stations are directly connected. In practice, the expanded line $Z$ should satisfy the following constraints:
• Consecutive stations must follow the minimum-maximum distance rules [22]. That is, the separation $d(z_i, z_{i+1})$ between two consecutive stations must satisfy $d_{min} \le d(z_i, z_{i+1}) \le d_{max}$, $\forall i \in \{1, 2, \ldots, T-1\}$, where $d_{min}$ and $d_{max}$ are the minimum and maximum separation between any two consecutive stations.
• The line shape should ensure reasonable connection patterns between stations, avoiding sub-tours and squiggly lines [30].
• The number of stations $T$ is limited by $N$.
• The budget for construction is limited by $B_0$.
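As a rough illustration of the first and third constraints, the sketch below checks a candidate line given as planar coordinates. It is not the feasibility rule of Appendix A: the shape constraint is reduced here to a crude "no repeated station" test (an assumption), the budget check is left out, and the function name is hypothetical.

```python
import math

def satisfies_basic_constraints(coords, d_min, d_max, max_stations):
    """Check a candidate line against a simplified subset of the constraints.

    coords: list of (x, y) station positions, in the order they are visited.
    """
    # Limit on the number of stations T.
    if len(coords) > max_stations:
        return False
    # Assumption: visiting a station twice is treated as a sub-tour.
    if len(set(coords)) != len(coords):
        return False
    # Minimum-maximum distance rule between consecutive stations.
    for a, b in zip(coords, coords[1:]):
        if not d_min <= math.dist(a, b) <= d_max:
            return False
    return True
```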
Based on the above, the OD trips newly satisfied by the expanded line $Z$ are defined as follows:
$$R_{od}(Z) = \sum_{i} \sum_{j} od_{i,j} + \sum_{i} \sum_{k} x_{i,k} \times od_{i,k}, \tag{1}$$
where $i, j$ $(i < j)$ are stations on the expanded line $Z$, $k$ $(k \neq i)$ is a station on existing lines and not on $Z$, and $x_{i,k}$ is set to 1 if there is a path connecting the two stations $i$ and $k$ along the metro network; otherwise it is set to 0. In Equation (1), the first term represents the direct OD trips achieved by $Z$, and the second term represents the OD trips between $Z$ and existing lines through transfer stations.
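Equation (1) can be evaluated directly once the OD matrix and the connectivity indicator are available. The sketch below is one way to do so; the data layout (a dictionary keyed by unordered station pairs) and the `connected` callable standing in for $x_{i,k}$ are assumptions made for illustration.

```python
from itertools import combinations

def satisfied_od_trips(line, od, existing, connected):
    """Evaluate R_od(Z) from Equation (1).

    line:      list of distinct station ids on the expanded line Z
    od:        dict mapping frozenset({i, j}) to the symmetric flow od_{i,j}
    existing:  set of station ids on the existing lines
    connected: callable (i, k) -> bool, playing the role of x_{i,k}
    """
    # First term: direct trips between every pair of stations on Z (i < j).
    direct = sum(od.get(frozenset(pair), 0) for pair in combinations(line, 2))
    # Second term: trips between Z and existing stations not on Z.
    off_line = existing - set(line)
    transfer = sum(
        od.get(frozenset((i, k)), 0)
        for i in line
        for k in off_line
        if connected(i, k)
    )
    return direct + transfer
```

If the new line shares no transfer station with the existing network, `connected` returns False everywhere and only the direct term remains.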
As to social equity, we consider the distributable benefits for the grids traversed by the expanded line $Z$. According to Appendix C, the social equity indicator of $Z$ is calculated as:
$$R_{ac}(Z) = \sum_{i} Ac_{g_i}, \tag{2}$$
where $i$ ranges over the stations on the expanded line $Z$, and $Ac_{g_i}$, given by Equation (15), is the accessibility index of grid $g_i$.
In our study, the objective is to design the next metro line $Z$ based on the current metro network so as to maximize the satisfied transportation demands, which are defined as the weighted sum of newly satisfied OD trips and social equity:
$$\omega(Z|G) = \alpha_1 \times R_{od}(Z) + \alpha_2 \times R_{ac}(Z), \tag{3}$$
where $G$ is the underlying network that defines the problem, $\alpha_1$ and $\alpha_2$ are the weights of the added OD trips $R_{od}(Z)$ and social equity $R_{ac}(Z)$, and $\alpha_1 + \alpha_2 = 1$.
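For completeness, Equation (3) amounts to a convex combination of the two indicators. The sketch below assumes that both weights are non-negative, which the text does not state explicitly; the function name is hypothetical.

```python
def weighted_objective(r_od, r_ac, alpha1):
    """Equation (3): omega = alpha1 * R_od + alpha2 * R_ac with alpha1 + alpha2 = 1.

    Assumption: 0 <= alpha1 <= 1, so that alpha2 = 1 - alpha1 is also non-negative.
    """
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return alpha1 * r_od + alpha2 * r_ac
```

Setting `alpha1` to 1.0 or 0.0 recovers a purely OD-based or a purely equity-based objective.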
4 METHOD
According to the definition in Section 3, along a metro line, a preceding station determines the area in which to locate its succeeding station, and the preceding section determines the layout of the succeeding section. The location of each subsequent station is influenced by the previous stations, so the generation of a metro line can be viewed as a process of sequential decisions about where to locate the stations. In our study, we consider metro line expansion as a sequential station selection process, and leverage RL to optimize the sequential decisions to obtain the next metro line.
4.1 RL Formulation
In this section, we formulate the metro line expansion as an MDP. Taking an ordered station sequence $Z = (z_1, z_2, \ldots, z_T)$, $z_i \in N$, generated during an episode as an example, the elements of this MDP are as follows:
• State space $S$. A state $s_t \in S$ is defined to characterize the selected station sequence $Z_{t-1} = (z_1, z_2, \ldots, z_{t-1})$ before step $t$, where $t \in \{1, \ldots, T, T+1\}$ and $Z_0$ is an empty set.
• Action space $A$. The action $a_t \in A$ is defined as the station $z_t \in N$ selected at step $t$.
• A deterministic state transition function $p(s_{t+1}|s_t, a_t)$. When the agent selects an action $a_t$ at state $s_t$, the transition function determines the next state $s_{t+1}$: $Z_t = (z_1, z_2, \ldots, z_t)$.
• Reward function $r(s_t, a_t)$. We expect the expanded metro line $Z$ to achieve more transportation demands, thus we restrict our attention to the objective value $\omega(Z|G)$ of a complete sequence (refer to Equation (3) for more details). In our study, the reward function is set to 0 if $s_t$ $(t < T+1)$ is not a terminal state; otherwise it is set to $\omega(Z|G)$ at the terminal step $T+1$.
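The MDP elements listed above can be sketched as a tiny environment. This is a simplified illustration rather than the authors' implementation: the episode here ends only when a fixed number of stations has been chosen (the paper also stops on infeasibility or budget exhaustion), and the class name and `objective` callable are hypothetical.

```python
class LineExpansionMDP:
    """Minimal sketch of the station-selection MDP with a terminal-only reward."""

    def __init__(self, max_stations, objective):
        self.max_stations = max_stations
        # objective: callable mapping a tuple of stations Z to omega(Z|G).
        self.objective = objective
        self.selected = ()  # Z_0 is empty

    def reset(self):
        self.selected = ()
        return self.selected

    def step(self, station):
        # Deterministic transition: append the chosen station to the sequence.
        self.selected = self.selected + (station,)
        done = len(self.selected) >= self.max_stations
        # Reward is 0 at non-terminal steps and omega(Z|G) at the terminal step.
        reward = self.objective(self.selected) if done else 0.0
        return self.selected, reward, done
```

Because every intermediate reward is zero, the cumulative return of an episode equals the objective value of the completed line.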
Within the MDP, a parameterized policy $\pi_\theta : S \to P(A)$, which maps states to a probability distribution over the actions, is used to select actions. Here $P(A)$ is the set of probability distributions over the actions and $\theta$ denotes the adjustable parameters.
Figure 1: Framework. The blue (orange) line in the input is the existing (expanded) metro line.
In addition to the policy $\pi_\theta$, to ensure that the metro line satisfies the constraints in Section 3, we also design feasibility rules $F(s_t, G)$, which are based on the station sequence already selected by the agent and on the existing metro network, to determine the admissible actions at step $t$ (see Appendix A for more details about the agent's interaction with the environment). Thus, based on the policy $\pi_\theta(a_t|s_t)$ and the feasibility rules $F(s_t, G)$, the agent interacts with the environment to select the action as follows:
$$p(a_t|s_t, G) = \pi_\theta(a_t|s_t) \otimes F(s_t, G), \tag{4}$$
where $\otimes$ represents the generalized operator of two functions (see Section 4.3 for more details).
Now, for a metro line $Z$, its generating probability, according to the probability chain rule, is
$$p(Z|G) = \prod_{t=1}^{T} p(z_t|Z_{t-1}, G), \tag{5}$$
where $p(z_t|Z_{t-1}, G)$ (calculated as in Equation (4)) represents the probability of selecting the next station $z_t$ based on the selected station sequence $Z_{t-1}$ at step $t$. This metro line $Z$ is associated with a cumulative return $\omega(Z|G)$, based on the definition of our reward function $r(s_t, a_t)$.
Our goal is to find a policy $\pi_\theta$ that maximizes the expected cumulative reward which, given a network $G$, is defined as
$$J(\theta|G) = \mathbb{E}_{Z \sim \pi_\theta}\, \omega(Z|G). \tag{6}$$
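The combination of policy and feasibility rules in Equation (4) and the chain rule in Equation (5) can be sketched as follows. The masking shown here (zeroing infeasible stations and renormalizing) is one plausible reading of the generalized operator, not a statement of the authors' exact definition; the function names are hypothetical.

```python
import math

def masked_distribution(probs, feasible):
    """One reading of Equation (4): drop infeasible stations and renormalize.

    probs:    policy probabilities over all stations
    feasible: booleans, True where the feasibility rules allow the station
    """
    masked = [p if ok else 0.0 for p, ok in zip(probs, feasible)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no feasible station has positive probability")
    return [m / total for m in masked]

def sequence_log_prob(step_probs, line):
    """Equation (5) in log space: log p(Z|G) = sum_t log p(z_t | Z_{t-1}, G).

    step_probs[t] is the (masked) distribution used at step t + 1,
    and line[t] is the station chosen from it.
    """
    return sum(math.log(dist[z]) for dist, z in zip(step_probs, line))
```

Working in log space avoids the numerical underflow that the raw product in Equation (5) would cause for long lines.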
4.2 Framework
According to the RL formulation in Section 4.1, the dimension of the action space in our metro network expansion problem is related to the number of stations. For a realistically sized problem, the high-dimensional action space makes value-based methods, such as DQN, less suitable [23]. Thus, we employ a policy-based technique, which directly parameterizes the policy, to solve the problem. Figure 1 depicts our actor-critic framework. The actor takes the metro network as input, and outputs the expanded metro line. Its core component is a policy network, which maps metro network states to the action space for the station selection, one station at each step. By concatenating these selected stations in order, the expanded metro line is generated. The reward is calculated from the generated metro line. The critic estimates the expected cumulative reward to assist the training of the actor by reducing training variance. Next, we elaborate on the policy network in the actor.
4.3 Policy Network Architecture
During the generation of a metro line, the selected stations affect the subsequent admissible stations, which is guaranteed by the feasibility rules in Appendix A. We expect the policy to capture the dependencies between stations that can coexist to generate a feasible metro line, and to give high probability to station sequences that achieve more transportation demands.
Figure 2: Policy network. It takes the state $s_t$ as input and generates the probability distribution used to select a station.
To achieve the above requirements, we employ an encoder-decoder neural network coupled with an attention layer to parameterize our station selection policy, as shown in Figure 2. The policy network takes the metro network state as input, and outputs the selected station, one station at a step. Specifically, the encoder creates the representations for the metro network, one representation at a step. The decoder employs an RNN to characterize the sequence information of the selected stations during the generation of a metro line. Taking these two sets of information, an attention layer [29], which is able to flexibly model the dependencies between stations without regard to their distance in a station sequence [28], generates a probability distribution over all stations to guide the station selection, one station at a step. Next, we elaborate on how our policy network works.
At each step $t$, the state $s_t$ characterizes the current metro network information, aiming to distinguish the state changes caused by agent actions. To achieve this, we represent each station $i$ by a tuple $X_i^t = (l_i, v_i^t)$, where $l_i$ is the two-dimensional coordinates of station $i$ in the grid space, and the natural number $v_i^t \in [0, t-1]$ indicates that station $i$ was selected at step $v_i^t$ ($v_i^t = 0$ means that this station has not been selected before step $t$). Then, the state $s_t$ is the sequential concatenation of the tuples of all stations, $s_t = \{X_0^t, X_1^t, \ldots, X_{n^2-1}^t\}$, which consists of the location feature and the station selection feature. Here the station number starts at 0 and ends at $n^2-1$.
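The state representation described above can be built in a few lines. The sketch assumes row-major station numbering for the coordinates $l_i$ (not specified in the text), and the function name is hypothetical.

```python
def build_state(n, selected):
    """Return s_t as a list of tuples X_i^t = (l_i, v_i^t) for all n*n stations.

    selected: the sequence Z_{t-1} of stations chosen so far, in order.
    v_i^t is the step at which station i was chosen, or 0 if not chosen yet.
    """
    # Steps are numbered from 1 so that 0 can mean "not selected".
    step_of = {station: step for step, station in enumerate(selected, start=1)}
    # Assumption: l_i = (row, col) under row-major numbering of the grid.
    return [(divmod(i, n), step_of.get(i, 0)) for i in range(n * n)]
```

For a 3 × 3 grid with stations 4 and 5 already chosen, station 4 carries the selection feature 1, station 5 carries 2, and all others carry 0.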
With this input state $s_t$, the encoder computes the embeddings for the stations through two 1-dimensional convolutional neural networks, each with $d$ filters, one for the location feature and the other for the station selection feature. We denote these two embedded features as $L^t$ and $V^t$; both have a common dimension $d$ for each station.
Next comes the decoding. The decoder consists of a Long Short-Term Memory (LSTM) network [14]. Again, we use the ordered station sequence $Z = (z_1, z_2, \ldots, z_T)$, $z_i \in N$, from Section 4.1 as an example. At step $t$, the decoder takes its own hidden state $h_{t-1}$ from step $t-1$ and the embedded location feature $L^t_{z_{t-1},:}$ of the last selected station $z_{t-1}$ as inputs to generate the current hidden state $h_t$ as follows:
$$h_t, c_t = \mathrm{LSTM}\left(L^t_{z_{t-1},:},\, h_{t-1},\, c_{t-1}\right), \tag{7}$$
where $c_t$ represents the cell state of the LSTM at step $t$. When $t = 1$, $h_0$ and $c_0$ are initialized to the zero tensor with dimension $d$.
After decoding, the attention layer takes the embedded features of the stations as the reference and the hidden state $h_t$ as the query to generate the probability distribution over all stations. Specifically, at step $t$, the attention mechanism is modeled as follows:
$$q_i^t = v_a^{\mathsf{T}} \tanh\left(W_a\left(L^t_{i,:} + V^t_{i,:} + h_t\right)\right), \quad i \in \{0, 1, \cdots, n^2-1\}, \tag{8}$$
$$u^t = q^t \oplus \left(H \cdot F(s_t, G)\right), \tag{9}$$
$$p(z_t|Z_{t-1}, G) = \mathrm{softmax}\left(u^t\right), \tag{10}$$
where softmax normalizes the vector $u^t$ to generate a probability distribution over all stations, $L^t_{i,:}$ and $V^t_{i,:}$ are the embedded features of station $i$, $h_t$ is the hidden state of the decoder, the vector $q^t$ is the sequential concatenation of the attention scores $q_i^t$, $H$ is a huge constant, the binary vector $F(s_t, G)$ reflects the feasibility rules, $\oplus$ represents the element-wise sum operator, and the matrix $W_a$ and the vector $v_a$ are trainable parameters.
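A plain-Python sketch of Equations (8) to (10) may help to see why adding a large constant to the feasible entries works as a mask. The grouping of terms inside the tanh follows one reading of Equation (8) and is an assumption, as are the function names and the value of the constant `big`; a real implementation would use a tensor library.

```python
import math

def attention_scores(L, V, h, W, v):
    """Equation (8) under the assumed grouping q_i = v^T tanh(W (L_i + V_i + h)).

    L, V: per-station embeddings (lists of length-d lists); h: length-d list;
    W: d x d matrix as a list of rows; v: length-d list.
    """
    scores = []
    for l_i, v_i in zip(L, V):
        x = [a + b + c for a, b, c in zip(l_i, v_i, h)]
        wx = [sum(w * xj for w, xj in zip(row, x)) for row in W]
        scores.append(sum(vk * math.tanh(z) for vk, z in zip(v, wx)))
    return scores

def masked_softmax(scores, feasible, big=1e9):
    """Equations (9) and (10): add a huge constant to feasible entries, then softmax.

    feasible is the binary vector F(s_t, G), with 1 for admissible stations.
    """
    u = [q + big * f for q, f in zip(scores, feasible)]
    m = max(u)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in u]
    total = sum(exps)
    return [e / total for e in exps]
```

Because softmax is invariant to a common shift, the feasible stations keep their relative scores while the infeasible ones receive a probability that underflows to zero. If no station is feasible, the mask has no effect, which is why the generation procedure terminates before sampling in that case.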
At step $t$, the agent selects the station $z_t$ according to the probability distribution in Equation (10). Then, the metro network state and the feasibility rules change accordingly. The policy network takes these as inputs for the next station selection. The process of choosing stations is repeated until a termination condition is reached (see Appendix A for more details about the agent's interaction with the environment). The generation process of a metro line is presented in Algorithm 1.
4.4 Actor-Critic Training
In this section, we aim to train the policy network, parameterized by $\theta$, to maximize the expected cumulative reward in Equation (6). We use the policy gradient based actor-critic algorithm [10], in which the actor is our policy network, which generates a probability distribution over actions, and the critic estimates the expected cumulative reward of the next metro line in order to reduce the training variance. Specifically, the gradient of Equation (6), according to the study [31], is
$$\nabla_\theta J(\theta|G) = \mathbb{E}_{Z \sim \pi_\theta}\left[\left(\omega(Z|G) - b(G)\right) \nabla_\theta \log p_\theta(Z|G)\right], \tag{11}$$
where $\omega(Z|G)$ represents the transportation demands achieved by the metro line $Z$, $b(G)$ is the estimated expected cumulative reward of the next metro line, and $p_\theta(Z|G)$ is the generating probability of $Z$. According to Equation (11), during the interaction between the agent and the environment, if a generated metro line $Z$ achieves more transportation demands than the current estimated expected cumulative reward $b(G)$, the policy network is trained to increase the probability of this line $Z$, and more transportation demands lead to larger increases. In our study, the critic is a neural network which consists of three convolutional layers and two fully-connected layers. It takes the initial state of the metro network as input, and outputs a scalar that estimates the expected cumulative reward of the next metro line. The training details are shown in Algorithm 2.
Algorithm 1 Generation($G$, $N$, $B_0$)
Input: Metro network graph $G$, the maximum number of stations $N$, budget $B_0$
Output: A metro line $Z_T = (z_1, z_2, \cdots, z_T)$
1: Initialize the metro network state $s_1$
2: for $t = 1, \ldots, N$ do
3:     Update the feasibility rules vector $F(s_t, G)$
4:     Update the cost $b_0$
5:     if $F(s_t, G)$.any() and $b_0 < B_0$ then
6:         Compute embedded features $L^t$ and $V^t$ of the current state by the encoder
7:         Update hidden state $h_t$ based on the decoder according to Equation (7)
8:         Generate a probability distribution over all stations by the attention mechanism according to Equations (8), (9), and (10)
9:         Select the station $z_t$ with the probability $p(z_t|s_t, G)$ in Equation (10)
10:        Update the metro network state $s_{t+1}$
11:    else
12:        Terminate the metro line expansion
13:    end if
14: end for
15: return Solution $= Z_T$
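To illustrate how the baseline in Equation (11) enters the update, the toy sketch below trains a softmax policy over a handful of candidate "lines" whose objective values are fixed. It deliberately departs from the paper in several labeled ways: the actor is a plain vector of logits rather than the encoder-decoder network, the critic is a single learnable scalar $b$ (reasonable only because the input network $G$ is fixed), and plain gradient steps replace Adam. All names and hyperparameters are hypothetical.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(rewards, epochs=500, batch=64, lr_actor=0.1, lr_critic=0.05, seed=0):
    """Policy gradient with a learned scalar baseline on a toy problem.

    rewards[k] plays the role of omega(Z|G) for candidate line k.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # actor parameters: one logit per candidate
    b = 0.0                       # scalar critic estimate of the expected reward
    for _ in range(epochs):
        probs = softmax(theta)
        actor_grad = [0.0] * len(theta)
        critic_grad = 0.0
        for _ in range(batch):
            a = rng.choices(range(len(theta)), weights=probs)[0]
            advantage = rewards[a] - b
            # For a softmax policy, d log p(a) / d theta_k = 1[k == a] - p_k.
            for k in range(len(theta)):
                indicator = 1.0 if k == a else 0.0
                actor_grad[k] += advantage * (indicator - probs[k]) / batch
            # Gradient of the squared error (reward - b)^2 with respect to b.
            critic_grad += -2.0 * advantage / batch
        theta = [t + lr_actor * g for t, g in zip(theta, actor_grad)]  # ascent
        b -= lr_critic * critic_grad                                   # descent
    return softmax(theta), b
```

Calling `train([1.0, 2.0, 5.0])` should shift most of the probability mass to the third candidate, while `b` tracks the average reward of the sampled candidates.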
5 EXPERIMENTS
In this section, we conduct a case study to demonstrate the effectiveness of our RL method. Our code is available online¹.
5.1 Data and Pre-processing
The case study is conducted based on the metro network in Xi’an, Shaanxi Province, China. Its first line started operation on September 16, 2011, and four lines were in operation by January 2020. Our experimental data, which come from a citywide 3G cellular data network, record the location information of 24,770,715 mobile phone users in Xi’an from October 1, 2015, to October 31, 2015. With the help of a Hadoop cluster with 3 master nodes and 10 slave nodes², we process each user’s original location records into a sequence of stay points that represents this user’s spatial activity trajectory, following the study [33]. Each stay point corresponds to a semantic location with exact latitude and longitude.
¹ https://github.com/weiyu123112/City-Metro-Network-Expansion-with-RL
(a) The CDF of the radius of gyration (b) Metro network (c) House price (RMB)
Figure 3: The current city operational status. (a) presents the cumulative distribution function of the radius of gyration. (b) shows the four metro lines currently operating in Xi’an, of which the two red lines were opened before our experimental data were collected, and the two green lines were opened later. (c) presents the distribution of grid average house price in 2015, which is used to characterize the index of development of each grid.
Algorithm 2 Actor-Critic Training
Require: Metro network graph $G$, batch size $B$, training epoch $E$, buffer $R$, the maximum number of stations $N$, budget $B_0$
1: Initialize actor parameters $\theta$
2: Initialize critic parameters $\theta_c$
3: Initialize buffer $R$
4: for epoch $= 1, \ldots, E$ do
5:     for instance $= 1, \ldots, B$ do
6:         $Z_i$ = Generation($G$, $N$, $B_0$)
7:         Calculate the satisfied transportation demands $\omega(Z_i|G)$
8:         Store the generated metro line $Z_i$ and $\omega(Z_i|G)$ in $R$
9:     end for
10:    Calculate $b(G)$ by the critic
11:    Update the actor parameters using the sampled gradient: $\nabla_\theta J(\theta|G) \approx \frac{1}{B} \sum_{i=1}^{B} \left(\omega(Z_i|G) - b(G)\right) \nabla_\theta \log p_\theta(Z_i|G)$
12:    Update the critic by minimizing the loss:
13:    $L_c \leftarrow \frac{1}{B} \sum_{i=1}^{B} \left(\omega(Z_i|G) - b(G)\right)^2$
14:    Update the parameters of the actor: $\theta \leftarrow \mathrm{Adam}(\theta, \nabla_\theta J(\theta|G))$
15:    Update the parameters of the critic: $\theta_c \leftarrow \mathrm{Adam}(\theta_c, \nabla_{\theta_c} L_c)$
16: end for
² Each master node uses a dual Intel E5-2680 v4 CPU @ 2.4GHz with 14 cores. Each slave node has a dual Intel E5-2650 v4 CPU @ 2.4GHz with 12 cores. The total RAM is 1.5 TB and the total storage is 260 TB. The nodes run on CentOS release 6.8 with Hadoop 2.6.0-cdh5.5.0.
Our metro network expansion is conducted on a city represented by grids. To determine the grid size, using the above user trajectory data, we calculate each user’s radius of gyration, a metric that distinguishes users’ mobility patterns, and choose as the grid size the characteristic geographic distance that separates all users roughly equally into two main groups [34]. Figure 3(a) depicts the cumulative distribution function of the radius of gyration, in which users with a radius of gyration of less than 1094 meters account for 50.2% of the total. Finally, we set each grid as a square with a width of 1000 meters and divide the study area into 29 × 29 grids. Correspondingly, we set the size of the filter in Appendix A to 5 × 5 according to study [22], which means that the distance between adjacent stations is between 1000 meters and 2000 meters.
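The paper does not spell out how the radius of gyration is computed. The sketch below uses the definition that is common in the human mobility literature (root mean square distance of a user's stay points from their centroid), assumes the points have already been projected to planar coordinates in meters, and reads the "equal split" criterion as the median over users. These are all assumptions, and the function names are hypothetical.

```python
import math
import statistics

def radius_of_gyration(points):
    """Root mean square distance of stay points from their centroid.

    points: list of (x, y) planar coordinates in meters for one user.
    """
    if not points:
        raise ValueError("at least one stay point is required")
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def median_split_distance(radii):
    """Distance that splits users into two equally sized groups (the median)."""
    return statistics.median(radii)
```

Under this reading, the reported value of 1094 meters at the 50.2% mark would sit close to the median radius, which is then rounded to a 1000-meter grid width.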
With the above in mind, the real metro network with 4 lines in Xi’an is presented in Figure 3(b), where the red lines represent the 2 lines that existed before October 2015, the green lines represent the 2 lines opened subsequently, and the dots represent stations. Mapping the above user stay points into grids, we calculate the OD trips between any two grids. Figure 3(c) presents the distribution of the average house price in 2015. The average house price of grid $g_i$ is used to characterize its index of development $D_i$, which is applied to the calculation of social equity in Appendix C.
In this study, we use only the two red lines, which opened before the experimental data were collected, as the existing metro network, and then design the next metro line. As for the cost of construction, we set 5 billion RMB for each station and 1 billion RMB per kilometer of line, following the study [30].
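The cost setting above translates into a simple budget check. The sketch below uses the per-station and per-kilometer figures quoted in the text and assumes that the line length is the sum of straight-line distances between consecutive stations, which the paper does not state; the function name is hypothetical.

```python
import math

def construction_cost(coords_m, per_station=5.0, per_km=1.0):
    """Construction cost of a line in billions of RMB.

    coords_m: station positions as (x, y) in meters, in visiting order.
    Assumption: track length is the straight-line distance between stations.
    """
    length_km = sum(math.dist(a, b) for a, b in zip(coords_m, coords_m[1:])) / 1000.0
    return per_station * len(coords_m) + per_km * length_km
```

A three-station line with segments of 1 km and 2 km would cost 18 billion RMB under these figures, which can then be compared with the budget $B_0$.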
5.2 Baselines and Performance Evaluation
We compare our RL method with the following baselines. The
implementation details of each method are in Appendix B.
Mathematical Programming Method (MP) [30]. This
method formulates the metro network expansion problem
as a mathematical integer programming model. With predefined
corridors and specified end stations, this method
adopts a solver to obtain solutions.
Greedy Strategy Method (GS) [21]. This method first selects
the edge that satisfies the maximum objective, such as
OD trips, and then gradually extends the current metro line
by adding the surrounding stations, yielding the maximum
objective.
Genetic Algorithm Method (GA) [26]. This method first
generates the initial population (the set of feasible metro
lines), and then selects individuals (metro lines) to conduct
crossover and mutation for better generations.
Ant Colony Algorithm (ACA) [32]. This method first introduces
a pheromone-related probabilistic rule to guide the
station selection. Then, according to the objective satisfied
by the newly generated metro lines, this method updates the
pheromone of each edge for better station selection.
(a) The first predefined corridor. (b) The second predefined corridor. (c) The third predefined corridor.
Figure 4: The predefined corridors and the specified end stations. The areas between two yellow lines are the predefined
corridors, and the green triangles are the specified end stations. In this experiment, the metro lines are only allowed to be
built within these corridors, and the metro lines of MP need to end at the specified end stations.
An existing study [12] employs the satisfied OD trips to evaluate the
solution performance. Our study also uses satisfied OD trips, but
additionally considers the social equity factor when expanding the
metro network, as shown in Equation (3). Therefore, setting the weights
$\alpha_1$ and $\alpha_2$ to different values, we use the following
three objective functions to evaluate the performance of different methods:
$$\omega_1 = R_{od}(Z), \tag{12}$$
$$\omega_2 = 0.5\,R_{od}(Z) + 0.5\,R_{ac}(Z), \tag{13}$$
$$\omega_3 = R_{ac}(Z), \tag{14}$$
where $R_{od}(Z)$ is the added OD trips by metro line $Z$ (refer to
Equation (1)) and $R_{ac}(Z)$ is the social equity indicator of $Z$
(refer to Equation (2)). Before metric calculation, we rescale the OD
trips between grids and the house price of each grid into $[0, 1]$ by
dividing them by their respective maximums.
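The three objectives differ only in the pair of weights applied to the two reward terms, which the following minimal sketch makes explicit. It assumes the weighted form implied by the text (a weight on the OD term plus a weight on the equity term) and takes the two reward values as given inputs, since Equations (1) and (2) are defined elsewhere; the helper names and the sample values are hypothetical.

```python
def rescale_by_max(values):
    """Divide every value by the maximum so results lie in [0, 1].

    Assumes non-negative inputs with at least one positive value.
    """
    peak = max(values)
    return [v / peak for v in values]


def weighted_objective(r_od, r_ac, alpha1, alpha2):
    """alpha1 * R_od(Z) + alpha2 * R_ac(Z) for one candidate line Z."""
    return alpha1 * r_od + alpha2 * r_ac


# The three evaluation settings: (alpha1, alpha2).
OBJECTIVES = {"w1": (1.0, 0.0), "w2": (0.5, 0.5), "w3": (0.0, 1.0)}

# Hypothetical reward values for a single line Z, for illustration only.
r_od, r_ac = 40.0, 50.0
scores = {name: weighted_objective(r_od, r_ac, a1, a2)
          for name, (a1, a2) in OBJECTIVES.items()}
print(scores)  # {'w1': 40.0, 'w2': 45.0, 'w3': 50.0}
```

Keeping the weights in one table means a new trade-off between OD trips and social equity can be evaluated by adding a single entry.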
5.3 Comparison with Baselines
5.3.1 Comparison with MP in predefined corridors. To compare
the MP with our RL method, we predefine 3 corridors
and use both methods to build a new metro line in each
corridor. In addition to the corridors, we specify the end stations and
enumerate all the edges that satisfy our constraints in Section 3
for MP, as these are essential inputs to MP. Figure 4 presents the
3 corridors and the corresponding end stations, and Table 1 reports
the performance. The performance of the expanded metro
line varies greatly from corridor to corridor, which indicates that
expert guidance seriously affects the performance of metro lines.
Even within a single corridor, the poor performance suggests that
specifying end stations may harm the performance
of the metro line. The results of the MP must pass through the
specified end stations. Under a limited budget, a line forced to reach
these end stations may be unable to serve the areas with more
transportation demands, which can explain this poor performance. When
the budget is small, this situation is more obvious. Therefore, while
the MP can get an optimal solution on a small scale, its results
rely heavily on expert guidance, which lacks reliability. In
contrast, our RL method requires no expert guidance and obtains
near-optimal solutions. In the next section, we will demonstrate
the advantages of our RL method over the whole city space.
Table 1: Performance of MP and our RL method in predefined corridors.
                         Budget = 210         Budget = 270
Corridor    Obj.         MP       RL          MP       RL
Corridor 1  $\omega_1$   41.043   44.161      45.899   44.888
            $\omega_2$   42.53    46.820      50.053   50.393
            $\omega_3$   44.162   49.430      54.842   54.772
Corridor 2  $\omega_1$   51.404   54.376      59.160   58.694
            $\omega_2$   48.217   54.406      60.014   58.575
            $\omega_3$   45.501   55.495      65.189   64.541
Corridor 3  $\omega_1$   39.858   39.222      42.186   39.623
            $\omega_2$   38.826   40.574      42.240   42.259
            $\omega_3$   39.724   42.123      43.724   44.431
5.3.2 Comparison with heuristic methods in the whole city space.
In this section, we apply both the heuristic baselines and our RL
method to design the next metro line, using the whole city rather
than specified corridors. Each method is executed five times, and
the average performances are shown in Table 2. Our RL method
achieves significant performance improvements in all cases. The GS
only focuses on local information. It cannot consider remote
transportation demands to make a better decision, and naturally
achieves a poor performance. For the GA, its heuristic operators are
likely to produce invalid solutions during crossover and mutation,
due to the shape constraints of the metro line. The low efficiency
of the heuristic operators hinders the evolution of better solutions.
For the ACA, without a specified initial station, its strategy of
randomly selecting the initial station offers no guarantee of
good performance. The statistical comparison results are shown in
Appendix D.
In addition to satisfying transportation demands, the stability of
the solution is also important, since a method whose solutions differ
widely cannot provide convincing results for city decision makers.
According to Table 2, our RL method performs relatively steadily
compared with the GA and ACA (small standard deviation). To
intuitively perceive the differences between the results, we further
map the solutions of each method on the city grids. As shown
in Figure 5, the solutions of our RL method differ relatively little
in shape, while the solutions of GA and ACA are quite different.
Therefore, considering the satisfied transportation demands and
the stability of the solution, our method has great advantages.
Table 2: Performance of heuristic methods and our RL method in the whole city space. Each entry is the final average objective
± standard deviation across 5 trials.
         Budget = 210                                  Budget = 270
Method   $\omega_1$     $\omega_2$     $\omega_3$      $\omega_1$     $\omega_2$     $\omega_3$
GS       19.082±0.000   26.718±0.000   24.333±0.000    19.082±0.000   28.614±0.000   35.319±0.000
GA       25.672±2.013   20.969±3.703   20.793±2.529    28.904±0.547   25.160±1.476   27.666±1.318
ACA      29.957±0.949   39.611±0.700   48.836±2.236    30.264±1.129   40.476±1.332   50.830±1.761
RL       59.268±0.161   56.177±0.320   57.765±0.062    64.306±0.582   62.106±0.088   65.775±0.166
(a) GA (b) ACA (c) Our RL method
Figure 5: The expanded metro lines achieving the maximum and minimum OD trips of different methods with a budget of 270
billion RMB across 5 trials. The blue (violet) line achieves the maximum (minimum) OD trips. Our RL method has a relatively
steady performance over GA and ACA.
5.4 Comparison with Realistically Planned Metro Line
In this section, we explain our solutions from a realistic perspective.
Having drawn up multiple plans within a reasonable estimated
cost range, we take the most practical one, which uses a budget
of 270 billion RMB, as an example. The results are shown in Figure 6.
When considering only the OD trips, the expanded metro line
starts from the Xi’an Nan Railway Station at the lower left corner,
passes through the city center, and then goes in the direction of the
Terracotta Army at the upper right corner, as shown in Figure 6(a).
When considering OD trips and social equity as equally important,
the expanded metro line is more like a partial combination of two
lines planned in reality: its lower left part is similar to the real-
istically planned line 6, and its upper right part is similar to the
line 3 currently in operation, as shown in Figure 6(b). This case
shows the rationality of considering social equity in transportation
planning. Figure 6(c) is the result of considering only social equity.
By comparing with Figure 3(c), we find that this metro line
passes through the grids with a high development level, which
intuitively demonstrates the effectiveness of our method.
Dierent objectives lead to dierent metro networks. The ob-
jectives of metro expansion vary with dierent cities and stages,
which may require existing methods, whether heuristics or pre-
dened corridors, to be revised. By changing the reward function,
our method can be easily extended to dierent objectives without
problem-specic knowledge. Therefore, our method is generic, and
suitable for metro expansion.
5.5 Multiple Lines Expansion
Considering OD trips and social equity as equally important, we
design a second metro line, treating our expanded line
in Figure 6(b) as part of the existing network. The violet line in
Figure 7 represents this second expanded line. Its shape resembles a
partial combination of the 2 lines that opened after October 2015.
In this gradual way, we expand the metro network with multiple lines.
6 CONCLUSION
This paper presents an RL-based method to solve the city metro
network expansion problem. By formulating metro line expansion
as a process of sequential station selection, we train an actor-critic
model to design the next metro line. Through a case study, our
method shows great advantages over baselines, satisfying more
transportation demands and showing better stability, even without
expert guidance. In addition, a comparison with the realistically
planned metro lines further confirms the effectiveness of our method.
(a) $\omega_1$ (b) $\omega_2$ (c) $\omega_3$
Figure 6: The expanded metro lines under different objectives. The blue lines are our expanded results.
Figure 7: Multiple lines expansion. Taking into consideration OD trips and social equity, the blue and violet lines are
the first and second expanded lines, respectively.
REFERENCES
[1] Elisabete Arsenio, Karel Martens, and Floridea Di Ciommo. 2016. Sustainable
urban mobility plans: Bridging climate change and equity targets? Research in
Transportation Economics 55 (2016), 30–39.
[2] Hamid Behbahani, Sobhan Nazari, Masood Jafari Kang, and Todd Litman. 2019.
A conceptual framework to formulate transportation network design problem
considering social equity criteria. Transportation Research Part A: Policy and
Practice 125 (2019), 171–183.
[3] Irwan Bello, Hieu Pham, Quoc V Le, Mohammad Norouzi, and Samy Bengio.
2016. Neural combinatorial optimization with reinforcement learning. arXiv
preprint arXiv:1611.09940 (2016).
[4] Giuseppe Bruno, Michel Gendreau, and Gilbert Laporte. 2002. A heuristic for
the location of a rapid transit line. Computers & Operations Research 29, 1 (2002),
1–12.
[5] Partha Chakroborty. 2003. Genetic algorithms for optimal urban transit network
design. Computer-Aided Civil and Infrastructure Engineering 18, 3 (2003), 184–200.
[6] Hélène Dufourd, Michel Gendreau, and Gilbert Laporte. 1996. Locating a transit
line using tabu search. Location Science 4, 1-2 (1996), 1–19.
[7] Wei Fan and Randy B Machemehl. 2006. Using a simulated annealing algorithm
to solve the transit route network design problem. Journal of Transportation
Engineering 132, 2 (2006), 122–132.
[8] Reza Zanjirani Farahani, Elnaz Miandoabchi, Wai Yuen Szeto, and Hannaneh
Rashidi. 2013. A review of urban transportation network design problems.
European Journal of Operational Research 229, 2 (2013), 281–302.
[9] Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training
deep feedforward neural networks. In Proceedings of the Thirteenth International
Conference on Artificial Intelligence and Statistics. 249–256.
[10] Ivo Grondman, Lucian Busoniu, Gabriel AD Lopes, and Robert Babuska. 2012.
A survey of actor-critic reinforcement learning: Standard and natural policy
gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications
and Reviews) 42, 6 (2012), 1291–1307.
[11] Gabriel Gutiérrez-Jarpa, Gilbert Laporte, and Vladimir Marianov. 2018.
Corridor-based metro network design with travel flow capture. Computers & Operations
Research 89 (2018), 58–67.
[12] Gabriel Gutiérrez-Jarpa, Carlos Obreque, Gilbert Laporte, and Vladimir Marianov.
2013. Rapid transit network design for optimal cost and origin–destination
demand capture. Computers & Operations Research 40, 12 (2013), 3000–3009.
[13] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and
David Meger. 2018. Deep reinforcement learning that matters. In Thirty-Second
AAAI Conference on Artificial Intelligence.
[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural
Computation 9, 8 (1997), 1735–1780.
[15] Konstantinos Kepaptsoglou and Matthew Karlaftis. 2009. Transit route network
design problem. Journal of Transportation Engineering 135, 8 (2009), 491–505.
[16] Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic
optimization. arXiv preprint arXiv:1412.6980 (2014).
[17] Vijay R Konda and John N Tsitsiklis. 2000. Actor-critic algorithms. In Advances
in Neural Information Processing Systems. 1008–1014.
[18] Michael Kuntz and Marco Helbich. 2014. Geostatistical mapping of real estate
prices: an empirical comparison of kriging and cokriging. International Journal
of Geographical Information Science 28, 9 (2014), 1904–1921.
[19] Gilbert Laporte and Juan A Mesa. 2015. The design of rapid transit networks. In
Location Science. Springer, 581–594.
[20] Gilbert Laporte, Juan A Mesa, and Francisco A Ortega. 2000. Optimization
methods for the planning of rapid transit systems. European Journal of Operational
Research 122, 1 (2000), 1–10.
[21] Gilbert Laporte, Juan A Mesa, Francisco A Ortega, and Ignacio Sevillano. 2005.
Maximizing trip coverage in the location of a single rapid transit alignment.
Annals of Operations Research 136, 1 (2005), 49–63.
[22] Gilbert Laporte and Marta MB Pascoal. 2015. Path based algorithms for metro
network design. Computers & Operations Research 62 (2015), 78–94.
[23] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez,
Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with
deep reinforcement learning. arXiv preprint arXiv:1509.02971 (2015).
[24] Kevin Manaugh, Madhav G Badami, and Ahmed M El-Geneidy. 2015. Integrating
social equity into urban transportation planning: A critical evaluation of equity
objectives and measures in transportation plans in North America. Transport
Policy 37 (2015), 167–176.
[25] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis
Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with deep
reinforcement learning. arXiv preprint arXiv:1312.5602 (2013).
[26] Mahmoud Owais and Mostafa K Osman. 2018. Complete hierarchical
multi-objective genetic algorithm for transit network design problem. Expert Systems
with Applications 114 (2018), 143–154.
[27] Yanshuo Sun, Paul Schonfeld, and Qianwen Guo. 2018. Optimal extension of
rail transit lines. International Journal of Sustainable Transportation 12, 10 (2018),
753–769.
[28] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all
you need. In Advances in Neural Information Processing Systems. 5998–6008.
[29] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In
Advances in Neural Information Processing Systems. 2692–2700.
[30] Yi Wei, Jian Gang Jin, Jingfeng Yang, and Linjun Lu. 2019. Strategic network
expansion of urban rapid transit systems: A bi-objective programming model.
Computer-Aided Civil and Infrastructure Engineering 34, 5 (2019), 431–443.
[31] Ronald J Williams. 1992. Simple statistical gradient-following algorithms for
connectionist reinforcement learning. Machine Learning 8, 3-4 (1992), 229–256.
[32] Zhongzhen Yang, Bin Yu, and Chuntian Cheng. 2007. A parallel ant colony
algorithm for bus network optimization. Computer-Aided Civil and Infrastructure
Engineering 22, 1 (2007), 44–55.
[33] Yang Ye, Yu Zheng, Yukun Chen, Jianhua Feng, and Xing Xie. 2009. Mining
individual life pattern based on location history. In 2009 Tenth International
Conference on Mobile Data Management: Systems, Services and Middleware. IEEE,
1–10.
[34] Junjun Yin, Aiman Soliman, Dandong Yin, and Shaowen Wang. 2017. Depicting
urban boundaries from a mobility network of spatial interactions: a case study of
Great Britain with geo-located Twitter data. International Journal of Geographical
Information Science 31, 7 (2017), 1293–1313.
[35] Xiangyu Zhao, Liang Zhang, Zhuoye Ding, Long Xia, Jiliang Tang, and Dawei Yin.
2018. Recommendations with negative feedback via pairwise deep reinforcement
learning. In Proceedings of the 24th ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining. 1040–1048.
ACKNOWLEDGMENTS
The corresponding author Xi Zhao is a Tang Scholar. This work was
supported by the National Natural Science Foundation of China
(Grant No. 91746111).
Figure 8: Action direction rules.
Table 3: Bootstrap mean and 95% confidence bounds for the experiments. 10K bootstrap iterations and the pivotal method are
used.
Budget = 210
Method   $\omega_1$                 $\omega_2$                 $\omega_3$
GS       19.082 (19.082, 19.082)    26.718 (26.718, 26.718)    24.333 (24.333, 24.333)
GA       25.672 (23.985, 27.164)    20.969 (17.720, 23.410)    20.793 (18.568, 22.552)
ACA      29.957 (29.273, 30.778)    39.611 (39.093, 40.135)    48.836 (47.286, 50.719)
RL       59.268 (59.143, 59.392)    56.177 (55.917, 56.408)    57.765 (57.718, 57.811)
Budget = 270
Method   $\omega_1$                 $\omega_2$                 $\omega_3$
GS       19.082 (19.082, 19.082)    28.614 (28.614, 28.614)    35.319 (35.319, 35.319)
GA       28.904 (28.505, 29.335)    25.160 (24.017, 26.288)    27.666 (26.852, 28.843)
ACA      30.264 (29.396, 31.118)    40.476 (39.384, 41.474)    50.830 (49.346, 52.143)
RL       64.306 (63.755, 64.762)    62.106 (62.035, 62.171)    65.775 (65.667, 65.910)
A FEASIBILITY RULES
In this section, we present the feasibility rules $F(s_t, G)$ in
Equation (4), which constrain the agent’s actions to ensure that the
expanded metro line satisfies the constraints in Section 3. For the
first constraint, we use a filter centered on the currently selected
station to limit the range of optional stations for the agent. The
filter is a square of $m \times m$ grids, and the agent can only select
the next station within this filter. For the second constraint, in
terms of shape assurance, we design action direction rules based on
historical actions to determine the optional actions in the next step,
as shown in Figure 8. Besides these action direction rules, the
existing metro network also affects the action selection. During the
expansion, the expanded metro line is allowed to connect with existing
lines to form transfer stations, but it cannot coincide with the
existing lines. To achieve the above, we represent the feasibility
rules $F(s_t, G)$ as a binary vector of dimension $n^2$, where an
element is set to 1 if the corresponding station is optional; otherwise
the element is set to 0.
In addition to the feasibility rules $F(s_t, G)$, for the third
constraint, once the number of selected stations reaches the upper
limit $N$, the expansion process is terminated. For the fourth
constraint, each unit length of the metro line and each station incur
a certain cost. During the interaction between the agent and the
environment, once the cost of construction exceeds the budget $B_0$,
the expansion process is terminated.
With the above in place, the agent interacts with the environment as
follows. Initially, all of the elements of the vector $F(s_t, G)$ are
1, and the agent selects a station based on the parameterized policy
$\pi_\theta$. Then, $F(s_t, G)$ is updated according to the filter and
the action direction rules. From this time on, the agent is only
allowed to select a station whose element in $F(s_t, G)$ is 1,
according to the policy $\pi_\theta$. This process is repeated until
all the elements of the vector $F(s_t, G)$ are 0, or the number of
selected stations reaches the upper limit $N$, or the cost of
construction exceeds the budget $B$.
An example is shown in Figure 9.
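The filter part of the feasibility rules and the three stopping conditions can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes stations are (row, column) cells of an n × n grid flattened row by row, it excludes already selected stations, and it leaves out the action direction rules of Figure 8 and the checks against existing lines. The names `filter_mask` and `should_stop` are hypothetical.

```python
def filter_mask(n, m, current, selected=()):
    """Binary feasibility vector of length n*n for an n x n grid.

    Element r*n + c is 1 if cell (r, c) lies inside the m x m filter
    centered on `current` and has not been selected yet, else 0.
    The direction rules and existing-line checks are omitted here.
    """
    half = m // 2
    taken = set(selected) | {current}
    mask = [0] * (n * n)
    r0, c0 = current
    for r in range(max(0, r0 - half), min(n, r0 + half + 1)):
        for c in range(max(0, c0 - half), min(n, c0 + half + 1)):
            if (r, c) not in taken:
                mask[r * n + c] = 1
    return mask


def should_stop(mask, num_selected, max_stations, cost, budget):
    """Termination test: no feasible station, station limit, or budget."""
    return sum(mask) == 0 or num_selected >= max_stations or cost > budget


mask = filter_mask(n=29, m=5, current=(14, 14))
print(sum(mask))  # 24 candidate cells: the 5 x 5 window minus its center
```

In a policy network, a mask of this kind would typically be applied to the action scores before sampling so that infeasible stations receive zero probability.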
B METHOD DETAILS
This section describes the implementation details of our RL method
and the compared methods in Section 5.2. All the experiments are
conducted on a desktop with an E5-2680 v4 @ 2.40GHz CPU and a
TITAN Xp GPU.
For our RL method, we set the two 1-dimensional convolutional
neural networks in the encoder with 128 filters, and the LSTM in the
decoder with a state size of 128. The parameters in our network are
initialized with Xavier initialization [9]. We adopt the Adam
optimizer [16] with learning rate $10^{-4}$ to train our network. During
the training, the batch size $B$ is 128, and the number of training
epochs $E$ is 3500. After training, we employ the parameters which
achieve the maximum objective in the training process to generate the
metro line.
Figure 9: An example of metro line expansion.
For the MP, we predefine the corridor and end stations, and
employ Gurobi to obtain the new metro line. The newly constructed
metro line must stay in the predefined corridor and end at the
specified end stations.
For the GA, its fitness function is the objective in Equation (3).
After our parameter tuning, we set the initial population size as 500,
and the maximum number of iterations as 3500. The crossover is
conducted at the common station of two selected metro lines, and
each station on a metro line is allowed to mutate. After crossover
and mutation, the original metro lines that fail to evolve and the new
metro lines that satisfy our constraints in Section 3 are preserved.
Considering the low efficiency of heuristic operators, we set both
the crossover probability and mutation probability as 0.9. At last, the
metro line with the maximum fitness during the evolution process
is output as the final solution.
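One plausible reading of the crossover at a common station described above is a tail swap at the first station the two parents share. The sketch below illustrates that reading only; the paper does not give the operator in code, so the function name, the list-of-identifiers representation, and the choice of the first shared station are all assumptions.

```python
def crossover_at_common_station(line_a, line_b):
    """Swap the tails of two lines at their first shared station.

    Lines are ordered lists of station identifiers. Returns the two
    children, or None if the parents share no station. Checking the
    children against the constraints in Section 3 is left to the caller.
    """
    common = next((s for s in line_a if s in line_b), None)
    if common is None:
        return None
    i, j = line_a.index(common), line_b.index(common)
    child_a = line_a[:i] + line_b[j:]
    child_b = line_b[:j] + line_a[i:]
    return child_a, child_b


# Hypothetical parents that meet at station "X".
parent_a = ["A1", "A2", "X", "A3"]
parent_b = ["B1", "X", "B2", "B3"]
print(crossover_at_common_station(parent_a, parent_b))
# (['A1', 'A2', 'X', 'B2', 'B3'], ['B1', 'X', 'A3'])
```

Children produced this way can easily violate the shape constraints, which is consistent with the low operator efficiency noted in the text.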
For the ACA, we set the maximum number of iterations as 3500,
with each iteration containing 128 instances. All the other parameters
are the same as study [32]. In each iteration, the initial station
for each instance is randomly selected, and then the subsequent
station selection is guided by the probabilistic rule. At last, the metro
line with the maximum objective during the evolution process is
output as the final solution.
In the phase of mapping the solution on the city grids, the final
solutions of all methods are modified using Gurobi to be more
realistic.
C SOCIAL EQUITY
As an important transportation system, the metro network benefits
the areas it traverses by giving them convenient access to other areas,
which may be reflected in people’s access to education, economic
activity, and other aspects. Referring to [2], we consider an
accessibility variable as each area’s distributable benefit obtained
from the metro network, and adopt a utilitarian theory to aggregate
each area’s distributable benefit to measure the social equity.
For an area $i$, its accessibility index $Ac_i$ is calculated as:
$$Ac_i = \sum_j D_j F(c_{ij}), \tag{15}$$
where $c_{ij}$ is the generalized travel cost between area $i$ and area
$j$, $F(c_{ij})$ represents the function of resistance against traveling
between area $i$ and area $j$, and $D_j$ represents the compound index
of development of area $j$. Specifically, $F(c_{ij})$ is defined as:
$$F(c_{ij}) = F(t_{ij}) = e^{-\beta t_{ij}}, \tag{16}$$
where $t_{ij}$ is the travel time between area $i$ and area $j$, and
$\beta$ is an adjustment parameter. $D_j$ is defined as:
$$D_j = \sum_k w_k d_{j,k}, \tag{17}$$
where $d_{j,k}$ is an economic variable of type $k$ for area $j$ and
$w_k$ is the weight. In our study, we consider that $t_{ij}$ is
proportional to the distance between $i$ and $j$, and take the house
price of area $j$ as its index of development $D_j$ [18].
Then, under a utilitarian theory, the social equity indicator
$R_{ac}$ in the transportation network planning is defined as:
$$R_{ac} = \sum_i Ac_i. \tag{18}$$
$R_{ac}$ measures the total benefits the transportation network brings
to society. We prefer the metro network to achieve a greater social
equity indicator $R_{ac}$.
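The accessibility and social equity calculations can be sketched in a few lines. The example assumes a decaying exponential resistance function, exp(-β t), with travel times supplied as a precomputed matrix, and it sums over every area j, including j = i; the function names and the two-area data are hypothetical and chosen only so the arithmetic is easy to check.

```python
import math


def accessibility(i, travel_time, development, beta):
    """Ac_i = sum over j of D_j * exp(-beta * t_ij)."""
    return sum(
        d_j * math.exp(-beta * travel_time[i][j])
        for j, d_j in enumerate(development)
    )


def social_equity(travel_time, development, beta):
    """R_ac = sum over i of Ac_i (utilitarian aggregation)."""
    return sum(
        accessibility(i, travel_time, development, beta)
        for i in range(len(development))
    )


# Hypothetical two-area example: rescaled house prices and travel times.
development = [1.0, 0.5]
travel_time = [[0.0, 2.0],
               [2.0, 0.0]]
print(social_equity(travel_time, development, beta=0.0))  # 3.0
```

With β = 0 travel time has no effect and each area simply sees the total development of all areas; a larger β discounts distant areas more strongly, so shortening travel times raises the indicator.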
D STATISTICAL COMPARISON OF DIFFERENT METHODS’ PERFORMANCES
In order to make our experimental results more convincing, we employ
the bootstrap and significance testing to evaluate the methods’
performances in Section 5.3.2, referring to study [13]. Table 3 shows
the bootstrap mean and 95% confidence bounds on our experiments,
which demonstrates that our method performs best in a statistical
sense. We also conducted a power analysis for the choice of
the sample size, and the analysis indicates that 5 trials are enough
for our experiments.
We further conduct a difference test to compare the performances
of different methods, referring to study [13]. Limited by space, we
only present the comparison of our RL method with ACA under
a budget of 270 billion RMB and the same objective $\omega_1$. Assuming
the null hypothesis $H_0$, that there is no difference in performance
between these two methods, the bootstrap confidence interval test
estimates the confidence interval for the difference between the
mean performances of these two methods as [33.1925, 34.927] at
significance level 0.05. The confidence interval does not include
0 and both bounds are positive. Thus the null hypothesis $H_0$ is
rejected, and we can be confident at the 95% level that the performance
of our RL method is superior to that of the ACA method.
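The difference test can be illustrated with a small pivotal (basic) bootstrap for the difference of two means. This is a minimal sketch using only the standard library, with hypothetical per-trial values standing in for the raw results, which the paper does not list; the function name, the seed, and the quantile indexing are assumptions for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def pivotal_bootstrap_ci(a, b, iters=10_000, alpha=0.05, seed=0):
    """Pivotal bootstrap interval for mean(a) - mean(b).

    Resamples each group with replacement, then reflects the bootstrap
    quantiles around the observed difference:
    (2*theta - q_high, 2*theta - q_low).
    """
    rng = random.Random(seed)
    theta = mean(a) - mean(b)
    diffs = sorted(
        mean(rng.choices(a, k=len(a))) - mean(rng.choices(b, k=len(b)))
        for _ in range(iters)
    )
    q_low = diffs[int(iters * alpha / 2)]
    q_high = diffs[int(iters * (1 - alpha / 2)) - 1]
    return 2 * theta - q_high, 2 * theta - q_low


# Hypothetical per-trial objectives; these are not the paper's raw data.
rl_trials = [64.9, 63.5, 64.3, 64.8, 64.0]
aca_trials = [30.1, 31.9, 29.0, 30.6, 29.7]
low, high = pivotal_bootstrap_ci(rl_trials, aca_trials)
print(low > 0)  # True means the interval excludes 0, so H0 is rejected
```

With only five trials per method the bootstrap distribution is coarse, so an interval of this kind is best read alongside the power analysis mentioned in the text.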
... Because of large power consumption or privacy concerns, GPS sensors are not installed on all vehicles or enabled by all mobile subscribers. This limits many applications that rely on group behavior analysis of a large amount of users, such as urban planning optimization [2], [24] and human mobility analysis [5], [6], etc. Second, in places with ...
... To further improve map matching performance, we exploit global heuristics observed from real driving scenarios, such as preferring the routes with more proportion of major roads, less frequency of turns and U-turns. To incorporate these heuristics, inspired by the recent advance of reinforcement learning (RL) approaches [24], [28]- [30], we customize the basic map matching model into a reinforcement learning framework. ...
Article
Full-text available
This paper presents a novel map matching framework that adopts deep learning techniques to map a sequence of cell tower locations to a trajectory on a road network. Map matching is an essential pre-processing step for many applications, such as traffic optimization and human mobility analysis. However, most recent approaches are based on hidden Markov models (HMMs) or neural networks that are hard to consider high-order location information or heuristics observed from real driving scenarios. In this paper, we develop a deep reinforcement learning based map matching framework for cellular data, named as DMM, which adopts a recurrent neural network (RNN) coupled with a reinforcement learning scheme to identify the most-likely trajectory of roads given a sequence of cell towers. To transform DMM into a practical system, several challenges are addressed by developing a set of techniques, including spatial-aware representation of input cell tower sequences, an encoder-decoder based RNN network for map matching model with variable-length input and output, and a global heuristics-driven reinforcement learning based scheme for optimizing the parameters of the encoder-decoder map matching model. Extensive experiments on a large-scale anonymized cellular dataset reveal that DMM provides high map matching accuracy and fast inference time.
... Eective expansion design requires accurate demand predictions for new stations during the planning year. While previous studies have made signicant contributions to metro network expansion design using optimization models and reinforcement learning algorithms [22,27,28,33], they often overlook the demand prediction step. Many studies either use the current year's all-mode travel demand as a proxy for metro demand in the planning year or rely on unvalidated estimated demands. ...
... To determine the grid size, with the above user trajectory data, we calculate each user's radius of gyration which is a metric to distinguish users' mobility patterns, and chose the distinct geographic distance which separates total users equally into two main groups as the grid size [27], [28]. Figure 4 depicts the cumulative distribution function of the radius of gyration, in which users with the radius of gyration less than 1094 meters account for 50.2% of the total. ...
Article
Full-text available
Administrative divisions are regional divisions of the state for the purpose of hierarchical administration. In recent years, the process of urbanization has greatly promoted the urban development. This development is not only reflected in the expansion of urban areas but also in economic and social patterns. All these changes affect the way the urban operates. Then, a concern arising from the changing urban dynamics is that whether current administrative division accords with urban development? Existing studies conceptualize the urban space as the environment created by human activities, and elaborate the importance of urban boundaries respecting to human activities in urban management. Following this concept, we delineate the urban interior boundaries formed by human activities. Specifically, taking Xi’an in Shaanxi Province of China as an example, this study first explores the region-based human crowd mobility patterns to verify that human mobility can establish a stable correlation between regions, or capture the objective correlations between regions. Then, the above human crowd patterns have been found to be applicable for mining unusual urban regions from the perspective of anomaly detection, and empirical evidence has found that these regions are of great significance for understanding the urban spatial structure. Finally, we employ the community detection technology to naturally delimit the urban interior boundaries formed by human mobility, and make a comparison with the official urban boundaries. Some unexpected communities that are closely linked due to human activities appear from the results, and these findings help the urban planners re-examine the administrative division.
... But to make their method feasible, they assumed that the network is a connected graph, which is not needed in our paper. The work [19] presented a RL-based method to solve the city metro network expansion problem. Our main difference lies in the different methods used to extract the information on PT graph. ...
Preprint
Designing Public Transport (PT) networks able to satisfy mobility needs of people is essential to reduce the number of individual vehicles on the road, and thus pollution and congestion. Urban sustainability is thus tightly coupled to an efficient PT. Current approaches on Transport Network Design (TND) generally aim to optimize generalized cost, i.e., a unique number including operator and users' costs. Since we intend quality of PT as the capability of satisfying mobility needs, we focus instead on PT accessibility, i.e., the ease of reaching surrounding points of interest via PT. PT accessibility is generally unequally distributed in urban regions: suburbs generally suffer from poor PT accessibility, which condemns residents therein to be dependent on their private cars. We thus tackle the problem of designing bus lines so as to minimize the inequality in the geographical distribution of accessibility. We combine state-of-the-art Message Passing Neural Networks (MPNN) and Reinforcement Learning. We show the efficacy of our method against metaheuristics (classically used in TND) in a use case representing in simplified terms the city of Montreal.
... In this paper, the parcel singulation task is formulated as Markov decision process(MDP) problem with a nonstationary environment [11,12]. In real-world circumstances, the input state space of MDP varies owing to the uncertainty of input parcels at each time-step. ...
Article
Full-text available
In the rapidly expanding logistics sector, parcel singulation has emerged as a significant bottleneck. To address this, we propose an automated parcel singulator utilizing a sparse actuator array, which presents an optimal balance between cost and efficiency, albeit requiring a sophisticated control policy. In this study, we frame the parcel singulation issue as a Markov Decision Process with a variable state space dimension, addressed through a deep reinforcement learning (RL) algorithm complemented by a State Space Standardization Module (S3). Distinct from previous RL approaches, our methodology initially considers the non-stationary environment during the problem modeling phase. To counter this challenge, the S3 module standardizes the dynamic input state, thereby stabilizing the RL training process. We validate our method through simulation experiments in complex environments, comparing it with several baseline algorithms. Results indicate that our algorithm excels in parcel singu-lation tasks, achieving a higher success rate and enhanced efficiency.
Preprint
Multi-Objective Reinforcement Learning (MORL) aims to learn a set of policies that optimize trade-offs between multiple, often conflicting objectives. MORL is computationally more complex than single-objective RL, particularly as the number of objectives increases. Additionally, when objectives involve the preferences of agents or groups, ensuring fairness is socially desirable. This paper introduces a principled algorithm that incorporates fairness into MORL while improving scalability to many-objective problems. We propose using Lorenz dominance to identify policies with equitable reward distributions and introduce λ-Lorenz dominance to enable flexible fairness preferences. We release a new, large-scale real-world transport planning environment and demonstrate that our method encourages the discovery of fair policies, showing improved scalability in two large cities (Xi'an and Amsterdam). Our methods outperform common multi-objective approaches, particularly in high-dimensional objective spaces.
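As a rough illustration of the Lorenz dominance idea mentioned in this abstract, the sketch below uses the textbook definition for reward vectors that are to be maximized: sort each vector in ascending order, take cumulative sums, and compare the results by Pareto dominance. The function names are hypothetical, and the λ-Lorenz variant is not covered because the abstract does not define it.

```python
from itertools import accumulate


def lorenz_vector(rewards):
    """Cumulative sums of the rewards sorted from worst-off to best-off."""
    return list(accumulate(sorted(rewards)))


def lorenz_dominates(a, b):
    """True if reward vector a Lorenz-dominates b (rewards are maximized).

    Assumption: textbook definition, i.e. the Lorenz vector of a is
    at least as large everywhere and strictly larger somewhere.
    """
    if len(a) != len(b):
        raise ValueError("reward vectors must have the same length")
    la, lb = lorenz_vector(a), lorenz_vector(b)
    at_least_as_good = all(x >= y for x, y in zip(la, lb))
    strictly_better = any(x > y for x, y in zip(la, lb))
    return at_least_as_good and strictly_better


if __name__ == "__main__":
    # Same total reward, but (2, 2) spreads it more evenly than (1, 3).
    print(lorenz_dominates([2, 2], [1, 3]))
    # (4, 0) and (1, 3) are Pareto-incomparable, yet (1, 3) is fairer.
    print(lorenz_dominates([1, 3], [4, 0]))
```

Under this definition, two vectors that are incomparable under plain Pareto dominance can still be ordered when one distributes the same total more evenly, which is why the criterion tends to shrink the set of candidate policies.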
Article
Developing smart cities is vital for ensuring sustainable development and improving human well-being. One critical aspect of building smart cities is designing intelligent methods to address various decision-making problems that arise in urban areas. As machine learning techniques continue to advance rapidly, a growing body of research has been focused on utilizing these methods to achieve intelligent urban decision making. In this survey, we conduct a systematic literature review on the application of machine learning methods in urban decision making, with a focus on planning, transportation, and healthcare. First, we provide a taxonomy based on typical applications of machine learning methods for urban decision making. We then present background knowledge on these tasks and the machine learning techniques that have been adopted to solve them. Next, we examine the challenges and advantages of applying machine learning in urban decision making, including issues related to urban complexity, urban heterogeneity and computational cost. Afterward and primarily, we elaborate on the existing machine learning methods that aim to solve urban decision making tasks in planning, transportation, and healthcare, highlighting their strengths and limitations. Finally, we discuss open problems and the future directions of applying machine learning to enable intelligent urban decision making, such as developing foundation models and combining reinforcement learning algorithms with human feedback. We hope this survey can help researchers in related fields understand the recent progress made in existing works, and inspire novel applications of machine learning in smart cities.
Article
Mobility service route design requires demand information to operate in a service region. Transit planners and operators can access various data sources including household travel survey data and mobile device location logs. However, when implementing a mobility system with emerging technologies, estimating demand becomes harder because of limited data resulting in uncertainty. This study proposes an artificial intelligence-driven algorithm that combines sequential transit network design with optimal learning to address the operation under limited data. An operator gradually expands its route system to avoid risks from inconsistency between designed routes and actual travel demand. At the same time, observed information is archived to update the knowledge that the operator currently uses. Three learning policies are compared within the algorithm: multi-armed bandit, knowledge gradient, and knowledge gradient with correlated beliefs. For validation, a new route system is designed on an artificial network based on public use microdata areas in New York City. Prior knowledge is reproduced from the regional household travel survey data. The results suggest that exploration considering correlations can achieve better performance compared to greedy choices and other independent belief-based techniques in general. In future work, the problem may incorporate more complexities such as demand elasticity to travel time, no limitations to the number of transfers, and costs for expansion.
Conference Paper
Recommender systems play a crucial role in mitigating the problem of information overload by suggesting personalized items or services to users. The vast majority of traditional recommender systems consider the recommendation procedure as a static process and make recommendations following a fixed strategy. In this paper, we propose a novel recommender system with the capability of continuously improving its strategies during the interactions with users. We model the sequential interactions between users and a recommender system as a Markov Decision Process (MDP) and leverage Reinforcement Learning (RL) to automatically learn the optimal strategies via recommending trial-and-error items and receiving reinforcements of these items from users' feedback. Users' feedback can be positive or negative, and both types of feedback have great potential to boost recommendations. However, negative feedback is far more abundant than positive feedback; thus, incorporating both simultaneously is challenging, since positive feedback could be buried by negative feedback. In this paper, we develop a novel approach to incorporate them into the proposed deep recommender system (DEERS) framework. The experimental results based on real-world e-commerce data demonstrate the effectiveness of the proposed framework. Further experiments have been conducted to understand the importance of both positive and negative feedback in recommendations.
Article
In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to maintaining this rapid progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results difficult to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines, and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field, by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.
Article
Existing urban boundaries are usually defined by government agencies for administrative, economic, and political purposes. Defining urban boundaries that consider socio-economic relationships and citizen commute patterns is important for many aspects of urban and regional planning. In this paper, we describe a method to delineate urban boundaries based upon human interactions with physical space inferred from social media. Specifically, we depicted the urban boundaries of Great Britain using a mobility network of Twitter user spatial interactions, which was inferred from over 69 million geo-located tweets. We define the non-administrative anthropographic boundaries in a hierarchical fashion based on different physical movement ranges of users derived from the collective mobility patterns of Twitter users in Great Britain. The results of strongly connected urban regions in the form of communities in the network space yield geographically cohesive, non-overlapping urban areas, which provide a clear delineation of the non-administrative anthropographic urban boundaries of Great Britain. The method was applied to both national (Great Britain) and municipal scales (the London metropolis). While our results corresponded well with the administrative boundaries, many unexpected and interesting boundaries were identified. Importantly, as the depicted urban boundaries exhibited a strong instance of spatial proximity, we employed a gravity model to understand the distance decay effects in shaping the delineated urban boundaries. The model explains how geographical distances found in the mobility patterns affect the interaction intensity among different non-administrative anthropographic urban areas, which provides new insights into human spatial interactions with urban space.
Article
With the development of urbanization and the extension of city boundaries, the expansion of rapid transit systems based on the existing lines becomes an essential issue in urban transportation systems. In this study, the network expansion problem is formulated as a bi‐objective programming model to minimize the construction cost and maximize the total travel demand covered by the newly introduced transit lines. To solve the bi‐objective mixed‐integer linear program, an approach called minimum distance to the utopia point is applied. Thus, the specific trade‐off is suggested to the decision makers instead of a series of optimal solutions. A real‐world case study based on the metro network in Wuxi, China, is conducted, and the results demonstrate the effectiveness and efficiency of the proposed model and solution method. It is found that the utopia method can not only provide a reasonable connecting pattern of the network expansion problem but also identify the corridors with high priority under the limited budget condition.
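The "minimum distance to the utopia point" idea in this abstract can be illustrated on a finite set of candidate expansions. The sketch below is not the mixed-integer program from the cited study: it assumes each candidate is already summarized as a (construction cost, covered demand) pair, rescales both objectives to [0, 1], and uses Euclidean distance. All three choices, and the function name, are assumptions made for illustration.

```python
import math


def closest_to_utopia(candidates):
    """Return (index, distance) of the candidate nearest the utopia point.

    candidates: list of (cost, covered_demand) pairs; cost is minimized,
    covered demand is maximized. After min-max normalization the utopia
    point is (0, 1): lowest cost together with highest coverage.
    """
    costs = [c for c, _ in candidates]
    demands = [d for _, d in candidates]
    c_lo, c_hi = min(costs), max(costs)
    d_lo, d_hi = min(demands), max(demands)

    def normalize(value, lo, hi):
        # A constant objective carries no trade-off information.
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    best_index, best_distance = -1, math.inf
    for i, (cost, demand) in enumerate(candidates):
        dx = normalize(cost, c_lo, c_hi) - 0.0
        dy = normalize(demand, d_lo, d_hi) - 1.0
        distance = math.hypot(dx, dy)
        if distance < best_distance:
            best_index, best_distance = i, distance
    return best_index, best_distance


if __name__ == "__main__":
    # Hypothetical (cost, covered demand) values for three expansion plans.
    plans = [(10.0, 100.0), (20.0, 180.0), (40.0, 200.0)]
    print(closest_to_utopia(plans))
```

Returning a single compromise solution, rather than the full Pareto set, matches the abstract's point that one specific trade-off is suggested to decision makers.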
Article
The Transit Network Design Problem is a multi-disciplinary problem that is considered one of the most intractable problems for real-size networks. In the late 1990s, metaheuristics started to prove more reliable for the problem. The Genetic Algorithm (GA) is a popular metaheuristic that is often used because it adapts easily to the problem. In this study, GA is presented as a complete constructive multi-objective algorithm that creates its own routes from scratch and then assembles the routes into efficient transit networks. Finally, it handles the multi-criteria nature of the problem until it produces the optimal (or near-optimal) Pareto front solutions. A new frequency setting algorithm is also developed based on simulation results at the bus stop level, which implicitly accounts for the bi-level decision making of both users and operators. Experimental studies on two real-size networks are conducted to validate the performance and robustness of the methodology.
Article
In recent years, researchers have developed new methods to measure how transport decisions affect different groups of society. An example is the distribution of impacts (benefits and costs) from roadway investments, and the degree that the results are considered equitable (also called fair or just). Such decisions affect people’s ability to access services and activities, and therefore their economic opportunities and development. This study suggests ways of incorporating social equity measures in transportation network planning. It describes various equity impacts that can result from transportation planning decisions, discusses various social equity concepts and theories, reviews previous attempts to incorporate equity considerations into transport networks modeling, and suggests a framework for simultaneously optimizing network design and achieving social equity objectives. According to this framework, network design can be formulated using bi-level integer programming models corresponding to seven major social equity approaches along with the classical approach of “Total Travel Time Minimization.” An accessibility variable is used as the distributable benefit. This approach is more comprehensive and flexible than previous equity impact models. The proposed framework can be used to evaluate and optimize the equity impacts of various infrastructure investment decisions.
Article
A bilevel model for optimizing the extension of a rail transit line over a planning horizon is presented. In the upper level planning problem, planners make decisions regarding construction and investment in each period with the objective of net present worth maximization. In the lower level operational program, transit operators maximize the social welfare by setting the fare, headway, and vehicle size subject to the train capacity constraint. After the exploration of the model's structure, a tighter reformulated program is proposed and solved as a dynamic program. Numerical studies demonstrate that even without a budget limit phased development can be preferable to a one-time extension.
Article
We consider a metro network design problem in which the objective is to maximize the origin/destination traffic captured by the system. The lines of the network are located within some corridors that are also determined by the procedure. The amount of captured traffic depends on the ratio between travel time by metro and travel time using alternative modes. There is a limited construction budget. Lower bounds are imposed on the angles between alignments, which allows the generation of different network shapes. A matheuristic is proposed to solve the problem. The method is applied to a test case from the city of Concepción, Chile.
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
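The attention mechanism this abstract refers to is commonly written as softmax(QKᵀ / √d_k) V. The following pure-Python sketch of that single operation is meant only to make the formula concrete; it omits multiple heads, masking, and learned projections, and the function names are illustrative rather than taken from any library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    # The query matches the first key more closely, so the output
    # leans toward the first value vector.
    print(scaled_dot_product_attention(q, k, v))
```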
Article
The European Commission (EC) introduced the concept of Sustainable Urban Mobility Plans (SUMPs) as a new planning paradigm with a focus on people's needs – Planning for people. This represents a change from traditional planning approaches centred on motorized road traffic/infrastructure provision and a shift towards more sustainable transport options. SUMPs require a long-term and sustainable vision for cities and must pay special attention to the participation of citizens and stakeholders and to the coordination of policies across sectors (transport, land use, health, energy, and so on). The EC guidelines on developing and implementing SUMPs (EC, 2013) establish the following primary objectives of this “new way of planning urban mobility”: accessibility and quality of life, as well as sustainability, economic viability, social equity, health and environment quality. Since urban areas in Europe account for 23%–25% of CO2 emissions from transport (EC, 2013b; EEA, 2014), SUMPs are expected to contribute to meeting long-term climate change policy goals. However, it is less clear how SUMPs can contribute to addressing key societal challenges such as equity issues in accessibility. According to the EC guidelines, SUMPs are still a non-existent concept in most European member states. However, several cities in Europe and beyond have already formulated and adopted SUMPs. This paper builds on a review of earlier voluntary SUMPs developed in Portugal. A sample of forty case studies is considered in the analysis. It aims: a) to understand how climate change goals and equity issues in accessibility have been addressed through the first generation of SUMPs; b) to reflect on the role of SUMPs as tools to answer climate change goals without putting social equity issues at risk; and c) to outline further research needs in the SUMP approach. The research results are expected to give insights into social equity needs in urban transport and climate change adaptation policies in Europe.