Citation: He, X.; Zhang, W.; Li, X.; Zhang, X. TEA-GCN: Transformer-Enhanced Adaptive Graph Convolutional Network for Traffic Flow Forecasting. Sensors 2024, 24, 7086. https://doi.org/10.3390/s24217086
Academic Editor: Enrico Meli
Received: 27 September 2024
Revised: 29 October 2024
Accepted: 31 October 2024
Published: 4 November 2024
Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
TEA-GCN: Transformer-Enhanced Adaptive Graph Convolutional
Network for Traffic Flow Forecasting
Xiaxia He 1, Wenhui Zhang 2, Xiaoyu Li 3,* and Xiaodan Zhang 1
1 School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China; hexiaxia@emails.bjut.edu.cn (X.H.); zhangxiaodan@bjut.edu.cn (X.Z.)
2 School of Information Engineering, Jiangxi Vocational College of Industry & Engineering, Nanchang 330013, China; aloxi@163.com
3 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100083, China
* Correspondence: lixy01@aircas.ac.cn
Abstract: Traffic flow forecasting is crucial for improving urban traffic management and reducing resource consumption. Accurate prediction of traffic conditions requires capturing the complex spatial-temporal dependencies inherent in traffic data. Traditional spatial-temporal graph modeling methods often rely on fixed road network structures, failing to account for the dynamic spatial correlations that vary over time. To address this, we propose a Transformer-Enhanced Adaptive Graph Convolutional Network (TEA-GCN) that alternately learns temporal and spatial correlations in traffic data layer-by-layer. Specifically, we design an adaptive graph convolutional module to dynamically capture implicit road dependencies at different time levels and a local-global temporal attention module to simultaneously capture long-term and short-term temporal dependencies. Experimental results on three public traffic datasets demonstrate the effectiveness of the proposed model compared to other state-of-the-art traffic flow prediction methods.
Keywords: graph convolutional networks; traffic flow forecasting; adaptive graph learning
1. Introduction
With the rapid development of urban intelligent transportation, traffic data collection sensors have been widely deployed, and a large amount of traffic data has been accumulated in real time. Accurate traffic flow forecasting is conducive to improving the management of urban traffic systems and reducing resource consumption. Therefore, how to efficiently use these collected data for traffic flow forecasting is a core issue and research hotspot in the intelligent transportation field. So far, a large number of researchers have carried out extensive research on the topic of traffic flow forecasting and have achieved abundant results.
Early traffic flow forecasting methods focus on analyzing the temporal dependence of data [1–4], that is, learning the trend of traffic flow from historical observation sequences. For example, the Autoregressive Integrated Moving Average model (ARIMA) [5] is a traditional time series modeling method that is widely used in traffic flow forecasting tasks. The Historical Average model (HA) [6] is favored in industry due to its excellent efficiency and accuracy. Although these temporal-based analysis methods have achieved satisfactory results, they only consider the temporal correlation of sequence data. With the development of urbanization, traffic data exhibit complex structural characteristics, so previous temporal correlation modeling methods cannot fully learn the characteristics of the traffic network.
Fortunately, graph structures can represent such complex road network structures. Benefiting from the powerful structure-capturing ability of Graph Neural Networks (GNNs) [7–10], a series of GNN-based traffic flow forecasting methods have been proposed [11,12]. They
generally integrate graph neural networks into Recurrent Neural Networks (RNNs) or Convolutional Neural Networks (CNNs) to capture the complex spatial-temporal dependencies of traffic data [13,14]. For example, ASTGCN [15] simultaneously utilizes GCN to capture spatial features and a Temporal Convolutional Network (TCN) to model temporal dependencies. Although these spatial-temporal correlation modeling methods have achieved relatively satisfactory forecasting performance, there still exist two obvious challenges.
First, traffic data are dynamic, complex, and uncertain, and the spatial dependence between roads changes dynamically over time. Existing GNN-based methods consider only physical connections and ignore other complex relationships in traffic data, so they cannot accurately model the spatial-temporal correlation of traffic data. Specifically, explicit physical structures struggle to accurately reflect the real dependencies and connection strengths between roads; for example, edge connections may be missing between roads with similar traffic conditions. To address this challenge, Wu et al. [16] constructed an adaptive adjacency matrix and learned it through node embeddings to capture the hidden spatial dependence of traffic data. However, they ignored the dynamics of road spatial dependence over time. Learning a fixed spatial relationship from traffic data is not sufficient to reflect the changes in road dependencies. Therefore, how to accurately model the dynamic spatial dependence of traffic data is a challenge.
Second, existing temporal dependency modeling methods are ineffective at handling long-term sequence information. RNN-based methods require multiple iterations to process long sequences, resulting in high computational cost and the vanishing gradient problem. CNN-based methods exploit one-dimensional convolution to mine the temporal dependency between sequences, but their receptive field increases only linearly with the depth of the convolutional layers, making it difficult to capture long-term correlations. Therefore, how to process long-term traffic data is also a challenge.
To tackle the above challenges, in this paper, we propose a novel transformer-enhanced adaptive graph convolutional network for traffic flow forecasting. A brief illustration is shown in Figure 1. Specifically, the adaptive graph convolutional module is proposed to model dynamic spatial dependencies of complex traffic data, and the local-global temporal attention module is proposed to explore the temporal dependencies of traffic data under different periods, which improves the robustness of the proposed model. The contributions of this paper are listed as follows:
[Figure: architecture diagram. The model stacks L spatial-temporal blocks, each containing a local-global temporal attention layer and an adaptive graph convolutional layer, followed by a predictor; detail panels show the local-global temporal attention layer (TCN-a, TCN-b, temporal Transformer, fusion) and the adaptive graph convolutional layer (adjacency matrices A, A_adp, A_att with softmax).]
Figure 1. The brief illustration of the proposed model.
• We propose a novel GNN-based traffic flow forecasting model that alternately updates the graph structure and the data representation layer-by-layer, simultaneously capturing the complex spatial-temporal dependencies of traffic data.
• We creatively design an adaptive graph convolutional module to capture the rich and implicit dynamic road spatial dependence at different time levels, which clearly distinguishes it from other GNN-based methods.
• We propose a local-global temporal attention module to model both the long-term and short-term temporal dependence of traffic data.
The rest of this paper mainly consists of the following parts: In Section 2, we review
existing traffic forecasting methods. In Section 3, we provide a problem statement. The
proposed Transformer-Enhanced Adaptive Graph Convolutional Network is described in
detail in Section 4. Then, we verify the effectiveness of the proposed method through a
series of experiments in Section 5. Finally, we conclude this paper in Section 6.
2. Related Work
In this section, we review related research on traffic flow forecasting.
Early traffic flow forecasting methods are mainly based on statistical models. ARIMA [5] is an important method specially designed for time series forecasting. In addition, the Kalman filter [1] is also widely used in traffic flow forecasting tasks.
With the popularity of deep learning, many researchers have introduced deep learning into traditional traffic flow forecasting algorithms and achieved satisfactory performance [17]. Lv et al. [18] proposed a novel traffic flow forecasting method based on deep learning, which first exploits an autoencoder to learn traffic flow features. To describe the stochastic and nonlinear nature of traffic data, Hu et al. [4] utilized an RNN to model the temporal dependence of the data, which further verifies the effectiveness of deep learning in traffic flow prediction tasks.
Due to the complex, non-Euclidean structure of traffic data, traditional temporal dependence modeling methods are unable to fully extract these characteristics. Recent research has applied GNNs to traffic flow forecasting tasks, capturing spatial-temporal dependencies and achieving satisfactory performance. One line of research integrates GNNs with RNNs or CNNs to recursively mine the complex temporal and spatial dependence hidden in traffic data [13,19,20]. Zhao et al. [21] proposed a temporal graph convolutional network for urban road network traffic forecasting, which utilizes GCN to learn spatial dependencies and GRU to learn the trend of traffic data in the time dimension. Graph WaveNet constructs an adaptive adjacency matrix to capture the implicit spatial dependence in traffic data [16]. On this basis, Bai et al. [19] proposed an adaptive graph convolutional recurrent network to capture fine-grained spatial and temporal dependencies in traffic sequences through adaptive node parameter learning and adaptive graph generation.
Another line of research focuses on developing a spatial-temporal fusion graph for traffic data and using GNNs to capture spatial-temporal correlations synchronously [14,22]. Ref. [23] employs dynamic hypergraph structure learning and interactive graph convolution to capture high-order spatio-temporal relationships and diverse transitional patterns in traffic data. Ref. [24] proposes a novel spatio-temporal graph neural network model that conjointly captures high-order spatio-temporal relationships and diverse traffic patterns within traffic data. To further account for pivotal nodes, which exhibit more complex spatio-temporal dependencies than other nodes, Ref. [25] proposes a pivotal node identification module that identifies and models pivotal nodes and their complex spatio-temporal dependencies in the traffic network.
Benefiting from the ability of Transformers to capture long-range dependencies, some research has integrated GNNs with Transformers to capture the long-term spatial-temporal dependencies of traffic data [26,27]. Ref. [28] proposes a hierarchical framework that combines transformer networks and spatio-temporal graph convolutional networks to simultaneously capture the long-term temporal dependencies and the short-term temporal and spatial dependencies within traffic data. Ref. [29] proposes a novel bidirectional spatial-temporal adaptive transformer model, which improves the accuracy and efficiency of urban traffic flow forecasting by dynamically adjusting computational loads and utilizing the reconstruction of past traffic conditions.
However, these methods overlook the dynamics of spatial dependence in traffic data. Merely considering the distance or spatial position of roads, or learning a static relationship from traffic data, fails to accurately capture the dynamic changes in road dependencies. To address this, we propose an adaptive graph convolutional module that adaptively constructs unique spatial relationships for traffic data across various temporal scales.
3. Traffic Flow Forecasting Problem Formulation
Traffic flow data often include the observation data $X \in \mathbb{R}^{N \times T \times D} = \{x_1, x_2, \cdots, x_T\}$ and the corresponding road network structure $G$, where $x_t \in \mathbb{R}^{N \times D}$ describes the traffic flow at the $t$-th period. Here, $N$ denotes the number of roads, and $D$ represents the dimension of the features associated with each road at a given time. The road network structure reflects the topological connection relationships between roads and can be defined as $G = (V, E, A)$, where the node set $V = \{v_1, v_2, \cdots, v_N\}$ of graph $G$ represents the roads in the traffic network. Each node $v_i$ corresponds to a road, and each edge $e_{ij} \in E$ between nodes represents the connectivity between roads. The topological structure of an undirected graph can be expressed as an adjacency matrix $A \in \mathbb{R}^{N \times N}$, which is determined by the edge set $E$: if $e_{ij} \in E$, then $a_{ij} = 1$; otherwise $a_{ij} = 0$. In this paper, given $T$ steps of historical observation data, the traffic flow forecasting task aims at learning a function $f(\cdot)$ to forecast the future values $\hat{X}_{T+h}$ at time step $T + h$. The problem can be formulated as follows:

$$\hat{X}_{T+h} = f(X, G, \theta), \qquad (1)$$

where $\theta$ denotes all learnable parameters.
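To make the notation concrete, the following minimal PyTorch sketch shows the tensor shapes involved in Equation (1); the sizes, the variable names, and the trivial placeholder used in place of a trained model are our own illustrative assumptions, not values fixed by the paper.

```python
import torch

# Illustrative sizes (assumptions): N roads, T historical steps,
# D features per road, forecasting horizon h.
N, T, D, h = 228, 12, 1, 3

X = torch.randn(N, T, D)              # historical observations x_1, ..., x_T
A = (torch.rand(N, N) > 0.9).float()  # binary adjacency matrix of the road graph G

def f(X, A, theta=None):
    """Placeholder for the learned forecasting function f(X, G, theta) of Equation (1).
    A trained model such as TEA-GCN would map the T historical steps plus the graph
    structure to the traffic state at step T + h; here we just repeat the last step
    so that the input/output shapes are visible."""
    return X[:, -1, :]

X_hat = f(X, A)                       # predicted \hat{X}_{T+h}, shape (N, D)
print(X_hat.shape)                    # torch.Size([228, 1])
```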
4. TEA-GCN Model Architecture and Methodology
In this section, we delve into the details of our proposed transformer-enhanced adaptive graph convolutional network (TEA-GCN), a novel approach designed to address the challenges of capturing dynamic spatial dependencies and long-term temporal dependencies in traffic flow forecasting. As illustrated in Figure 1, TEA-GCN is composed of local-global temporal attention layers that capture both long-term and short-term temporal dependence of the traffic data, and adaptive graph convolutional layers that learn adjacency matrices best representing the spatial dependencies at different time levels. In the following, we first introduce the local-global temporal attention layer in Section 4.1, followed by the adaptive graph convolutional layer in Section 4.2; we then present the predictor and objective function in Section 4.3 and conclude with a complexity analysis in Section 4.4.
4.1. Local-Global Temporal Attention Layer
To effectively extract the temporal correlation of traffic data, we design a local-global temporal attention module. This module addresses the limitations of traditional CNN- or RNN-based recursive methods, which often struggle to capture long-term dependencies due to issues such as vanishing or exploding gradients. Our approach sidesteps these challenges by employing a non-recursive mechanism that extracts both short-term fluctuations and long-term trends.
• Local Temporal Dependency Extraction: We utilize traditional one-dimensional temporal convolution to mine short-term temporal dependencies within traffic data.
• Global Temporal Dependency Extraction: We employ the Transformer architecture to model long-range temporal dependencies within traffic data.
• Local-Global Temporal Information Fusion: We design a temporal information fusion module to automatically capture the importance of different temporal patterns and promote collaboration across different time levels.
4.1.1. Local Temporal Dependency Extraction
During morning and evening peak hours, traffic flow is substantial, and congestion on one road at a certain moment may affect traffic speeds at later times, so it is necessary to consider the short-term temporal dependence of traffic flow in these situations.
In this paper, we utilize one-dimensional temporal convolution to extract local temporal dependencies within traffic flow data, capturing sharper changes. Concurrently, to prevent gradient decay and ensure adequate information transmission, we employ a gated temporal convolution. This approach includes an update gate that regulates the flow of information, maintaining the integrity of data transmission throughout the model.
Given the input representation $Z^{(l-1)}$, the gated temporal convolution can be formulated as follows:

$$T^{(l)} = R^{(l)} \star Z^{(l-1)} = \sigma_1\left(Z^{(l-1)} \star \Theta_1 + b_1\right) \odot \sigma_2\left(Z^{(l-1)} \star \Theta_2 + b_2\right), \qquad (2)$$

where $\Theta_1$, $\Theta_2$, $b_1$ and $b_2$ are learnable parameters, $\star$ denotes the convolution operation, and $\odot$ represents the Hadamard product operator. $\sigma_1(\cdot)$ is a non-linear activation function, for which we choose $\tanh$ in this paper, and $\sigma_2(\cdot)$ is the $\mathrm{sigmoid}(\cdot)$ activation function. The $\mathrm{sigmoid}(\cdot)$ function in the update gate maps each element to a value between 0 and 1 to control the proportion of information flowing into the next layer.
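For concreteness, a minimal PyTorch sketch of the gated temporal convolution in Equation (2) is given below; the 2-D convolution layout (channels × roads × time), the channel sizes, and the class name are our own assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedTemporalConv(nn.Module):
    """Sketch of the gated 1-D temporal convolution of Equation (2):
    tanh(conv(Z)) * sigmoid(conv(Z)). Layer sizes are illustrative."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3):
        super().__init__()
        # Two parallel convolutions along the time axis: filter branch and update gate.
        self.filter_conv = nn.Conv2d(in_channels, out_channels, (1, kernel_size))
        self.gate_conv = nn.Conv2d(in_channels, out_channels, (1, kernel_size))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, channels, N roads, T time steps)
        return torch.tanh(self.filter_conv(z)) * torch.sigmoid(self.gate_conv(z))

# Usage with made-up sizes: batch 8, 32 channels, 228 roads, 12 time steps.
z = torch.randn(8, 32, 228, 12)
t_l = GatedTemporalConv(32, 32)(z)   # (8, 32, 228, 10) after a kernel-3 convolution
```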
4.1.2. Global Temporal Dependency Extraction
Compared to the sequential processing of traditional CNN- or RNN-based methods, the Transformer architecture takes a different approach by utilizing an attention mechanism. This allows the Transformer to process information in a parallelized fashion and to recognize dependencies between any two positions in the sequence, enabling it to capture long-term dependencies within sequence data more effectively. In this paper, we utilize the Transformer to capture global temporal dependencies within traffic flow data.
The global temporal dependency extraction module features a transformer encoder-decoder architecture composed of three encoding blocks and three decoding blocks. Each building block primarily comprises two sub-layers: a multi-head self-attention mechanism and a fully connected feed-forward network. We also add a residual connection and layer normalization after each sub-layer to prevent degradation in the network.
Specifically, for the $l$-th layer of the encoder, the $(l-1)$-th layer representation $H^{(l-1)}$ is projected to the representation $H^{(l)}$ through the following equations:

$$\begin{aligned}
head_i &= Att\left(H^{(l-1)} W^q_i,\; H^{(l-1)} W^k_i,\; H^{(l-1)} W^v_i\right) \\
MultiHead\left(H^{(l-1)}\right) &= Concat(head_1, \cdots, head_m)\, W^O \\
Res^{(l)} &= Ln\left(MultiHead\left(H^{(l-1)}\right) + H^{(l-1)}\right) \\
\tilde{H}^{(l)} &= \max\left(0,\; Res^{(l)} U^{(l)}_1 + c_1\right) U^{(l)}_2 + c_2 \\
H^{(l)} &= Ln\left(\tilde{H}^{(l)} + Res^{(l)}\right),
\end{aligned} \qquad (3)$$

where $W^q_i$, $W^k_i$, $W^v_i$, $W^O$, $U^{(l)}_1$, $U^{(l)}_2$, $c_1$ and $c_2$ are learnable parameters. The scaled dot-product attention function $Att(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ is used to learn the correlation between two time steps. $Concat(\cdot)$ is the concatenation function and $Ln(\cdot)$ denotes layer normalization.
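A minimal PyTorch sketch of one such encoder block, following the structure of Equation (3), is shown below; the model dimension, head count, and feed-forward width are illustrative assumptions, and nn.MultiheadAttention stands in for the paper's attention implementation.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Sketch of one encoder block of Equation (3): multi-head self-attention and a
    position-wise feed-forward network, each followed by a residual connection and
    layer normalization. Dimensions and head counts are illustrative assumptions."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, T, d_model); queries, keys and values all come from h.
        res = self.ln1(self.attn(h, h, h)[0] + h)   # Res^(l)
        return self.ln2(self.ff(res) + res)          # H^(l)

h = torch.randn(8, 12, 64)          # batch of 8 sequences, 12 steps, 64-dim embedding
h_next = EncoderBlock()(h)          # same shape as the input
```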
It should be noted that the input of the first encoder layer is slightly different, as it takes the raw sequence $X$ as input. Given that traffic flow data is a typical time series, it is essential to incorporate the temporal position information of the traffic flow data when modeling temporal dependencies. To fully leverage the sequence order, the transformer uses sine and cosine functions of different frequencies to add positional encodings to the input embeddings in the first layer of the encoder:

$$\begin{aligned}
PE_{pos,2i} &= \sin\left(pos / 10000^{2i/d}\right) \\
PE_{pos,2i+1} &= \cos\left(pos / 10000^{2i/d}\right),
\end{aligned} \qquad (4)$$

where $pos$ denotes the position and $i$ is the dimension index of each element in the sequence data.
However, the above-mentioned position embedding method exhibits long-range periodicity, which makes it less effective at capturing the periodic patterns hidden in traffic flow data. To solve this problem, we propose a new temporal position embedding function with a given period:

$$PE_k = \sin(2\pi k / period), \qquad (5)$$

where $k$ denotes the position of each element in the sequence, and $period$ is the predefined period.
Subsequently, we incorporate the positional embedding into the corresponding data embedding, so that the proposed method can effectively encode the temporal position information within the traffic flow data. In this way, the input sequence $\tilde{X}$ of the first layer in the encoder can be represented as the combination of the raw input $X$ and the corresponding positional embedding $PE$:

$$\tilde{X} = X + PE. \qquad (6)$$
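The periodic embedding of Equation (5) and its addition to the input in Equation (6) can be sketched as follows; the period value (288 five-minute slots per day) and the broadcasting layout are our own assumptions.

```python
import math
import torch

def periodic_positional_encoding(seq_len: int, period: int) -> torch.Tensor:
    """Sketch of the period-aware positional embedding of Equation (5):
    PE_k = sin(2*pi*k / period). The period (e.g., 288 five-minute slots
    per day) is a predefined value chosen by the user."""
    k = torch.arange(seq_len, dtype=torch.float32)
    return torch.sin(2.0 * math.pi * k / period)

# Usage with an assumed daily period of 288 five-minute intervals.
pe = periodic_positional_encoding(seq_len=12, period=288)   # shape (12,)
x = torch.randn(228, 12, 64)                                # (roads, steps, features)
x_tilde = x + pe.view(1, 12, 1)                             # Equation (6): X + PE, broadcast
```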
The decoder in the global temporal dependency extraction module has a structure symmetrical to that of the encoder, and reconstructs the raw sequence from the embedding representation $H^{(L/2)}$. It is worth noting that the input to each layer in the decoder stack consists of the output $H^{(L/2)}$ from the last encoder layer and the output from the previous decoder layer. Therefore, the last layer of the decoder can be formulated as:

$$\begin{aligned}
head_i &= Att\left(H^{(L/2)} W^q_i,\; H^{(L/2)} W^k_i,\; H^{(L-1)} W^v_i\right) \\
MultiHead\left(H^{(L-1)}\right) &= Concat(head_1, \cdots, head_m)\, W^O \\
Res^{(L)} &= Ln\left(MultiHead\left(H^{(L-1)}\right) + H^{(L-1)}\right) \\
\tilde{H}^{(L)} &= \max\left(0,\; Res^{(L)} U^{(L)}_1 + c_1\right) U^{(L)}_2 + c_2 \\
\hat{X} &= Ln\left(\tilde{H}^{(L)} + Res^{(L)}\right).
\end{aligned} \qquad (7)$$
Finally, we utilize the learned middle-layer embedding representation $H^{(L/2)}$ to forecast the traffic flow, and minimize the difference between the forecast traffic flow and its corresponding ground truth $Y$ to train the global temporal dependency extraction module. Concurrently, to retain the characteristics of the raw sequence as much as possible, we also minimize the reconstruction error between the reconstructed sequence $\hat{X}$ and the raw sequence $X$. The corresponding loss functions can be formulated as:

$$\begin{aligned}
\mathcal{L}_{re} &= \left\| \hat{X} - X \right\|_1 \\
\mathcal{L}_{pre} &= \left\| Conv\left(H^{(L)}\right) - Y \right\|_1,
\end{aligned} \qquad (8)$$

where $Conv(\cdot)$ represents a 1-D temporal convolution.
4.1.3. Local-Global Temporal Information Fusion
In practice, the correlations between different time steps hidden in traffic flow data are very complex. The local temporal dependency extraction module uses the gated temporal convolution to capture the short-term temporal correlation of the data, which is beneficial for analyzing the short-term changing trend of traffic flow affected by emergencies. The global temporal dependency extraction module exploits the attention mechanism to capture the long-term temporal dependence of the data, which is conducive to analyzing the periodic patterns in traffic flow data.
These two modules are specifically designed to analyze the changing trends of traffic flow under different temporal patterns. Therefore, to fully benefit from the advantages of both modules and enhance the robustness of the proposed model, we creatively design a local-global information fusion module to weight and fuse these two kinds of information layer-by-layer. In this way, the proposed model can fully explore the abundant and implicit temporal dependencies hidden in traffic flow data.
Specifically, for the $l$-th layer, we concatenate the representation $T^{(l)}$ learned from the local temporal dependency extraction module and the corresponding representation $H^{(l)}$ learned from the global temporal dependency extraction module:

$$Y^{(l)} = Conv\left(Concat\left(T^{(l)}, H^{(l)}\right)\right). \qquad (9)$$

We then assign a weight to each element of $Y^{(l)}$ according to its importance, so that the fused representation focuses on the information that is more critical to the current task and filters out irrelevant information. The fusion operation is formulated as:

$$F^{(l)} = Att\left(T^{(l)}, H^{(l)}, Y^{(l)}\right). \qquad (10)$$

Then, we transfer the fused representation $F^{(l)}$ to the corresponding adaptive graph convolutional layer to facilitate mining the spatial dependencies at different temporal levels.
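A possible reading of Equations (9) and (10) is sketched below in PyTorch; the exact attention layout and the tensor shapes are assumptions on our part, since the paper does not spell out the implementation.

```python
import torch
import torch.nn as nn

class LocalGlobalFusion(nn.Module):
    """Sketch of Equations (9)-(10): concatenate the local representation T^(l) and the
    global representation H^(l), compress them with a 1-D convolution, and use scaled
    dot-product attention (query T^(l), key H^(l), value Y^(l)) to weight the fused
    features. Shapes and the attention layout are our assumptions."""

    def __init__(self, d_model: int = 64):
        super().__init__()
        self.fuse_conv = nn.Conv1d(2 * d_model, d_model, kernel_size=1)

    def forward(self, t_l: torch.Tensor, h_l: torch.Tensor) -> torch.Tensor:
        # t_l, h_l: (batch, T, d_model)
        y_l = self.fuse_conv(torch.cat([t_l, h_l], dim=-1).transpose(1, 2)).transpose(1, 2)
        scores = torch.softmax(t_l @ h_l.transpose(1, 2) / t_l.size(-1) ** 0.5, dim=-1)
        return scores @ y_l                           # fused representation F^(l)

t_l, h_l = torch.randn(8, 12, 64), torch.randn(8, 12, 64)
f_l = LocalGlobalFusion()(t_l, h_l)                   # (8, 12, 64)
```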
4.2. Adaptive Graph Convolutional Module
The dependencies between roads change dynamically over time, and the explicit road
network structure cannot accurately reflect the real-time dependencies between roads.
Existing spatial-temporal graph convolution models learn a fixed adjacency matrix to
represent the dependencies between roads, which fails to accurately capture the dynamic
changes of the relationship between roads. To address this challenging problem, we
creatively propose an adaptive graph convolutional module that automatically learns the
dynamic spatial dependencies of traffic flow data layer-by-layer in an end-to-end manner.
Specifically, for the $l$-th adaptive graph convolutional layer, we formulate the adaptive graph convolution operation as:

$$Z^{(l)} = \sum_{k=0}^{K} \left( P_a^k F^{(l)} W^{(l)}_{k1} + P_b^k F^{(l)} W^{(l)}_{k2} + A_{adp}^k F^{(l)} W^{(l)}_{k3} + \left(A^{(l)}_{att}\right)^k F^{(l)} W^{(l)}_{k4} \right), \qquad (11)$$

where $W^{(l)}_{k1}$, $W^{(l)}_{k2}$, $W^{(l)}_{k3}$ and $W^{(l)}_{k4}$ are the weight matrices of the $l$-th layer. The main difference between the proposed model and others lies in the topological structures, which are divided into four parts: $P_a$, $P_b$, $A_{adp}$ and $A^{(l)}_{att}$.
The matrices $P_a$ and $P_b$ represent the fixed physical connections between the roads, where $P_a = A / \mathrm{rowsum}(A)$ and $P_b = A^T / \mathrm{rowsum}(A^T)$.
The matrix $A_{adp} \in \mathbb{R}^{N \times N}$ denotes the parameterized adjacency matrix. In contrast to the fixed physical structure, there are no constraints on the values within $A_{adp}$, and its elements can be optimized along with the network parameters during the training procedure. In this way, the proposed model can learn a graph structure that is tailored to the current traffic flow prediction task.
The matrix $A^{(l)}_{att} \in \mathbb{R}^{N \times N}$ represents a data-dependent similarity graph that captures the interaction relationships between roads. To obtain the adaptive graph for the $l$-th layer, we calculate the inner product of the fused representations to determine whether there is a correlation between two roads and how strong this correlation is:

$$A^{(l)}_{att} = \mathrm{softmax}\left(\mathrm{ReLU}\left(F^{(l)} \cdot \left(F^{(l)}\right)^T\right)\right), \qquad (12)$$

where we utilize the softmax function to normalize the similarity matrix.
Different adjacency matrices serve distinct roles, and their combination can effectively
capture the rich and implicit spatial dependencies within traffic flow data across various
temporal levels.
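The following PyTorch sketch illustrates one way to implement the diffusion of Equation (11) over the four adjacency matrices, with A_adp parameterized by learnable node embeddings and A_att computed from the fused features as in Equation (12); the embedding dimension, diffusion depth K, and the flattening used to build the similarity matrix are our assumptions, not details given in the paper.

```python
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    """Sketch of Equations (11)-(12): diffusion over four adjacency matrices
    (forward/backward transition P_a and P_b, a learnable A_adp, and a data-dependent
    A_att computed from the fused features F^(l))."""

    def __init__(self, num_nodes: int, d_model: int, K: int = 2, emb_dim: int = 10):
        super().__init__()
        self.K = K
        # Node embeddings whose product parameterizes the adaptive adjacency A_adp.
        self.node_emb = nn.Parameter(torch.randn(num_nodes, emb_dim))
        # One weight matrix per adjacency type and per diffusion step (Eq. 11).
        self.weights = nn.Parameter(torch.randn(4, K + 1, d_model, d_model) * 0.01)

    def forward(self, f_l: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        # f_l: (N, T, d_model) fused features; a: (N, N) binary adjacency matrix.
        p_a = a / a.sum(dim=1, keepdim=True).clamp(min=1)           # row-normalized A
        p_b = a.t() / a.t().sum(dim=1, keepdim=True).clamp(min=1)   # row-normalized A^T
        a_adp = torch.softmax(torch.relu(self.node_emb @ self.node_emb.t()), dim=1)
        sim = f_l.reshape(f_l.size(0), -1)                          # flatten time/features
        a_att = torch.softmax(torch.relu(sim @ sim.t()), dim=1)     # Equation (12)

        z = 0.0
        for idx, adj in enumerate((p_a, p_b, a_adp, a_att)):
            prop = f_l
            for k in range(self.K + 1):
                # A^k F^(l) W_k: propagate along the graph, then transform features.
                z = z + torch.einsum("ntd,de->nte", prop, self.weights[idx, k])
                prop = torch.einsum("nm,mtd->ntd", adj, prop)
        return z                                                     # Z^(l): (N, T, d_model)

f_l = torch.randn(228, 12, 64)
a = (torch.rand(228, 228) > 0.9).float()
z_l = AdaptiveGraphConv(num_nodes=228, d_model=64)(f_l, a)
```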
4.3. Predictor and Objective Function
Finally, we obtain the optimal representation $Z$ and send it to a predictor to generate the final traffic flow prediction results. In addition, to minimize the loss of information, we utilize both the mean absolute error and the mean square error to measure the difference between the prediction results and the corresponding ground truth values:

$$\begin{aligned}
\mathcal{L}_{mae} &= \left\| Conv(Z) - Y \right\|_1 \\
\mathcal{L}_{mse} &= \left\| Conv(Z) - Y \right\|_2.
\end{aligned} \qquad (13)$$

By combining the reconstruction loss $\mathcal{L}_{re}$, the prediction error $\mathcal{L}_{pre}$, the mean absolute error $\mathcal{L}_{mae}$, and the mean square error $\mathcal{L}_{mse}$, we derive the total objective function for the proposed model as follows:

$$\min \mathcal{L} = \lambda_1 \mathcal{L}_{re} + \lambda_2 \mathcal{L}_{pre} + \lambda_3 \mathcal{L}_{mae} + \lambda_4 \mathcal{L}_{mse}, \qquad (14)$$

where the hyper-parameters $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ balance the importance of the different losses.
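A compact sketch of the combined objective in Equations (8), (13) and (14) is given below, using the λ values reported later in Section 5.1.3; the mean-reduced norms and the argument names are our own choices.

```python
import torch
import torch.nn.functional as F

def total_loss(x_rec, x, y_pred_global, y_pred_final, y,
               lambdas=(0.001, 0.5, 1.0, 0.1)):
    """Sketch of Equations (8), (13) and (14): reconstruction loss, prediction loss of
    the global temporal module, and the MAE/MSE terms of the final predictor, weighted
    by lambda_1..lambda_4. Argument names are ours."""
    l_re = torch.mean(torch.abs(x_rec - x))            # L_re  = ||X_hat - X||_1 (mean form)
    l_pre = torch.mean(torch.abs(y_pred_global - y))   # L_pre = ||Conv(H) - Y||_1 (mean form)
    l_mae = torch.mean(torch.abs(y_pred_final - y))    # L_mae = ||Conv(Z) - Y||_1 (mean form)
    l_mse = F.mse_loss(y_pred_final, y)                # L_mse
    l1, l2, l3, l4 = lambdas
    return l1 * l_re + l2 * l_pre + l3 * l_mae + l4 * l_mse

# Usage with dummy tensors of matching shapes (batch, roads, steps).
x, y = torch.randn(8, 228, 12), torch.randn(8, 228, 3)
print(total_loss(torch.randn_like(x), x, torch.randn_like(y), torch.randn_like(y), y))
```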
4.4. Complexity Analysis
For the input data features $X \in \mathbb{R}^{N \times T \times C}$ and the fixed road network structure $A \in \mathbb{R}^{N \times N}$, $N$, $T$ and $C$ respectively denote the number of sensors, the sequence length and the feature dimension. $C_{in}$ and $C_{out}$ respectively represent the number of input and output channels of the network, $K$ denotes the kernel size of the temporal convolutional network, and $D$ is the mapping dimension of the transformer.
The major time-consuming components of the proposed method are the four main modules:
• For the Local Temporal Dependency Extraction module, the main computational complexity is $O(N \times (T - K + 1) \times K \times C_{in} \times C_{out})$ due to the convolution operations.
• The Global Temporal Dependency Extraction module is mainly composed of the transformer encoder-decoder network, thus its complexity is $O(N \times T \times D^2 + N \times D \times T^2)$.
• The Local-Global Temporal Information Fusion module mainly performs a convolution operation and an attention calculation, so its complexity is $O(N \times T \times C_{in} \times C_{out} + N \times D \times T^2)$.
• The Adaptive Graph Convolutional module mainly conducts matrix multiplication operations, thus its complexity is $O(N^2 \times T \times C_{in} + N \times T \times C_{in} \times C_{out})$.
Overall, since $K$ is regarded as a constant, the total complexity of the proposed method is about $O(N^2 \times T \times C_{in} + N \times T \times C_{in} \times C_{out} + N \times T \times D^2)$.
5. Experimental Evaluation and Performance Analysis
In this section, we evaluate our proposed TEA-GCN model on three widely used traffic datasets and compare it with eight baseline methods. First, we introduce the experimental settings in Section 5.1, followed by the analysis of the experimental results in Section 5.2.
5.1. Experimental Settings
5.1.1. Datasets
We evaluate the proposed method on three types of traffic datasets: one road network traffic flow dataset, PeMSD7(M) [30]; one metro passenger flow dataset, Beijing Metro [31]; and one driving speed dataset, PeMS-BAY.
• PeMSD7(M) (https://pems.dot.ca.gov/?dnode=Clearinghouse, accessed on 30 October 2024) collects traffic speed data from 228 sensors deployed in District 7 of California; the period covers the weekdays from May to June 2012 with a time interval of 5 min. This dataset is a popular benchmark in traffic forecasting tasks.
• Beijing Metro captures passenger flow data from 325 stations of the Beijing metro system during August 2015, with a time interval of 5 min.
• PeMS-BAY collects driving speed data from 325 sensors located in the Bay Area, covering the period from 1 January 2017 to 31 May 2017.
5.1.2. Compared Methods
Eight state-of-the-art traffic flow forecasting methods are chosen as baselines:
• LSVR [32] utilizes support vector regression to predict the traffic flow.
• FNN [17] is an auto-encoder network designed to learn compressed representations of the traffic data, which can then be used for forecasting.
• FC-LSTM [33] exploits a long short-term memory (LSTM) network to capture the long-term temporal dependencies of traffic data.
• STGCN [20] models the traffic network as a graph and uses spatio-temporal graph convolutional networks to extract spatio-temporal correlation features from the traffic data.
• DCRNN [13] combines diffusion convolution and gated recurrent units (GRU) to capture the spatial and temporal dependencies of traffic flow data, respectively.
• GWN [16] captures implicit spatial dependencies hidden in traffic data by learning an adaptive graph, and uses dilated causal convolution to capture temporal dependencies.
• STSGCN [14] adopts a spatial-temporal synchronous graph convolution module to capture local spatial-temporal dependencies from traffic data.
• STFGNN [22] combines a gated dilated CNN module and a spatial-temporal fusion graph module to simultaneously capture the local and global correlations of traffic data.
We choose three commonly used evaluation metrics to measure the performance of our proposed method, i.e., Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). Since the Beijing Subway is idle between 23:00 and 5:00, the Beijing Metro dataset contains a large number of zero values; thus, we cannot calculate its MAPE. It is worth noting that lower values indicate better performance.
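For reference, a small NumPy sketch of the three metrics on toy numbers is given below; the zero-masking in MAPE illustrates why this metric is not reported for Beijing Metro. The masking threshold is our own assumption.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-5):
    # Mask near-zero ground-truth values (e.g., idle metro hours), since dividing by
    # zero flow makes MAPE undefined -- the reason it is not reported for Beijing Metro.
    mask = np.abs(y_true) > eps
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100

y_true = np.array([60.0, 55.0, 0.0, 48.0])
y_pred = np.array([58.0, 57.0, 1.0, 50.0])
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```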
5.1.3. Parameters Settings
Following the work in [28], we partitioned the PeMSD7(M) dataset into a ratio of 7:1:2, and the Beijing Metro dataset into a ratio of 8:1:1, to form the respective training, validation, and testing sets.
For the comparison methods, we executed the original code obtained from the au-
thors’ personal homepages and followed the parameter settings reported in the original
papers to ensure optimal experimental results. Specifically, LSVR employs a linear kernel
with a penalty term of 0.001. The FNN model consists of a three-layer fully connected
network with dimensions of 12-128-64. The FC-LSTM model comprises a two-layer stacked
LSTM network. The STGCN model consists of multiple ST-Conv blocks, each featuring three hidden layers with dimensions of 64-16-64. Both the graph convolutional kernel size and the temporal convolutional kernel size are set to 3. The GWN model is
composed of eight layers with a sequence of dilation factors 1, 2, 1, 2, 1, 2, 1, 2. The STSGCN
model comprises four spatial-temporal synchronous graph convolutional layers (STSGCLs), with each STSGCL incorporating three graph convolutional operations, each using
64 filters. The STFGNN model includes three spatial-temporal fusion graph neural layers
(STFGNLs), each consisting of eight independent spatial-temporal fusion graph neural
modules (STFGNMs) and one gated convolution module with a dilation rate of 3. All
models are optimized using the Adam optimizer with a learning rate of 0.001.
Our model has three spatial-temporal blocks and one predictor. The dimension of
network layer is set as 1-32-32-32-64-128-1. This architecture was determined after a series
of ablation studies where we varied the number of layers to balance model complexity
and forecasting accuracy.
λ1
,
λ2
,
λ3
, and
λ4
balance the corresponding loss items in the
objective function, and we set values of these hyper-parameters as 0.001, 0.5, 1.0 and 0.1,
respectively. We use the RMSProp optimizer to train our model, where the learning rate is
set to 0.001. We set the batch size of PeMSD7(M) and Beijing Metro to 20 and 16, respectively.
To alleviate the overfitting problem, we set the dropout to 0.3.
All the experiments are implemented in the environment of PyTorch 1.9.0 version,
and carried out on a workstation with NVIDIA RTX 3090 GPU manufactured by NVIDIA
Corporation, Santa Clara, CA, USA, AMD Ryzen 9 5900X 12-Core Processor manufactured
by Advanced Micro Devices, Inc., Santa Clara, CA, USA, and 128G RAM manufactured by
Kingston, CA, USA.
5.2. Experimental Results Analysis
We validate the proposed model on two datasets, and the results are shown in
Tables 13.
Obviously, the proposed model achieves the best prediction performance.
The specific experiment results are analyzed as follows.
Table 1. Performance comparison on the PeMSD7(M) traffic flow dataset. We mark the best-
performing results by bolded font.
Models 15 min 30 min 60 min
MAE MAPE (%) RMSE MAE MAPE (%) RMSE MAE MAPE (%) RMSE
LSVR 2.49 5.91 4.55 3.46 8.42 6.44 4.94 12.41 9.08
FNN 2.53 6.05 4.46 3.73 9.48 6.46 5.28 13.73 8.75
FC-LSTM 3.57 8.60 6.20 3.92 9.55 7.03 4.16 10.10 7.51
STGCN 2.25 5.26 4.04 3.05 7.33 5.70 4.04 9.77 7.55
DCRNN 2.37 5.54 4.21 3.31 8.06 5.96 4.01 9.99 7.19
GWN 2.82 6.80 4.80 3.90 8.93 6.94 4.89 12.90 9.01
STSGCN 2.59 6.19 4.91 3.34 8.18 6.59 4.62 11.71 8.75
STFGNN 2.47 5.86 4.54 3.23 8.10 6.27 4.21 10.35 8.07
TEA-GCN 2.10 4.90 3.96 2.80 6.98 5.42 3.72 9.72 7.04
Table 2. Performance comparison on the Beijing Metro flow dataset. We mark the best-performing
results by bolded font.
Models 15 min 30 min 45 min
MAE RMSE MAE RMSE MAE RMSE
LSVR 14.71 25.12 16.55 31.33 17.75 32.93
FNN 11.01 23.61 14.46 31.22 18.78 40.75
FC-LSTM 10.76 21.22 12.27 22.33 12.86 23.74
STGCN 7.83 16.81 9.56 17.92 10.16 20.29
DCRNN 8.37 19.13 9.46 23.38 11.63 25.87
GWN 11.91 24.64 14.24 29.24 16.14 36.64
STSGCN 10.65 20.71 12.24 24.03 16.22 33.23
STFGNN 9.13 17.47 9.06 18.50 11.72 22.39
TEA-GCN 7.01 15.32 7.79 17.25 8.92 19.21
Table 3. Performance comparison on the PeMS-BAY traffic flow dataset. We mark the best-performing
results by bolded font.
Models 15 min 30 min 60 min
MAE MAPE (%) RMSE MAE MAPE (%) RMSE MAE MAPE (%) RMSE
LSVR 1.85 3.80 3.59 2.48 5.50 5.18 3.28 8.00 7.08
FNN 2.20 5.19 4.42 2.30 5.43 4.63 2.46 5.89 4.89
FC-LSTM 2.95 4.81 4.19 3.97 5.25 4.55 4.74 5.79 4.96
STGCN 1.39 3.00 2.92 1.84 4.22 4.12 2.42 5.58 5.33
DCRNN 1.38 2.90 2.95 1.74 3.90 3.97 2.07 4.92 4.74
GWN 1.30 2.69 2.74 1.64 3.63 3.72 1.93 4.53 4.46
STSGCN 1.57 4.34 4.42 1.98 4.64 4.51 2.53 6.13 5.97
STFGNN 1.47 3.14 3.04 1.91 4.32 4.28 2.44 6.07 5.54
TEA-GCN 1.28 2.64 2.72 1.68 3.80 3.73 2.03 4.91 4.60
LSVR, FNN, and FC-LSTM aim to uncover the underlying patterns of traffic flow
changes within historical data by modeling the temporal dependencies of traffic data.
However, these temporal based methods often overlook the complex spatial structure
inherent in traffic data. Obviously, the traffic speed on one road is not only determined by
its own historical data, but also affected by the traffic speeds on adjacent roads. Therefore,
it is difficult to accurately analyze the traffic flow trends by considering only the temporal
dependencies within a single road segment. The experimental results show that the
prediction performance of spatial relationship modeling methods (e.g., STGCN, DCRNN)
is significantly better than that of purely temporal based methods, which fully proves
the effectiveness of GNNs in capturing the complex spatial-temporal dependencies of
traffic data.
GNN-based traffic flow prediction methods integrate Graph Neural Networks (GNNs) with Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), aiming to capture both the temporal and spatial correlations within traffic data simultaneously. However, these methods overlook the dynamics of the spatial dependence of traffic data over time: the traffic speed on one road at different times is affected by correlations with different roads. Traffic data exhibit complex and implicit spatial correlation relationships. Focusing solely on the physical connections between roads, or learning a static correlation relationship from the data, fails to fully capture the dynamic characteristics of the spatial dependence of traffic data, which leads to sub-optimal performance of traffic flow prediction models.
We creatively propose a novel adaptive graph convolutional module to alternately
capture the spatial and temporal dependencies of traffic data. This module learns a unique
spatial structure for traffic data at various time levels in an end-to-end manner, effectively
capturing the rich and implicit spatial dependencies inherent in traffic data. At the same
time, our proposed local-global temporal attention module synchronously explores the long-
term and short-term temporal dependencies hidden in the traffic data in a non-recursive
manner, which can capture the temporal dependencies at different temporal levels for
different tasks and effectively improve the robustness of the model.
6. Conclusions
In this paper, we propose a novel transformer-enhanced adaptive graph convolutional network for the traffic flow forecasting task, which contains two important modules. Specifically, an adaptive graph convolutional module is designed to capture the dynamic road spatial dependencies, and a local-global temporal attention module simultaneously captures the long-term and short-term temporal dependencies of traffic data. Our proposed model is able to alternately learn the temporal and spatial correlations layer-by-layer, which better reflects the spatio-temporal dependencies among roads and effectively improves the robustness of the proposed model. The traffic flow prediction results on three traffic datasets verify the superiority of the proposed model. In the future, we plan to focus on the adaptability of the model to various urban settings and on integrating additional data sources, such as weather conditions, event calendars, or social media data, to further enhance the predictive accuracy.
Author Contributions: Conceptualization, X.L.; methodology, X.H.; software, X.H.; validation, X.H.;
formal analysis, X.Z.; investigation, X.Z.; resources, X.L.; data curation, X.H.; writing—original draft
preparation, X.H.; writing—review and editing, X.Z. and W.Z.; visualization, X.H.; supervision, W.Z.
and X.L.; project administration, X.L. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was funded by Science and Technology Research Project of Jiangxi Provincial
Department of Education grant number GJJ2206601.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The original contributions presented in this study are included in the
article material. Further inquiries can be directed to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Jiao, P.; Li, R.; Sun, T.; Hou, Z.; Ibrahim, A. Three Revised Kalman Filtering Models for Short-Term Rail Transit Passenger Flow Prediction. Math. Probl. Eng. 2016, 216, 9717582. [CrossRef]
2. Wang, Y.; Papageorgiou, M. Real-time freeway traffic state estimation based on extended Kalman filter: A general approach. Transp. Res. Part B Methodol. 2005, 39, 141–167. [CrossRef]
3. Van, H.; Chris, P.; Schreiter, T.; Zuurbier, F.; Van, L.; Van, Z. Localized extended kalman filter for scalable real-time traffic state estimation. IEEE Trans. Intell. Transp. Syst. 2012, 13, 385–394.
4. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328.
5. Williams, B.; Hoel, L. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. J. Transp. Eng. 2003, 129, 664–672. [CrossRef]
6. Liu, J.; Guan, W. A summary of traffic flow forecasting methods. J. Highw. Transp. Res. Dev. 2004, 21, 82–85.
7. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
8. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing For Quantum Chemistry. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2053–2070.
9. Petar, V.; Guillem, C.; Arantxa, C.; Adriana, R.; Pietro, L.; Yoshua, B. Graph Attention Networks. arXiv 2017, arXiv:1710.10903.
10. Hamilton, W.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1025–1035.
11. Seo, Y.; Defferrard, M.; Vandergheynst, P.; Bresson, X. Structured sequence modeling with graph convolutional recurrent networks. In International Conference on Neural Information Processing; Springer: Cham, Switzerland, 2018; pp. 362–373.
12. Hu, J.; Chen, L. Multi-Attention based spatial-temporal graph convolution networks for traffic flow forecasting. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–7.
13. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926.
14. Song, C.; Lin, Y.; Guo, S.; Wan, H. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. AAAI Conf. Artif. Intell. 2020, 34, 914–921. [CrossRef]
15. Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. AAAI Conf. Artif. Intell. 2019, 33, 922–929. [CrossRef]
16. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph wavenet for deep spatial-temporal graph modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 1907–1913.
17. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [CrossRef] [PubMed]
18. Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F. Traffic flow prediction with big data: A deep learning approach. IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873. [CrossRef]
19. Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 17804–17815.
20. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3634–3640.
21. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Lin, T.; Deng, M.; Li, H. T-GCN: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2020, 21, 3848–3858. [CrossRef]
22. Li, M.; Zhu, Z. Spatial-temporal fusion graph neural networks for traffic flow forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 4189–4196. [CrossRef]
23. Zhao, Y.; Luo, X.; Ju, W.; Chen, C.; Hua, X.; Zhang, M. Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting. In Proceedings of the IEEE 39th International Conference on Data Engineering, Anaheim, CA, USA, 3–7 April 2023; pp. 2303–2316. [CrossRef]
24. Ju, W.; Zhao, Y.; Qin, Y.; Yi, S.; Yuan, J.; Xiao, Z.; Luo, X.; Yan, X.; Zhang, M. COOL: A Conjoint Perspective on Spatio-Temporal Graph Neural Network for Traffic Forecasting. Inf. Fusion 2024, 107, 1–11. [CrossRef]
25. Kong, W.; Guo, Z.; Liu, Y. Spatio-Temporal Pivotal Graph Neural Networks for Traffic Flow Forecasting. AAAI Conf. Artif. Intell. 2024, 38, 8627–8635. [CrossRef]
26. Guo, S.; Lin, Y.; Wan, H.; Li, X.; Cong, G. Learning dynamics and heterogeneity of spatial-temporal graph data for traffic forecasting. IEEE Trans. Knowl. Data Eng. 2021, 34, 5415–5428. [CrossRef]
27. Ye, X.; Fang, S.; Sun, F.; Zhang, C.; Xiang, S. Meta Graph Transformer: A Novel Framework for Spatial–Temporal Traffic Prediction. Neurocomputing 2021, 491, 544–563. [CrossRef]
28. Huo, G.; Zhang, Y.; Wang, B.; Gao, J.; Hu, Y.; Yin, B. Hierarchical Spatio–Temporal Graph Convolutional Networks and Transformer Network for Traffic Flow Forecasting. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3855–3867. [CrossRef]
29. Chen, C.; Liu, Y.; Chen, L.; Zhang, C. Bidirectional Spatial-Temporal Adaptive Transformer for Urban Traffic Flow Forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 6913–6925. [CrossRef]
30. Chen, C.; Petty, K.; Skabardonis, A.; Varaiya, P.; Jia, Z. Freeway performance measurement system: Mining loop detector data. Transp. Res. Rec. 2001, 1748, 96–102. [CrossRef]
31. Wang, J.; Zhang, Y.; Wei, Y.; Hu, Y.; Piao, X.; Yin, B. Metro passenger flow prediction via dynamic hypergraph convolution networks. IEEE Trans. Intell. Transp. Syst. 2021, 22, 7891–7903. [CrossRef]
32. Wu, C.; Ho, J.; Lee, D. Travel-time prediction with support vector regression. IEEE Trans. Intell. Transp. Syst. 2004, 5, 276–281. [CrossRef]
33. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
... Subsequently, researchers have pivoted their focus from deep learning models that solely focus on graph structure 3,4 to integrating graph structure information, thereby propelling the advancement of graph neural network (GNN)-based methodologies 5 . In recent times, GNN have emerged as leading contenders at the vanguard of deep learning across numerous applications 6 . ...
Article
Full-text available
In the realm of traffic prediction, emerging are methodologies founded on graph convolutional networks. Nonetheless, existing approaches grapple with issues encompassing insufficient sharing patterns, dependence on static relationship presumptions, and an inability to effectively grasp the intricate trends and cyclic attributes of traffic flow. To tackle this challenge, we introduce a novel framework termed Dynamic Graph Convolutional Networks with Temporal Representation Learning for Traffic Flow Prediction (DGCN-TRL). Specifically, a temporal graph convolution block is specifically devised, treating historical time slots as graph nodes and employing graph convolution to process dynamic time series. This approach effectively captures flexible global temporal dependencies, enhancing the model’s ability to comprehend current traffic conditions. Subsequently, a novel dynamic graph constructor is introduced to explore spatial correlations between nodes at specific times and dynamic temporal dependencies across different time points. This meticulous exploration uncovers dynamic spatiotemporal relationships. Finally, a novel temporal representation learning module is developed utilizing a masked subsequence transformer to predict the content of masked subsequences from a fraction of unmasked subsequences and their temporal contexts in a pre-trained manner. This design encourages the model to adeptly learn temporal representations of contextual subsequences from extensive historical data. Empirical evaluations on four real datasets substantiate the superior performance of DGCN-TRL compared to existing methodologies.
... The goal of pose forecasting is to provide accurate predictions of future poses, which can have practical applications in a wide range of fields. For example, in robotics, pose forecasting models enable robots to infer human intentions and predict future movements, facilitating safer, more intuitive collaboration in environments such as manufacturing floors, healthcare, and assistive robotics [1][2][3][4][5][6]. In sports analytics, forecasting player trajectories and body orientations several moments ahead supports tactical decision-making, performance evaluation, and even automated highlight generation. ...
Article
Full-text available
Highlights This paper presents the GCN-Transformer, a novel deep learning model that integrates Graph Convolutional Networks (GCNs) and Transformers to enhance multi-person pose forecasting. The model effectively captures both spatial and temporal dependencies, improving the performance of pose forecasting. Additionally, a new evaluation metric, Final Joint Position and Trajectory Error (FJPTE), is introduced to provide a more comprehensive assessment of movement dynamics. These contributions establish GCN-Transformer as a state-of-the-art solution in pose forecasting. What are the main findings?We introduce GCN-Transformer, a novel architecture combining Graph Convolutional Networks (GCNs) and Transformers for multi-person pose forecasting. We propose a new evaluation metric, Final Joint Position and Trajectory Error (FJPTE), which comprehensively assesses both local and global movement dynamics. What is the implication of the main finding?GCN-Transformer achieves state-of-the-art performances on the CMU-Mocap, MuPoTS- 3D, SoMoF Benchmark, and ExPI datasets, demonstrating superior generalization across different motion scenarios. The proposed FJPTE metric improves the evaluation of pose forecasting models by accounting for both movement trajectory and final position, enabling better assessments of motion realism. Abstract Multi-person pose forecasting involves predicting the future body poses of multiple individuals over time, involving complex movement dynamics and interaction dependencies. Its relevance spans various fields, including computer vision, robotics, human–computer interaction, and surveillance. This task is particularly important in sensor-driven applications, where motion capture systems, including vision-based sensors and IMUs, provide crucial data for analyzing human movement. This paper introduces GCN-Transformer, a novel model for multi-person pose forecasting that leverages the integration of Graph Convolutional Network and Transformer architectures. We integrated novel loss terms during the training phase to enable the model to learn both interaction dependencies and the trajectories of multiple joints simultaneously. Additionally, we propose a novel pose forecasting evaluation metric called Final Joint Position and Trajectory Error (FJPTE), which assesses both local movement dynamics and global movement errors by considering the final position and the trajectory leading up to it, providing a more comprehensive assessment of movement dynamics. Our model uniquely integrates scene-level graph-based encoding and personalized attention-based decoding, introducing a novel architecture for multi-person pose forecasting that achieves state-of-the-art results across four datasets. The model is trained and evaluated on the CMU-Mocap, MuPoTS-3D, SoMoF Benchmark, and ExPI datasets, which are collected using sensor-based motion capture systems, ensuring its applicability in real-world scenarios. Comprehensive evaluations on the CMU-Mocap, MuPoTS-3D, SoMoF Benchmark, and ExPI datasets demonstrate that the proposed GCN-Transformer model consistently outperforms existing state-of-the-art (SOTA) models according to the VIM and MPJPE metrics. Specifically, based on the MPJPE metric, GCN-Transformer shows a 4.7% improvement over the closest SOTA model on CMU-Mocap, 4.3% improvement over the closest SOTA model on MuPoTS-3D, 5% improvement over the closest SOTA model on the SoMoF Benchmark, and a 2.6% improvement over the closest SOTA model on the ExPI dataset. 
Unlike other models with performances that fluctuate across datasets, GCN-Transformer performs consistently, proving its robustness in multi-person pose forecasting and providing an excellent foundation for the application of GCN-Transformer in different domains.
Article
Traffic flow prediction is beneficial to future intelligent traffic management and urban planning. However, existing studies have limitations in dealing with spatio-temporal dependencies and multimodal data fusion. It is difficult to comprehensively capture long-term dependencies and short-term spatial relationships, which limit prediction accuracy. In this paper, a multimodal traffic flow prediction model based on the Spatio-Temporal Mixed Attention Network (ST-MANet) is proposed to address these issues in urban traffic flow prediction. The model’s global attention is used to capture long-term trends and periodic patterns in historical data, while local attention is employed to capture short-term spatial relationships among adjacent regions or nodes. Moreover, the model fuses multimodal data, such as weather and holiday information, and introduces an adaptive attention scaling module to dynamically adjust the weights of global and local information, thereby enhancing the model’s robustness and adaptability. Experimental results on the real traffic datasets PeMS04 and PeMS08 show that ST-MANet outperforms traditional methods and other mainstream models in terms of prediction precision and robustness.
Article
Full-text available
Traffic flow prediction can guide the rational layout of land use. Accurate traffic flow prediction can provide an important basis for urban expansion planning. This paper introduces a personalized lightweight federated learning framework (PLFL) for traffic flow prediction. This framework has been improved and enhanced to better accommodate traffic flow data. It is capable of collaboratively training a unified global traffic flow prediction model without compromising the privacy of individual datasets. Specifically, a spatiotemporal fusion graph convolutional network (MGTGCN) is established as the initial model for federated learning. Subsequently, a shared parameter mechanism of federated learning is employed for model training. Customized weights are allocated to each client model based on their data features to enhance personalization during this process. In order to improve the communication efficiency of federated learning, dynamic model pruning (DMP) is introduced on the client side to reduce the number of parameters that need to be communicated. Finally, the PLFL framework proposed in this paper is experimentally validated using LPR data from Changsha city. The results demonstrate that the framework can still achieve favorable prediction outcomes even when certain clients lack data. Moreover, the communication efficiency of federated learning under this framework has been enhanced while preserving the distinct characteristics of each client, without significant interference from other clients.
Article
Metro passenger flow prediction is a strategically necessary capability in an intelligent transportation system for alleviating traffic pressure, coordinating operation schedules, and planning future construction. Graph-based neural networks have been widely used in traffic flow prediction problems. A Graph Convolutional Network (GCN) captures spatial features according to established connections but ignores the high-order relationships between stations and the travel patterns of passengers. In this paper, we utilize a novel representation, the hypergraph, to tackle this issue, and propose a dynamic spatio-temporal hypergraph neural network to forecast passenger flow. In the prediction framework, the primary hypergraph is constructed from the metro system topology and then extended with advanced hyperedges discovered from pedestrian travel patterns over multiple time spans. Furthermore, hypergraph convolution and spatio-temporal blocks are proposed to extract spatial and temporal features and achieve node-level prediction. Experiments on historical datasets from Beijing and Hangzhou validate the effectiveness of the proposed method, which achieves superior prediction accuracy compared with the state of the art. Index Terms: metro flow prediction, hypergraph, graph neural network.
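For readers unfamiliar with hypergraph convolution, the following PyTorch sketch implements the standard normalized operator X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ over a node-by-hyperedge incidence matrix H. It shows the generic operation the abstract refers to; the cited paper's actual blocks, and its dynamic hyperedge construction, may differ.

```python
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    """Plain hypergraph convolution over an incidence matrix H (nodes x hyperedges)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x, H, edge_w=None):
        # x: (N, in_dim), H: (N, E), edge_w: optional hyperedge weights (E,)
        n, e = H.shape
        w = torch.ones(e) if edge_w is None else edge_w
        dv = (H * w).sum(dim=1).clamp(min=1e-6)   # node degrees
        de = H.sum(dim=0).clamp(min=1e-6)         # hyperedge degrees
        dv_inv_sqrt = dv.pow(-0.5)
        # Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X
        msg = dv_inv_sqrt.unsqueeze(1) * (H * w) @ ((H.t() / de.unsqueeze(1)) @ (dv_inv_sqrt.unsqueeze(1) * x))
        return self.theta(msg)

# Toy metro example: 5 stations, 2 hyperedges (e.g., one line and one travel pattern).
H = torch.tensor([[1., 0.], [1., 0.], [1., 1.], [0., 1.], [0., 1.]])
conv = HypergraphConv(in_dim=8, out_dim=16)
print(conv(torch.randn(5, 8), H).shape)  # (5, 16)
```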
Article
Traffic flow forecasting is a classical spatio-temporal data mining problem with many real-world applications. Recently, various methods based on Graph Neural Networks (GNN) have been proposed for the problem and have achieved impressive prediction performance. However, we argue that the majority of existing methods disregard the importance of certain nodes (referred to as pivotal nodes) that naturally exhibit extensive connections with multiple other nodes. Prediction on pivotal nodes poses a challenge due to their complex spatio-temporal dependencies compared to other nodes. In this paper, we propose a novel GNN-based method called Spatio-Temporal Pivotal Graph Neural Networks (STPGNN) to address this limitation. We introduce a pivotal node identification module for identifying pivotal nodes, and we propose a novel pivotal graph convolution module that enables precise capture of spatio-temporal dependencies centered around pivotal nodes. Moreover, we propose a parallel framework capable of extracting spatio-temporal traffic features on both pivotal and non-pivotal nodes. Experiments on seven real-world traffic datasets verify our proposed method's effectiveness and efficiency compared to state-of-the-art baselines.
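A crude way to see what "pivotal nodes" means in practice is to rank nodes by how many connections they have. The sketch below does exactly that with degree centrality; the cited identification module is data-driven and considerably richer, so this is only an illustrative stand-in.

```python
import numpy as np

def top_pivotal_nodes(adj, k=5):
    """Pick the k nodes with the most connections as a crude pivotal set.

    adj: (N, N) weighted adjacency matrix of the road network.
    Degree centrality is used only to illustrate the idea of nodes with
    extensive connections; it is not the cited identification procedure.
    """
    degree = (adj > 0).sum(axis=1) + (adj > 0).sum(axis=0)
    return np.argsort(-degree)[:k]

# Toy usage on a random sparse road graph.
rng = np.random.default_rng(1)
adj = (rng.random((20, 20)) > 0.8).astype(float)
np.fill_diagonal(adj, 0.0)
print(top_pivotal_nodes(adj, k=3))
```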
Article
Graph convolutional networks (GCN) have been applied to traffic flow forecasting tasks because graphs can describe the irregular topology of road networks. However, GCN-based traffic flow forecasting methods often fail to simultaneously capture the short-term and long-term temporal relations carried by the traffic flow data, and they also suffer from the over-smoothing problem. To overcome these problems, we propose a hierarchical traffic flow forecasting network that merges a newly designed long-term temporal Transformer network (LTT) with spatio-temporal graph convolutional networks (STGC). Specifically, LTT aims to learn the long-term temporal relations among the traffic flow data, while the STGC module captures the short-term temporal relations and spatial relations via a cascade of one-dimensional convolution and graph convolution. In addition, an attention fusion mechanism is proposed to combine the long-term with the short-term temporal relations as the input of the graph convolution layer in STGC, in order to mitigate the over-smoothing problem of GCN. Experimental results on three public traffic flow datasets prove the effectiveness and robustness of the proposed method.
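The cascade of one-dimensional temporal convolution and graph convolution can be sketched in a few lines of PyTorch. The block below applies a temporal convolution per node and then aggregates features over a normalized adjacency; the channel sizes, activations, and absence of gating are assumptions, not the cited module's exact design.

```python
import torch
import torch.nn as nn

class STGCBlock(nn.Module):
    """Sketch of a 1-D temporal convolution followed by a graph convolution."""
    def __init__(self, channels, kernel=3):
        super().__init__()
        self.temporal = nn.Conv2d(channels, channels, kernel_size=(1, kernel), padding=(0, kernel // 2))
        self.weight = nn.Linear(channels, channels)

    def forward(self, x, adj_norm):
        # x: (batch, channels, nodes, time); adj_norm: (nodes, nodes) normalized adjacency
        h = torch.relu(self.temporal(x))                  # short-term temporal relations per node
        h = torch.einsum("nm,bcmt->bcnt", adj_norm, h)    # spatial aggregation over neighbors
        return torch.relu(self.weight(h.permute(0, 2, 3, 1)).permute(0, 3, 1, 2))

# Toy usage: 30 road sensors, 12 time steps, 16 feature channels.
block = STGCBlock(channels=16)
x = torch.randn(4, 16, 30, 12)
adj = torch.softmax(torch.randn(30, 30), dim=1)  # stand-in for a normalized adjacency
print(block(x, adj).shape)                        # (4, 16, 30, 12)
```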
Article
Spatial-temporal forecasting of traffic flow is a challenging task because of the complicated spatial dependencies and dynamic temporal patterns across different roads. Existing frameworks usually rely on a given spatial adjacency graph and sophisticated mechanisms for modeling spatial and temporal correlations. However, the limited representation of a given spatial graph structure with incomplete adjacent connections may restrict the effective spatial-temporal dependency learning of those models. Furthermore, existing methods fall short when handling complicated spatial-temporal data: they usually use separate modules for spatial and temporal correlations, or independent components that capture only localized or only global heterogeneous dependencies. To overcome these limitations, our paper proposes a novel model, Spatial-Temporal Fusion Graph Neural Networks (STFGNN), for traffic flow forecasting. First, a data-driven method of generating a "temporal graph" is proposed to compensate for several genuine correlations that the spatial graph may not reflect. STFGNN can effectively learn hidden spatial-temporal dependencies through a novel fusion operation over various spatial and temporal graphs, treated for different time periods in parallel. Meanwhile, by integrating this fusion graph module and a novel gated convolution module into a unified layer in parallel, STFGNN can handle long sequences by learning more spatial-temporal dependencies as layers are stacked. Experimental results on several public traffic datasets demonstrate that our method consistently achieves state-of-the-art performance compared with other baselines.
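A minimal example of a data-driven "temporal graph" is to connect each sensor to the sensors whose historical series are most similar to its own. The sketch below uses Pearson correlation with a top-k rule as a simple stand-in for the similarity measure used in the cited paper.

```python
import numpy as np

def temporal_graph(series, top_k=3):
    """Build a data-driven temporal graph from node-history similarity (sketch).

    series: (N, T) historical traffic readings for N nodes.
    Each node is linked to its top_k most similar nodes by Pearson
    correlation; the cited paper's similarity measure may differ.
    """
    corr = np.corrcoef(series)                 # (N, N) pairwise similarity
    np.fill_diagonal(corr, -np.inf)            # exclude self-loops
    adj = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        neighbors = np.argsort(-corr[i])[:top_k]
        adj[i, neighbors] = 1.0
    return adj

# Toy usage: 10 sensors, one day of readings at 5-minute resolution.
rng = np.random.default_rng(2)
series = rng.normal(size=(10, 288))
print(temporal_graph(series).sum(axis=1))      # each node gets top_k temporal neighbors
```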
Article
Urban traffic forecasting is the cornerstone of the intelligent transportation system (ITS). Existing methods focus on spatial-temporal dependency modeling, while two intrinsic properties of the traffic forecasting problem are overlooked. First, the complexity of diverse forecasting tasks is nonuniformly distributed across various spaces (e.g., suburb versus downtown) and times (e.g., rush hour versus off-peak). Second, the recollection of past traffic conditions is beneficial to the prediction of future traffic conditions. Based on these properties, we propose a bidirectional spatial-temporal adaptive transformer (Bi-STAT) for accurate traffic forecasting. Bi-STAT adopts an encoder-decoder architecture, where both the encoder and the decoder maintain a spatial-adaptive transformer and a temporal-adaptive transformer structure. Inspired by the first property, each transformer is designed to dynamically process the traffic streams according to their task complexities. Specifically, we realize this through a recurrent mechanism with a novel dynamic halting module (DHM): each transformer performs iterative computation with shared parameters until the DHM emits a stopping signal. Motivated by the second property, Bi-STAT utilizes one decoder to perform the present → past recollection task and the other decoder to perform the present → future prediction task. The recollection task supplies complementary information to assist and regularize the prediction task for better generalization. Through extensive experiments, we show the effectiveness of each module in Bi-STAT and demonstrate the superiority of Bi-STAT over the state-of-the-art baselines on four benchmark datasets. The code is available at https://github.com/chenchl19941118/Bi-STAT.git.
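The idea of iterative computation with a learned stopping signal can be sketched as a loop that reapplies a shared layer while accumulating a halting probability. The code below is written in that spirit only; the threshold, the maximum number of steps, and the use of a plain linear layer as the shared block are assumptions, not the dynamic halting module itself.

```python
import torch
import torch.nn as nn

class HaltingLoop(nn.Module):
    """Sketch of recurrent computation with a learned stopping signal."""
    def __init__(self, d_model, max_steps=4, threshold=0.99):
        super().__init__()
        self.layer = nn.Linear(d_model, d_model)   # stand-in for a shared transformer block
        self.halt = nn.Linear(d_model, 1)
        self.max_steps = max_steps
        self.threshold = threshold

    def forward(self, x):
        # x: (batch, positions, d_model); halted accumulates per-position halting probability
        halted = torch.zeros(x.shape[:-1], device=x.device)
        for _ in range(self.max_steps):
            x = torch.relu(self.layer(x))
            halted = halted + torch.sigmoid(self.halt(x)).squeeze(-1)
            if bool((halted > self.threshold).all()):
                break                              # every position has emitted a stop signal
        return x

# Toy usage.
loop = HaltingLoop(d_model=32)
print(loop(torch.randn(2, 10, 32)).shape)  # (2, 10, 32)
```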
Article
Accurate traffic prediction is critical for enhancing the performance of intelligent transportation systems. The key challenge in this task is how to properly model the complex dynamics of traffic while respecting and exploiting both the spatial and the temporal heterogeneity in the data. This paper proposes a novel framework called Meta Graph Transformer (MGT) to address this problem. The MGT framework is a generalization of the original transformer, which is used to model vector sequences in natural language processing. Specifically, MGT has an encoder-decoder architecture. The encoder is responsible for encoding historical traffic data into intermediate representations, while the decoder predicts future traffic states autoregressively. The main building blocks of MGT are three types of attention layers named Temporal Self-Attention (TSA), Spatial Self-Attention (SSA), and Temporal Encoder-Decoder Attention (TEDA), all of which have a multi-head structure. TSAs and SSAs are employed by both the encoder and the decoder to capture temporal and spatial correlations. TEDAs are employed by the decoder, allowing every position in the decoder to attend to all positions in the input sequence temporally. By leveraging multiple graphs, SSA can conduct sparse spatial attention with various inductive biases. To make the model aware of temporal and spatial conditions, Spatial-Temporal Embeddings (STEs) are learned from external attributes, which are composed of temporal attributes (e.g., sequential order, time of day) and spatial attributes (e.g., Laplacian eigenmaps). These embeddings are then utilized by all the attention layers via meta-learning, hence endowing these layers with Spatial-Temporal Heterogeneity-Aware (STHA) properties. Experiments on three real-world traffic datasets demonstrate the superiority of our model over several state-of-the-art methods. Our code and data are available at http://github.com/lonicera-yx/MGT.
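Laplacian eigenmaps, mentioned as a spatial attribute for the spatial-temporal embeddings, can be computed directly from the road-graph adjacency matrix. The NumPy sketch below returns the first non-trivial eigenvectors of the symmetric normalized Laplacian as per-node spatial features; the embedding dimension and the normalization choice are assumptions.

```python
import numpy as np

def laplacian_eigenmaps(adj, dim=8):
    """Spatial positional features from the symmetric normalized Laplacian (sketch).

    adj: (N, N) symmetric adjacency matrix. Returns (N, dim) eigenvectors
    corresponding to the smallest non-trivial eigenvalues, a common choice
    of spatial attribute for node embeddings.
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)       # ascending eigenvalues
    return eigvecs[:, 1:dim + 1]                 # skip the trivial constant eigenvector

# Toy usage on a random symmetric road graph with 20 nodes.
rng = np.random.default_rng(3)
adj = (rng.random((20, 20)) > 0.7).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0.0)
print(laplacian_eigenmaps(adj).shape)            # (20, 8)
```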
Article
Accurate traffic forecasting is critical to improving the safety, stability, and efficiency of intelligent transportation systems. Despite years of study, accurate traffic prediction still faces several challenges, including modeling the dynamics of traffic data along both the temporal and spatial dimensions and capturing the periodicity and spatial heterogeneity of traffic data; the problem becomes even more difficult for long-term forecasting. In this paper, we propose an Attention-based Spatial-Temporal Graph Neural Network (ASTGNN) for traffic forecasting. Specifically, in the temporal dimension, we design a novel self-attention mechanism that is capable of utilizing local context and is specialized for numerical sequence representation transformation. It enables our prediction model to capture the temporal dynamics of traffic data and to enjoy a global receptive field, which is beneficial for long-term forecasting. In the spatial dimension, we develop a dynamic graph convolution module that employs self-attention to capture spatial correlations in a dynamic manner. Furthermore, we explicitly model periodicity and capture spatial heterogeneity through embedding modules. Experiments on five real-world traffic flow datasets demonstrate that ASTGNN outperforms the state-of-the-art baselines.
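One common way to give temporal self-attention access to local context is to produce queries and keys with 1-D convolutions over time instead of pointwise projections, so that each attention score reflects a short local trend. The PyTorch sketch below follows that idea; the kernel size, single head, and dimensions are assumptions, and it is not the cited mechanism's exact formulation.

```python
import torch
import torch.nn as nn

class LocalContextAttention(nn.Module):
    """Temporal self-attention whose queries/keys see a local window (sketch)."""
    def __init__(self, d_model, kernel=3):
        super().__init__()
        self.q_conv = nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel, padding=kernel // 2)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        # x: (batch, time, d_model); Conv1d expects (batch, d_model, time)
        q = self.q_conv(x.transpose(1, 2)).transpose(1, 2)
        k = self.k_conv(x.transpose(1, 2)).transpose(1, 2)
        v = self.v_proj(x)
        scores = torch.softmax(q @ k.transpose(1, 2) / x.shape[-1] ** 0.5, dim=-1)
        return scores @ v

# Toy usage: 12 historical time steps per sensor, hidden size 64.
attn = LocalContextAttention(d_model=64)
print(attn(torch.randn(8, 12, 64)).shape)  # (8, 12, 64)
```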