Bilateral Trade Modeling
with Graph Neural Networks
Kobby Panford-Quainoo∗
African Masters in Machine Intelligence
African Institute for Mathematical Sciences
Kigali, Rwanda
kpanford-quainoo@aimsammi.org
Avishek Joey Bose
Department of Computer Science
McGill University and Mila
Montreal, Canada
joey.bose@mail.mcgill.ca
Michaël Defferrard
Institute for Electrical Engineering
École Polytechnique Fédérale de Lausanne
Lausanne, Switzerland
michael.defferrard@epfl.ch
Abstract
Bilateral trade agreements confer preferred trading status between participating countries, enabling increased trade and potential economic growth. Predictions of such trade flows often serve as important economic indicators for economists and policy makers, with impactful ramifications for the economic policies adopted by the respective countries. However, the traditional approach to predicting potential trade partners relies on gravity methods, which are cumbersome to define due to the exponentially growing number of constants that must be considered. In this work, we present a framework for directly predicting bilateral trade partners from observed trade records using graph representation learning. Furthermore, we show as a downstream task that modeling bilateral trade as a graph allows for the classification of countries into various income levels. Empirically, we observe accuracies of up to 98% for predicting trading partners and 68% for income level classification.
1 Introduction
International trade involves the exchange of goods, capital, and services between countries; where only two countries are concerned, it is referred to as bilateral trade. Often, the deficits and surpluses created via bilateral trade represent important economic development indicators, which drive the adoption of specific domestic economic policies, e.g., the relaxation of restrictions and trade barriers, in either country. Consequently, various models have been employed by economists to understand trade patterns and the factors that account for the observed trade activities between countries. For instance, the Ricardian model introduced the idea of the comparative advantage of nations, whereby a country exports more of the goods it can produce at a lower cost [1]. In a similar vein, the "factor of abundance" theory argues that the trading behavior of a country is influenced by the goods it produces in abundance [2].
The most popular method with practical benefits is the Gravity Model of trade, which is motivated by Newton's law of gravitation. The gravity model relates the bilateral trade flows between two countries using the respective gross domestic product (GDP) of each country while taking into account the geographical distance between them. Intuitively, trade flow is high when the participating countries have high GDPs and are geographically close to each other [3]. While the gravity model is an effective empirical measure of bilateral trade flow, it lacks a theoretical justification [4] and suffers from practical limitations. In particular, model performance is dictated by handcrafted features, such as cultural differences and political terms, that require significant domain knowledge.
∗Code can be found at https://github.com/panford/BiTrade-Graphs.
Submitted to the African Institute for Mathematical Sciences for a Master's degree in Machine Intelligence (AIMS 2019), Rwanda.
Gravity models show that countries with high GDPs have high trade flows and are more likely to trade with each other than countries with low GDPs [3]; conversely, countries that are far apart have smaller trade flows and are less likely to trade. Trade flow therefore serves as the basis for predicting potential trade partners. This is an important task in economics because it allows policymakers to relax restrictions and trade barriers to foster partnerships between countries and consequently expand their economic capacities. One difficulty with using the gravity model is that many dummy variables, such as cultural differences and political terms, must be handcrafted and factored into the equation, which makes capturing the information that actually affects trade patterns very expensive.
Present work In this paper, we take a data-driven approach to modeling bilateral trade. We first observe that trade flows can naturally be interpreted as a graph wherein countries are nodes and edges represent countries undertaking bilateral trade. We leverage recent advances in graph representation learning to predict trade links between countries, crucially without first estimating trade flow heuristics. We further analyze the graphical structure of trade relationships between countries and use it to power a supervised learning approach that predicts the income levels of countries using graph neural networks (GNNs). Empirically, we observe 98% and 68% accuracies in predicting bilateral trade links and income levels, respectively. Our work is motivated by the difficulty of estimating trade flow, and we tackle trade partner prediction and income level classification from a graph perspective. Our main contributions are:
• To show that international trade data can naturally be modelled as a graph.
• To directly predict trade links between countries without first having to estimate the trade flow values between them.
• To show that the trade relationships between countries can be a major ingredient when predicting their income levels.
2 Background
2.1 Country Classification
The World Bank defines four income groups: high, upper-middle, lower-middle, and low income. The division into these income groups is based on the gross national income (GNI) per capita, the total annual income of a country divided by its population. The GNI of a country gives an idea of its economic strengths and weaknesses and, in general, of the standard of living of the average citizen. Countries are classified into an income group when their GNI per capita falls within the corresponding threshold, defined in Table 1 [5]. This classification by income level can be used to measure progress over time or to analyse data for countries falling into the same income group.

income group    GNI per capita threshold (USD)
low             1,005 and below
lower middle    1,006 – 3,955
upper middle    3,956 – 12,235
high            12,236 and above
Table 1: Country income groups by GNI per capita according to the World Bank, 2017.
2.2 Bilateral Trade Flows
Inspired by Newton's law of universal gravitation, the gravity model provides a theoretical approach to representing the numerical trade strength between any two countries. The gravity model computes a trade flow value: the strength of the trade flow between any two countries increases with their respective net incomes or GDPs and decreases with increasing distance [3, 6]. It is expressed as

F_{ij} = M \frac{GDP_i \cdot GDP_j}{D_{ij}},

where F_{ij} is the trade flow between countries i and j, GDP_i is the GDP of country i, M is a proportionality constant, and D_{ij} is the geographical distance between countries i and j. A more convenient way to deal with this equation is to express it in log form and introduce coefficients and placeholder variables to account for other unanticipated factors which are not exactly deterministic. The gravity equation may then be expressed in the form

\ln F_{ij} = c_0 + c_1 \ln GDP_i + c_2 \ln GDP_j + c_3 \ln D_{ij} + c_4 d + c_5 P_{ij} + \epsilon_{ij},

where the c_k are constants, d is a dummy variable, P_{ij} is a political influence term, and \epsilon_{ij} is an error correction term.
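To make the log-linear form concrete, the following is a minimal sketch of fitting the gravity coefficients by ordinary least squares; the country-pair arrays are synthetic placeholders, and the dummy variable d and political term P_{ij} are omitted for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500                                   # hypothetical country pairs
gdp_i = rng.uniform(1e9, 1e12, n)         # GDP of reporter country
gdp_j = rng.uniform(1e9, 1e12, n)         # GDP of partner country
dist_ij = rng.uniform(100, 20_000, n)     # geographical distance (km)
# Synthetic flows following the gravity form, with multiplicative noise.
flow = 1e-6 * gdp_i * gdp_j / dist_ij * rng.lognormal(0.0, 0.5, n)

# Regress ln F_ij on ln GDP_i, ln GDP_j and ln D_ij.
X = np.column_stack([np.log(gdp_i), np.log(gdp_j), np.log(dist_ij)])
reg = LinearRegression().fit(X, np.log(flow))
print("c1, c2, c3 =", reg.coef_, "; c0 =", reg.intercept_)
```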
2.3 Graph Neural Networks
A graph G = (V, X, A, E) consists of individual entities referred to as nodes V with characteristic node features X ∈ R^{|V|×D}, where |V| and D are the number of nodes and features respectively. An edge a_{ij} exists between nodes i and j if they are connected. The edges can be composed into a dense square adjacency matrix A ∈ R^{|V|×|V|}, which may or may not be symmetric depending on whether the graph is directed. Edges in a directed graph have arrows going from one node i to another node j to show that node i is connected to node j; the reverse is not necessarily true, and a_{ij} ≠ a_{ji}. Edges in an undirected graph, on the other hand, have no direction, i.e. a_{ij} = a_{ji}. Edge weights E are a numerical indication of the strength of the relationship between connected nodes.
Graph Neural Networks (GNNs) are a family of approaches that aim to generalize neural networks, developed for Euclidean data, to graphs [7]. They can tackle tasks such as node [8, 9], graph [10, 11, 12] and edge classification [13], link prediction [14], and node clustering [15] or community detection [16].
Node, graph and edge classification problems involve discriminating between classes of nodes, graphs and edges and providing labels to unlabeled ones at test time. Link prediction is the prediction of missing links between two nodes in the graph. Clustering is an unsupervised learning technique that leverages similarities in features to put data points into inherent groups rather than predefined target labels. Node clustering and community detection² therefore seek to detect groups of nodes, referred to as clusters or communities, in the absence of target labels.
Depending on the task at hand, several graph neural network techniques have been proposed and many improvements have been developed. These are based on specific applications and on properties of the graph, such as being directed (e.g., followers on Twitter) or heterogeneous (e.g., paper and author nodes in a citation network), or having edge weights or features (e.g., the net import and export trade value between two countries) [17].
²Node clustering and community detection are terms used by the machine learning on graphs and data mining communities, respectively.
2.3.1 ChebNet
Some of the early techniques proposed in graph representation learning sought to learn the local neighbourhood structure of nodes by using filters analogous to those that proved so successful on images. Bruna et al. [18] introduced the spectral network, a convolutional network based on spectral filtering. Spectral graph theory defines the convolution operation on graphs as

g_\theta(L) \ast x = g_\theta(U \Lambda U^\top) x.   (1)

Here, g_\theta is the spectral filter applied to the signal x, and U and \Lambda are the eigenvectors and eigenvalues of the Laplacian L, respectively. Defferrard et al. [19] avoid the expensive computation of the Laplacian eigenvectors by defining filters as polynomials of the Laplacian:

g_\theta(\Lambda) \ast x = \sum_{k=0}^{K-1} \theta_k \Lambda^k x.   (2)

They compute the filter from the Chebyshev polynomials T_k(\tilde{L}) in (4), evaluated at the scaled Laplacian \tilde{L} shown in equation (3):

\tilde{L} = 2L/\lambda_{max} - I_n,   (3)

g_\theta(L) \ast x = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L}) x.   (4)

Here, T_k(\tilde{L}) is the Chebyshev polynomial of order k and \theta_k is the Chebyshev coefficient at the k-th order. This network is referred to as ChebNet.
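For illustration, a ChebNet node classifier implementing equation (4) can be sketched with PyTorch Geometric's ChebConv (the library used for our experiments in Section 4); the hidden width, dropout rate and polynomial order K are illustrative assumptions, while the 38 input features and 4 income classes follow Table 2.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import ChebConv

class ChebNet(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=16, num_classes=4, K=3):
        super().__init__()
        # Each ChebConv evaluates a K-term Chebyshev polynomial of the
        # scaled Laplacian, as in equation (4).
        self.conv1 = ChebConv(in_dim, hidden, K=K)
        self.conv2 = ChebConv(hidden, num_classes, K=K)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)
        return F.log_softmax(self.conv2(x, edge_index), dim=1)
```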
2.3.2 Graph Convolutional Network (GCN)
GCN [8] is a GNN variant that approximates ChebNet to the first order by setting K = 1 and \lambda_{max} = 2. The convolution filter on the graph then reduces to

g_\theta \ast x \approx \theta (I_N + D^{-1/2} A D^{-1/2}) x,   (5)

where A is the adjacency matrix and D is the degree matrix, counting the number of neighbours of each node. From equation (5), A and D can immediately be re-normalized by adding the identity matrix I_N, so that the features and state of the node itself are captured: with \tilde{A} = A + I_N and \tilde{D} the corresponding degree matrix, this transforms into the aggregation function F = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X and the node update H = F\Theta.

By stacking a sufficient number of GCN layers, hierarchical information can be extracted from the graph-structured data by propagating local messages across layers. The message propagation rule from hidden layer l to l + 1 is given by (6):

H^{(l+1)} = \sigma( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} ),   (6)

z = \mathrm{softmax}( H^{(l+1)} W^{(l+1)} ),   (7)

where \sigma is the activation function, \tilde{D} is the degree matrix with self-loops, \tilde{A} is the adjacency matrix with self-loops, and W^{(l)} is the weight matrix of layer l. z in equation (7) is the final softmax layer used in a semi-supervised classification task.
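A minimal two-layer GCN following equations (6) and (7) can likewise be sketched with PyTorch Geometric's GCNConv, which applies the symmetric normalization \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} with self-loops internally; the hidden width is an illustrative assumption.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=16, num_classes=4):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))                    # equation (6)
        return F.log_softmax(self.conv2(h, edge_index), dim=1)  # equation (7)
```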
2.3.3 Graph Attention Network (GAT)
Graph Attention Networks (GAT) [20] use self-attention to compute attention coefficients that weight the nodes in each node's neighbourhood by importance. It is an extension of attention, as proposed by Bahdanau et al. [21], that allows each neighbouring node to be attended to. The following equations summarize the attention mechanism used by GAT:

e_{ij} = a(W h_i, W h_j),   (8)

where e_{ij} is the attention score between node i and node j, computed from the shared weight matrix W and the hidden states of i and j. This provides a new set of feature information about node i and its neighbouring nodes to be learned together with a trainable parameter a, which in effect aligns the attention scores e_{ij} with the output features. Using a softmax,

\alpha_{ij} = \mathrm{softmax}_j(e_{ij}),   (9)

the attention scores \alpha are normalised, resulting in equation (10):

\alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(a^\top [W h_i \| W h_j]))}{\sum_{k \in N_i} \exp(\mathrm{LeakyReLU}(a^\top [W h_i \| W h_k]))}.   (10)

Veličković et al. [20] also define multi-head attention on graphs by concatenating K independent attention mechanisms (12), each of which implements (11):

h_i^{(t+1)} = \sigma\Big( \sum_{j \in N_i} \alpha_{ij} W h_j \Big),   (11)

h_i^{(t+1)} = \Big\Vert_{k=1}^{K} \sigma\Big( \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j \Big).   (12)

When the output of the last hidden layer of the network is used for prediction with a sigmoid or softmax activation, the features from the K attention heads are instead averaged, as shown in equation (13):

h_i^{(t+1)} = \sigma\Big( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j \Big).   (13)
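A sketch of a GAT classifier follows: the first GATConv layer concatenates K = 8 attention heads as in equation (12), and the output layer averages its heads (concat=False) as in equation (13); the head counts and widths are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=8, num_classes=4, heads=8):
        super().__init__()
        self.conv1 = GATConv(in_dim, hidden, heads=heads)  # concatenation, eq. (12)
        self.conv2 = GATConv(hidden * heads, num_classes,
                             heads=1, concat=False)        # averaging, eq. (13)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return F.log_softmax(self.conv2(x, edge_index), dim=1)
```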
2.3.4 Attention-Based Graph Neural Network (AGNN)
In AGNN [22], the attention over neighbouring nodes is controlled by a single layer-wise parameter \alpha^{(t)} that learns to weight the neighbouring nodes of node i according to their contribution to the label of the target node. This is done by storing a single scalar \alpha^{(t)} for each layer t ∈ {1, 2, ..., l}, where l is the number of layers. The propagation of hidden state information across layers is guided by the rule in equation (14):

H^{(t+1)} = F^{(t)} H^{(t)},   (14)

F_i^{(t)} = \mathrm{softmax}\big( [\, \alpha^{(t)} \cos(h_i^{(t)}, h_j^{(t)}) \,]_{j \in N(i) \cup \{i\}} \big),   (15)

where F_i^{(t)} is the propagation vector computed at layer t as a summary, by relevance, of each neighbouring node j of node i.
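A corresponding AGNN sketch: a linear input projection, a stack of AGNNConv propagation layers (each holding the single learnable scalar of equation (15)), and a linear classifier; the depth and width are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import AGNNConv

class AGNN(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=16, num_classes=4, num_layers=2):
        super().__init__()
        self.lin_in = torch.nn.Linear(in_dim, hidden)
        # One scalar attention parameter per propagation layer, eq. (15).
        self.props = torch.nn.ModuleList(
            AGNNConv(requires_grad=True) for _ in range(num_layers))
        self.lin_out = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.lin_in(x))
        for prop in self.props:            # propagation rule, eq. (14)
            x = prop(x, edge_index)
        return F.log_softmax(self.lin_out(x), dim=1)
```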
2.3.5 Graph Auto-Encoder (GAE)
The models discussed so far are (semi-)supervised learning methods on graph data, where each example has a label showing the class it belongs to. We now discuss the Graph Auto-Encoder (GAE) [23], an extension of auto-encoders [24, 25] for unsupervised learning on graphs. GAE learns a latent representation of the data without looking at labels or edge directions. It is composed of an encoder (inference model) that learns a code by minimizing a reconstruction loss and a decoder (generative model) whose output is a reconstruction of the original representation from the code. Kipf and Welling [23] proposed a GCN encoder function

z = \mathrm{GCN}(X, A)   (16)

and a decoder function

p(A \mid z) = \prod_{i=1}^{N} \prod_{j=1}^{N} p(A_{ij} \mid z_i, z_j),   (17)

where p(A_{ij} \mid z_i, z_j) can be

p(A_{ij} \mid z_i, z_j) = \sigma(z_i^\top z_j),   (18)

an inner-product decoder specialised in predicting new edges between nodes in the reconstructed adjacency matrix.
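A minimal GAE sketch with PyTorch Geometric's GAE wrapper: a two-layer GCN encoder produces the latent codes z of equation (16), and the wrapper's default inner-product decoder scores edges via \sigma(z_i^\top z_j) as in equation (18); the latent sizes are illustrative assumptions.

```python
import torch
from torch_geometric.nn import GCNConv, GAE

class Encoder(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=32, latent=16):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, latent)

    def forward(self, x, edge_index):
        # Two-layer GCN encoder, equation (16).
        return self.conv2(self.conv1(x, edge_index).relu(), edge_index)

model = GAE(Encoder())  # default decoder is the inner product of eq. (18)
```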
2.3.6 Variational Graph Auto-Encoder (VGAE)
The inference model of the Variational Graph Auto-Encoder (VGAE) learns a latent code which follows a controlled distribution, with an encoder function q(Z \mid X, A). The encoder is generally a simple two-layer GCN that outputs the parameters of a normal distribution over the latent code:

q(Z \mid X, A) = \prod_{i=1}^{N} q(z_i \mid X, A), \quad \text{where } q(z_i \mid X, A) = \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma_i^2)).   (19)

The generative model then reconstructs a new adjacency matrix with a decoder function p(A \mid Z) (the same as for GAEs). Training is done by maximizing a variational lower bound:

\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}[\log p(A \mid Z)] - \mathrm{KL}[\, q(Z \mid X, A) \,\|\, p(Z) \,].   (20)
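The variational version differs only in the encoder, which returns \mu and \log\sigma heads as in equation (19); PyTorch Geometric's VGAE wrapper exposes the KL term of equation (20) through model.kl_loss(). The latent sizes are again illustrative.

```python
import torch
from torch_geometric.nn import GCNConv, VGAE

class VariationalEncoder(torch.nn.Module):
    def __init__(self, in_dim=38, hidden=32, latent=16):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv_mu = GCNConv(hidden, latent)      # mu_i of equation (19)
        self.conv_logstd = GCNConv(hidden, latent)  # log sigma_i of equation (19)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

model = VGAE(VariationalEncoder())
# Training objective, equation (20): reconstruction term plus KL regularizer.
# loss = model.recon_loss(z, pos_edge_index) + (1 / num_nodes) * model.kl_loss()
```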
Table 2: Summary of data features and representation.

feature         notation   number   representation
nodes           V          111      countries
node features   X          38       population, etc.⁴
edges           A          476      1 if countries i and j have traded, 0 otherwise
edge weights    E          476      net trade value (USD)
node labels     Y          4        income group
3 Data and Tasks
Here, we present our approach to representing countries within the graph-structured bilateral trade data for income-level classification and trade partner prediction. We first describe the basic components of a typical graph and relate each to a component of our data.
Our primary goals are to (1) classify countries into their respective income levels: high, upper middle, lower middle and low, and (2) predict potential trade partners. These are essentially multi-class node classification and link prediction tasks. We trained graph neural network models and baseline models to perform these tasks. We evaluate performance using classification accuracy on held-out test examples for node classification, and the area under the curve (AUC) and average precision (AP) for link prediction. All experiments were performed on data specifically collected for this study; we do not include results on the standard benchmark datasets traditionally used to test the efficiency of novel approaches. In the next subsection, we describe how we collected the data, the representation approach we took, and the GNN models adopted for the downstream tasks.
3.1 Data Collection and Representation
The data used for this study are taken from two different sources: the United Nations Comtrade Database and Kaggle. The United Nations Comtrade Database³ is an international trade database containing reporter-partner trade statistics collated for about 170 countries over a certain period of time. These trade statistics include (1) imports, exports, re-exports, and re-imports, (2) commodities exchanged between the trade partner and reporter, and (3) trade values in US dollars.
The data retrieved from Kaggle, on the other hand, contain the profiles of specific countries, including geographical, financial, geological, and other information. To ensure consistency, we used data for a particular year and accumulated the trade and profile information for a total of 111 countries, along with their income groups, which are used as target labels.
Each country considered is a node in our graph with 38 node features, which are simply the collected profile information. We then used the net trade balance between countries to construct an adjacency matrix such that there is an edge between two countries if and only if their trade balance was non-zero: an entry of 1 is assigned if there is an edge and 0 otherwise. The trade values, in US dollars, constitute the edge weight matrix. A summary of the features and the representation is provided in Table 2.
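As a sketch of this construction, the following builds a PyTorch Geometric Data object from hypothetical tables: trades with columns reporter, partner and net_trade_usd, profiles holding the 38 features per country, and labels holding integer-coded income groups. These column and variable names are assumptions for illustration.

```python
import pandas as pd
import torch
from torch_geometric.data import Data

def build_trade_graph(trades: pd.DataFrame, profiles: pd.DataFrame,
                      labels: pd.Series) -> Data:
    countries = sorted(profiles.index)
    idx = {c: i for i, c in enumerate(countries)}
    # Keep an edge wherever the net trade balance is non-zero (Table 2).
    nz = trades[trades.net_trade_usd != 0]
    edge_index = torch.tensor([[idx[c] for c in nz.reporter],
                               [idx[c] for c in nz.partner]])
    edge_weight = torch.tensor(nz.net_trade_usd.to_numpy(), dtype=torch.float)
    x = torch.tensor(profiles.loc[countries].to_numpy(), dtype=torch.float)
    y = torch.tensor(labels.loc[countries].to_numpy())  # income groups 0-3
    return Data(x=x, edge_index=edge_index, edge_attr=edge_weight, y=y)
```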
4 Experiments
We evaluate our approach to modeling bilateral trade using GNNs in two settings: link prediction of withheld edges between trading countries, and income classification of countries (node classification) as a separate downstream task. We train all GNN-based models using PyTorch Geometric [26], with 80% of the data for training and 20% for testing. For optimization of model parameters, we use the Adam optimizer [27], while hyperparameters are tuned using Bayesian optimization [28, 29].
³https://comtrade.un.org/data/
⁴See appendix for the full list of features.
Figure 1: (a) The graph of countries (numbered for country reference) and links from trade data. (b) The graph of countries with node color indicating their income level.
Figure 2: (1) The input graph has partially labelled nodes and edges of different widths corresponding to the values of trade flow; the node classifier infers the labels of the remaining nodes. (2) The link predictor predicts missing edges, illustrated with dotted lines.
5 Setup and Baselines
In addition to GNNs, we test a multi-layer perceptron (MLP) and a logistic regression model as baselines for node classification. Both the MLP and the logistic regression model consume node features as input, but critically do not have access to the underlying graph structure in the form of an adjacency matrix. We hypothesize that effective learning requires utilizing the local neighborhood information available via the adjacency matrix; for example, the adoption of specific domestic trade policies in one country can encourage similar policies in its neighbors.
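A sketch of the logistic regression baseline follows: it consumes only the node features and ignores the adjacency structure, reusing the Data object from the construction sketch in Section 3.1.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = data.x.numpy()   # 38 node features per country, no graph structure
y = data.y.numpy()   # income group labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```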
5.1 Hyperparameter tuning
In our experiments, we used Bayesian optimization [28, 29]⁵ to tune the hyperparameters. We defined bounds on the parameter space as shown in Table 3 and obtained the best results with a learning rate of 0.001 and a weight decay of 0.005 for all GNN models. The baseline linear model performed best at a slightly different hyperparameter setting: a learning rate of 0.4601 and a weight decay of 0.06976.
⁵Code available at https://github.com/fmfn/BayesianOptimization.

Table 3: Hyperparameters and predefined optimization bounds.

hyperparameter   lower bound   upper bound
learning rate    0.0001        1.0
weight decay     0.005         1.0
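A sketch of the tuning loop with the BayesianOptimization package cited above; train_and_evaluate is a hypothetical helper, assumed to train a model with the given hyperparameters and return a validation accuracy to maximize.

```python
from bayes_opt import BayesianOptimization

def objective(lr, wd):
    # Hypothetical helper: trains the model and returns validation accuracy.
    return train_and_evaluate(learning_rate=lr, weight_decay=wd)

optimizer = BayesianOptimization(
    f=objective,
    pbounds={"lr": (0.0001, 1.0), "wd": (0.005, 1.0)},  # bounds of Table 3
    random_state=0,
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)  # best score and hyperparameters found
```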
5.2 Node Classification
We used the GCN, ChebNet, GAT and AGNN models described in Section 2 to perform multi-class node classification on the input graph. We split the data into train and test sets by randomly masking out 20% of the nodes and edge indices from the adjacency matrix and training on the remaining 80%. The learned model is then used to predict labels for the masked-out nodes, and we report the classification accuracy on this test set as a measure of model performance.
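A sketch of the split and training loop follows, under the assumptions of the earlier sketches (the GCN class and the Data object); the epoch count is illustrative, while the learning rate and weight decay are the tuned values from Section 5.1.

```python
import torch
import torch.nn.functional as F

def random_split_masks(num_nodes, test_frac=0.2, seed=0):
    # Randomly mask out 20% of the nodes for testing; train on the rest.
    perm = torch.randperm(num_nodes,
                          generator=torch.Generator().manual_seed(seed))
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask[perm[:int(test_frac * num_nodes)]] = True
    return ~test_mask, test_mask

train_mask, test_mask = random_split_masks(data.num_nodes)
model = GCN()  # or ChebNet, GAT, AGNN from the sketches in Section 2
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.005)

for epoch in range(200):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    F.nll_loss(out[train_mask], data.y[train_mask]).backward()
    optimizer.step()

model.eval()
pred = model(data.x, data.edge_index).argmax(dim=1)
test_acc = (pred[test_mask] == data.y[test_mask]).float().mean().item()
```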
5.3 Link Prediction
In the link prediction task, we used the graph auto-encoder (GAE) and variational graph auto-encoder (VGAE) [23] to learn a latent representation of the input graph. The input graph has few observed edges and hence a sparse adjacency matrix. It is fed into the link prediction model, which reconstructs a new adjacency matrix representing a new neighborhood structure for each node in the graph. We evaluate the reconstruction using the area under the curve (AUC) and average precision (AP) metrics.
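A sketch of the link-prediction pipeline under the same assumptions (the GAE or VGAE model and the Data object from earlier sketches). train_test_split_edges is the edge-splitting utility of the PyTorch Geometric releases current at the time of writing; newer releases offer transforms.RandomLinkSplit instead.

```python
import torch
from torch_geometric.utils import train_test_split_edges

# Withhold 20% of the observed edges for evaluation.
data = train_test_split_edges(data, val_ratio=0.0, test_ratio=0.2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.005)

for epoch in range(200):  # illustrative epoch count
    model.train()
    optimizer.zero_grad()
    z = model.encode(data.x, data.train_pos_edge_index)
    loss = model.recon_loss(z, data.train_pos_edge_index)
    # For the VGAE, add the KL regularizer of equation (20):
    # loss = loss + (1.0 / data.num_nodes) * model.kl_loss()
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    z = model.encode(data.x, data.train_pos_edge_index)
auc, ap = model.test(z, data.test_pos_edge_index, data.test_neg_edge_index)
```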
5.4 Results and Evaluation
Figure 3: Accuracy on test sets.
On the node classification task, we summarize the mean results over 100 runs in Table 4. The GCN attains the best accuracy score, and ChebNet also compares competitively with the linear baseline model.
Figure 4: AUCs and APs on adjacency matrix reconstruction by (a) GAE and (b) VGAE.

Table 4: Results for the multi-class node classification task.

                GCN      ChebNet   GAT      AGNN     Linear   Logistic Regression
Test Accuracy   0.6812   0.6436    0.6158   0.6003   0.6491   0.5758

Table 5: Results for the link prediction task.

                    GAE      VGAE
AUC                 0.9840   0.9888
Average Precision   0.9835   0.9896

In Table 5, we report high AUC and AP scores for both GAE and VGAE, indicating that these models reliably reconstruct the adjacency matrix from the learned latent representation.
6 Discussion and Conclusion
In this paper, we approach the modeling of bilateral trade and its related downstream tasks, namely the prediction of potential trade partners among countries and income level classification, as a problem in graph representation learning. We leverage historical mutual trade relationships to first construct a graph, where nodes are countries and edges represent active trade between any two given countries, before utilizing graph neural networks. Our approach naturally points to a new direction for machine learning, particularly graph representation learning, and its application in the field of economics. Empirically, we confirm that our approach does well on the intended tasks and can potentially aid future trade analysis. While we considered modeling trade as a static graph, an exciting future direction is to model the time evolution of bilateral trade as a dynamic graph. This would enable analyses of (1) how countries evolve from one income class to another over time and (2) how trade activities between countries change over time. In the latter case, a temporal prediction of an edge between any two countries would indicate how likely they are to partner in trade in the future.
References
[1]
David Ricardo. On the Principles of Political Economy and Taxation, chapter 1. Batoche Books,
52 Eby Street South, Kitchener, Ontario, N2G 3L1, Canada, 3rd edition, 1817.
[2] Patrick Steiner. Determinants of bilateral trade flows, 2015.
[3]
Alan Deardorff. Determinants of bilateral trade: Does gravity work in a neoclassical world?
The Regionalization of the World Economy, pages 7–32, 1998.
[4] James E Anderson. The gravity model. Annual Review of Economics, 2011.
[5]
World Bank Data Team. New country classifications by income level, 2017. Last accessed 31
December 2019.
[6]
Thomas Chaney. The gravity equation in international trade: An explanation. Working Paper
19285, National Bureau of Economic Research, August 2013.
[7]
F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural
network model. IEEE Transactions on Neural Networks, 20(1):61–80, Jan 2009.
[8]
Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. arXiv preprint arXiv:1609.02907, 2016.
[9]
Smriti Bhagat, Graham Cormode, and S. Muthukrishnan. Node classification in social networks.
CoRR, abs/1101.3291, 2011.
[10]
Edouard Pineau and Nathan de Lara. Graph classification with recurrent variational neural
networks. CoRR, abs/1902.02721, 2019.
[11]
Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wen-bing Huang, and Junzhou Huang. Semi-
supervised graph classification: A hierarchical graph perspective. CoRR, abs/1904.05003,
2019.
[12]
Antoine Jean-Pierre Tixier, Giannis Nikolentzos, Polykarpos Meladianos, and Michalis
Vazirgiannis. Classifying graphs as images with convolutional neural networks. CoRR,
abs/1708.02218, 2017.
[13]
C. Aggarwal, G. He, and P. Zhao. Edge classification in networks. In 2016 IEEE 32nd
International Conference on Data Engineering (ICDE), pages 1038–1049, May 2016.
[14]
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. CoRR,
abs/1802.09691, 2018.
[15]
Ramnath Balasubramanyan, Frank Lin, and William W. Cohen. Node clustering in graphs: An
empirical study. 2010.
[16] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3):75–174, 2010.
[17]
Jie Zhou, Ganqu Cui, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, and Maosong Sun. Graph
neural networks: A review of methods and applications. CoRR, abs/1812.08434, 2018.
[18]
Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally
connected networks on graphs. CoRR, abs/1312.6203, 2013.
[19]
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks
on graphs with fast localized spectral filtering. CoRR, abs/1606.09375, 2016.
[20]
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua
Bengio. Graph attention networks. International Conference on Learning Representations,
2018. Accepted as poster.
[21]
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly
learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[22]
Kiran K. Thekumparampil, Sewoong Oh, Chong Wang, and Li-Jia Li. Attention-based graph
neural network for semi-supervised learning, 2018.
[23]
Thomas N Kipf and Max Welling. Variational graph auto-encoders. NIPS Workshop on Bayesian
Deep Learning, 2016.
[24]
Hervé Bourlard and Yves Kamp. Auto-association by multilayer perceptrons and singular value
decomposition. Biological Cybernetics, 59:291–294, 1988.
[25]
Geoffrey E Hinton and Richard S. Zemel. Autoencoders, minimum description length and
helmholtz free energy. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural
Information Processing Systems 6, pages 3–10. Morgan-Kaufmann, 1994.
[26] Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric.
In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[27]
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. CoRR,
abs/1412.6980, 2014.
[28]
Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine
learning algorithms. In Proceedings of the 25th International Conference on Neural Information
Processing Systems - Volume 2, NIPS'12, pages 2951–2959, Red Hook, NY, USA, 2012. Curran
Associates Inc.
[29]
Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on bayesian optimization of
expensive cost functions, with application to active user modeling and hierarchical reinforcement
learning. CoRR, abs/1012.2599, 2010.