TRAINING STABLE GRAPH NEURAL NETWORKS THROUGH CONSTRAINED LEARNING

Juan Cerviño, Luana Ruiz and Alejandro Ribeiro

Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, USA

Supported by NSF CCF 1717120 and Theorinet Simons.
ABSTRACT

Graph Neural Networks (GNNs) rely on graph convolutions to learn features from network data. GNNs are stable to different types of perturbations of the underlying graph, a property that they inherit from graph filters. In this paper we leverage the stability property of GNNs as a starting point to seek representations that are stable within a distribution. We propose a novel constrained learning approach that imposes a constraint on the stability condition of the GNN under a perturbation of choice. We showcase our framework on real-world data, corroborating that we are able to obtain more stable representations without compromising the overall accuracy of the predictor.
Index Terms— Graph Neural Networks, Constrained Learning, Stability
1. INTRODUCTION
Graph Neural Networks (GNNs) are deep convolutional architectures tailored to graph machine learning problems [1, 2] which have achieved great success in fields such as biology [3, 4] and robotics [5, 6], to name a few. Consisting of layers that stack graph convolutions and pointwise nonlinearities, their successful empirical results can be explained by theoretical properties they inherit from graph convolutions. Indeed, convolutions are the reason why GNNs are invariant to node relabelings [7, 8]; stable to deterministic [2], stochastic [9], and space-time graph perturbations [10]; and transferable from small to large graphs [11].
Stability is an especially important property because, in practice, networks are prone to perturbations. For instance, in a social network, friendship links can not only be added or removed, but also strengthened or weakened depending on the frequency of interaction. Similarly, in a wireless network the channel states are dramatically affected by environmental noise. Because GNNs have been proven to be stable to such perturbations, in theory any GNN should do well in these scenarios. In practice, however, actual stability guarantees depend on factors such as the type of graph perturbation, the smoothness of the convolutions, the depth and width of the neural network, and the size of the graph [12]. In other words, GNNs are provably stable to graph perturbations, but we cannot always ensure that they will meet a certain stability requirement or constraint.
In this paper, our goal is thus to enforce GNNs to meet a specific stability requirement, which we do by changing the way in which the GNN is learned. Specifically, we modify the statistical learning problem by introducing GNN stability as a constraint, therefore giving rise to a constrained statistical learning problem. This leads to a non-convex constrained problem for which even a feasible solution may be challenging to obtain in practice. To overcome this limitation, we resort to the dual domain, in which the problem becomes a weighted unconstrained minimization problem that we can solve using standard gradient descent techniques. By evaluating the constraint slackness, we iteratively update the weights of this problem. This procedure is detailed in Algorithm 1. In Theorem 1, we quantify the duality gap, i.e., the mismatch between solving the primal and the dual problems. In Theorem 2, we present convergence guarantees for Algorithm 1. These results are illustrated numerically in Section 4, where we observe that GNNs trained using Algorithm 1 successfully meet stability requirements for a variety of perturbation magnitudes and GNN architectures.
2. GRAPH NEURAL NETWORKS
A graph is a triplet $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{W})$, where $\mathcal{V} = \{1, \ldots, N\}$ is its set of nodes, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is its set of edges, and $\mathcal{W}$ is a function assigning weights $\mathcal{W}(i,j)$ to edges $(i,j) \in \mathcal{E}$. A graph may also be represented by the graph shift operator (GSO) $\mathbf{S} \in \mathbb{R}^{N \times N}$, a matrix which satisfies $[\mathbf{S}]_{ij} \neq 0$ if and only if $(j,i) \in \mathcal{E}$ or $i = j$. The most common examples of GSOs are the graph adjacency matrix $\mathbf{A}$, $[\mathbf{A}]_{ij} = \mathcal{W}(j,i)$, and the graph Laplacian $\mathbf{L} = \mathrm{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}$.

We consider the graph $\mathcal{G}$ to be the support of data $\mathbf{x} = [x_1, \ldots, x_N]$ which we call graph signals. The $i$th component of a graph signal $\mathbf{x}$, $x_i$, corresponds to the value of the data at node $i$. The operation $\mathbf{S}\mathbf{x}$ defines a graph shift of the signal $\mathbf{x}$. Leveraging this notion of shift, we define graph convolutional filters as linear shift-invariant graph filters. Explicitly, a graph convolutional filter with coefficients $\mathbf{h} = [h_0, \ldots, h_{K-1}]$ is given by
$$\mathbf{y} = \mathbf{h} *_{\mathbf{S}} \mathbf{x} = \sum_{k=0}^{K-1} h_k \mathbf{S}^k \mathbf{x} \qquad (1)$$
where $*_{\mathbf{S}}$ is the convolution operation parametrized by $\mathbf{S}$.
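The filter in (1) is a polynomial in the shift operator applied to the signal. As a minimal illustration, the NumPy sketch below computes (1) on a small synthetic graph; the function name `graph_filter` and the random data are illustrative choices, not part of the paper.

```python
import numpy as np

def graph_filter(S, x, h):
    """Graph convolution y = sum_k h_k S^k x (cf. eq. (1))."""
    y = np.zeros_like(x)
    Skx = x.copy()                 # S^0 x
    for hk in h:
        y += hk * Skx              # accumulate h_k S^k x
        Skx = S @ Skx              # shift once more: S^{k+1} x
    return y

# Tiny usage example on a synthetic 5-node graph.
rng = np.random.default_rng(0)
S = rng.random((5, 5)); S = (S + S.T) / 2      # symmetric shift operator
x = rng.standard_normal(5)                     # graph signal
h = np.array([1.0, 0.5, 0.25])                 # K = 3 filter taps
print(graph_filter(S, x, h))
```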
GNNs are deep convolutional architectures consisting of $L$ layers, each of which contains a bank of graph convolutional filters like (1) and a pointwise nonlinearity $\rho$. Layer $l$ produces $F_l$ graph signals $\mathbf{x}_l^f$, called features. Defining a matrix $\mathbf{X}_l$ whose $f$th column corresponds to the $f$th feature of layer $l$ for $1 \leq f \leq F_l$, we can write the $l$th layer of the GNN as
$$\mathbf{X}_l = \rho\left( \sum_{k=0}^{K-1} \mathbf{S}^k \mathbf{X}_{l-1} \mathbf{H}_{lk} \right). \qquad (2)$$
In this expression, $[\mathbf{H}_{lk}]_{gf}$ denotes the $k$th coefficient of the graph convolution (1) mapping feature $g$ to feature $f$ for $1 \leq g \leq F_{l-1}$ and $1 \leq f \leq F_l$. A more succinct representation of this GNN can be obtained by grouping all learnable parameters $\mathbf{H}_{lk}$, $1 \leq l \leq L$, in a tensor $\mathcal{H} = \{\mathbf{H}_{lk}\}_{l,k}$. This allows expressing the GNN as the parametric map $\mathbf{X}_L = \phi(\mathbf{X}_0, \mathbf{S}; \mathcal{H})$. For simplicity, in the following sections we assume that the input and output only have one feature, i.e., $\mathbf{X}_0 = \mathbf{x} \in \mathbb{R}^N$ and $\mathbf{X}_L = \mathbf{y} \in \mathbb{R}^N$.
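A layer of (2) is a bank of graph filters followed by a pointwise nonlinearity. The following NumPy sketch of the forward map $\phi(\mathbf{X}_0, \mathbf{S}; \mathcal{H})$ is a minimal illustration only: dense matrices, ReLU as $\rho$, and the list layout of the parameters `H` are assumptions made here, not the paper's implementation.

```python
import numpy as np

def gnn_forward(X, S, H):
    """Forward pass X_L = phi(X_0, S; H) of the GNN in (2).

    H is a list with one entry per layer; H[l] has shape (K, F_in, F_out),
    so H[l][k] plays the role of the coefficient matrix H_{lk}.
    """
    for Hl in H:
        Z = np.zeros((X.shape[0], Hl.shape[2]))
        SkX = X.copy()                    # S^0 X
        for k in range(Hl.shape[0]):
            Z += SkX @ Hl[k]              # accumulate S^k X H_{lk}
            SkX = S @ SkX
        X = np.maximum(Z, 0.0)            # pointwise nonlinearity rho (ReLU)
    return X

# Two-layer example with F_0 = 1, F_1 = 4, F_2 = 1 features and K = 3 taps.
rng = np.random.default_rng(0)
N = 5
S = rng.random((N, N)); S = (S + S.T) / 2
X0 = rng.standard_normal((N, 1))
H = [rng.standard_normal((3, 1, 4)) * 0.1, rng.standard_normal((3, 4, 1)) * 0.1]
print(gnn_forward(X0, S, H).shape)        # (5, 1)
```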
2.1. Statistical Learning on Graphs
To learn a GNN, we are given pairs $(\mathbf{x}, \mathbf{y})$ corresponding to an input graph signal $\mathbf{x} \in \mathbb{R}^N$ and a target output graph signal $\mathbf{y} \in \mathbb{R}^N$ sampled from the joint distribution $p(\mathbf{x}, \mathbf{y})$. Our objective is to find the filter coefficients $\mathcal{H}$ such that $\phi(\mathbf{x}, \mathbf{S}; \mathcal{H})$ approximates $\mathbf{y}$ over the joint probability distribution $p$. To do so, we introduce a nonnegative loss function $\ell : \mathbb{R}^N \times \mathbb{R}^N \to \mathbb{R}_+$ which satisfies $\ell(\phi(\mathbf{x}), \mathbf{y}) = 0$ when $\phi(\mathbf{x}) = \mathbf{y}$. The GNN is learned by averaging the loss over the probability distribution as follows,
$$\min_{\mathcal{H} \in \mathbb{R}^Q} \; \mathbb{E}_{p(\mathbf{x}, \mathbf{y})}\left[ \ell(\mathbf{y}, \phi(\mathbf{x}, \mathbf{S}; \mathcal{H})) \right]. \qquad (3)$$
Problem (3) is the statistical risk minimization problem [13] for the GNN.
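In practice the expectation in (3) is approximated by a sample average over training pairs. A minimal sketch of that estimate is shown below, under the assumptions of synthetic data, a squared loss, and a two-tap graph filter `phi` standing in for the GNN; all names and data here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 200                                    # graph size, number of training samples
S = rng.random((N, N)); S = (S + S.T) / 2        # synthetic shift operator
X = rng.standard_normal((T, N))                  # inputs x_i ~ p(x)
Y = np.stack([np.tanh(S @ x) for x in X])        # synthetic targets y_i

def phi(x, theta):
    """Stand-in predictor phi(x, S; H): a two-tap graph filter with parameters theta."""
    return theta[0] * x + theta[1] * (S @ x)

def empirical_risk(theta):
    """Sample-average approximation of the expectation in (3) with a squared loss."""
    return np.mean([np.mean((phi(x, theta) - y) ** 2) for x, y in zip(X, Y)])

print(empirical_risk(np.array([0.0, 1.0])))
```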
2.2. Stability to Graph Perturbations
In the real world, it is not uncommon for graphs to be prone to small perturbations such as, e.g., interference noise in wireless networks. Hence, stability to graph perturbations is an important property for GNNs. Explicitly, we define a graph perturbation as a graph that is $\epsilon$-close to the original graph,
$$\hat{\mathbf{S}} : \| \hat{\mathbf{S}} - \mathbf{S} \| \leq \epsilon. \qquad (4)$$
An example of perturbation is an additive perturbation of the form $\hat{\mathbf{S}} = \mathbf{S} + \mathbf{E}$, where $\mathbf{E}$ is a stochastic perturbation with bounded norm $\|\mathbf{E}\| \leq \epsilon$ drawn from a distribution $\Delta$.

The notion of GNN stability is formalized in Definition 1. Note that the maximum is taken in order to account for all possible inputs.
Definition 1 (GNN stability to graph perturbations). Let $\phi(\mathbf{x}, \mathbf{S}; \mathcal{H})$ be a GNN (2) and let $\hat{\mathbf{S}}$ be a graph perturbation (4) such that $\|\hat{\mathbf{S}} - \mathbf{S}\| \leq \epsilon$. The GNN $\phi(\mathbf{x}, \mathbf{S}; \mathcal{H})$ is $C$-stable if
$$\max_{\mathbf{x}} \| \phi(\mathbf{x}, \mathbf{S}; \mathcal{H}) - \phi(\mathbf{x}, \hat{\mathbf{S}}; \mathcal{H}) \| \leq C\epsilon \qquad (5)$$
for some finite constant $C$.
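Definition 1 can be probed numerically by sampling perturbations and inputs and recording the largest output deviation relative to $\epsilon$. The sketch below does this for a fixed stand-in graph filter; the sampling scheme, the restriction to unit-norm inputs, and the names `phi` and `perturb` are assumptions made for illustration, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eps = 5, 0.1
S = rng.random((N, N)); S = (S + S.T) / 2
h = np.array([1.0, 0.5, 0.25])

def phi(x, S):
    """Stand-in GNN: a single graph filter with fixed coefficients h."""
    y, Skx = np.zeros_like(x), x.copy()
    for hk in h:
        y += hk * Skx
        Skx = S @ Skx
    return y

def perturb(S, eps):
    """Draw an additive perturbation S_hat = S + E with ||E|| <= eps (cf. (4))."""
    E = rng.standard_normal(S.shape)
    return S + eps * E / np.linalg.norm(E, 2)

# Empirical lower bound on the stability constant C in (5), over unit-norm inputs.
C_hat = 0.0
for _ in range(100):
    x = rng.standard_normal(N); x /= np.linalg.norm(x)
    diff = np.linalg.norm(phi(x, S) - phi(x, perturb(S, eps)))
    C_hat = max(C_hat, diff / eps)
print(C_hat)
```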
A GNN is thus stable to a graph perturbation $\|\hat{\mathbf{S}} - \mathbf{S}\| \leq \epsilon$ if its output varies at most by $C\epsilon$. The smaller the value of $C$, the more stable the GNN is to perturbations $\hat{\mathbf{S}}$. Under mild smoothness assumptions on the graph convolutions, it is possible to show that any GNN can be made stable in the sense of Definition 1 [12]. However, for an arbitrary GNN the constant $C$ is not guaranteed to be small and, in fact, existing stability analyses show that it can vary with the GNN depth and width (i.e., number of layers $L$ and number of features $F$, respectively), the size of the graph $N$, and the misalignment between the eigenspaces of $\mathbf{S}$ and $\hat{\mathbf{S}}$. What is more, problem (3) does not impose any conditions on the stability of the GNN, thus solutions $\mathcal{H}$ may not have small stability bounds $C$. In this paper, our goal is to enforce stability for a constant $C$ of choice. In the following, we show that, on average over the support of the data, better stability can be achieved by introducing a modification of Definition 1 as a constraint of the statistical learning problem for the GNN.
3. CONSTRAINED LEARNING
In order to address the stability of the GNN, we can explicitly enforce our learning procedure to account for differences between the unperturbed and the perturbed performance. To do this, we resort to constrained learning theory [14]. We modify the standard unconstrained statistical risk minimization problem (cf. (3)) by introducing a constraint that requires the solution $\mathcal{H}$ to attain at most a $C\epsilon$ difference between the perturbed and unperturbed problems,
$$P^* = \min_{\mathcal{H} \in \mathbb{R}^Q} \; \mathbb{E}_{p(\mathbf{x}, \mathbf{y})}\left[ \ell(\mathbf{y}, \phi(\mathbf{x}, \mathbf{S}; \mathcal{H})) \right] \qquad (6)$$
$$\text{s.t.} \quad \mathbb{E}_{p(\mathbf{x}, \mathbf{y}, \Delta)}\left[ \ell(\mathbf{y}, \phi(\mathbf{x}, \hat{\mathbf{S}}; \mathcal{H})) - \ell(\mathbf{y}, \phi(\mathbf{x}, \mathbf{S}; \mathcal{H})) \right] \leq C\epsilon.$$
Note that if the constant $C$ is set at a sufficiently large value, the constraint becomes inactive, making problems (6) and (3) equivalent. As opposed to other methods based on heuristics or tailored solutions, our formulation admits a simple interpretation from an optimization perspective.
3.1. Dual Domain
In order to solve problem (6), we will resort to the dual domain. To do so, we introduce the dual variable $\lambda \geq 0$, and we define the Lagrangian function as follows,
$$\mathcal{L}(\mathcal{H}, \lambda) = (1 - \lambda)\, \mathbb{E}\left[ \ell(\mathbf{y}, \phi(\mathbf{x}, \mathbf{S}; \mathcal{H})) \right] + \lambda\, \mathbb{E}\left[ \ell(\mathbf{y}, \phi(\mathbf{x}, \hat{\mathbf{S}}; \mathcal{H})) - \epsilon C \right]. \qquad (7)$$
We can introduce the dual function as the minimum of the Lagrangian $\mathcal{L}(\mathcal{H}, \lambda)$ over $\mathcal{H}$ for a fixed value of the dual variable $\lambda$ [15],
$$d(\lambda) = \min_{\mathcal{H} \in \mathbb{R}^Q} \mathcal{L}(\mathcal{H}, \lambda). \qquad (8)$$
Note that to obtain the value of the dual function $d(\lambda)$ we need to solve an unconstrained optimization problem weighted by $\lambda$. Given that the dual function is a pointwise minimum of a family of affine functions, it is concave, even when problem (6) is not convex. The maximum of the dual function $d(\lambda)$ over $\lambda$ is called the dual problem $D^*$, and it is always a lower bound of problem $P^*$ as follows,
$$d(\lambda) \leq D^* \leq \min_{\mathcal{H} \in \mathbb{R}^Q} \max_{\lambda \in \mathbb{R}_+} \mathcal{L}(\mathcal{H}, \lambda) = P^*. \qquad (9)$$
The advantage of delving into the dual domain and maximizing the dual function $d$ is that it allows us to search for solutions of problem (6) by minimizing an unconstrained problem. The difference between the dual problem $D^*$ and the primal problem $P^*$ (cf. (6)) is called the duality gap and will be quantified in the following theorem.
AS1. The loss function $\ell$ is $L$-Lipschitz, i.e., $\|\ell(\mathbf{x}, \cdot) - \ell(\mathbf{z}, \cdot)\| \leq L \|\mathbf{x} - \mathbf{z}\|$, strongly convex, and bounded by $B$.

AS2. The conditional distribution $p(\mathbf{x} \,|\, \mathbf{y})$ is non-atomic for all $\mathbf{y} \in \mathbb{R}^N$, and there is a finite number of target graph signals $\mathbf{y}$.

AS3. There exists a convex hypothesis class $\hat{\mathcal{C}}$ such that $\mathcal{C} \subseteq \hat{\mathcal{C}}$, and there exists a constant $\xi > 0$ such that for all $\hat{\phi} \in \hat{\mathcal{C}}$ there exists $\mathcal{H} \in \mathbb{R}^Q$ such that $\sup_{\mathbf{x} \in \mathcal{X}} \| \hat{\phi}(\mathbf{x}) - \phi(\mathbf{x}, \mathcal{H}) \| \leq \xi$.

Note that Assumption 1 is satisfied in practice by most loss functions (e.g., the square loss and the $L_1$ loss), by imposing a bound. Assumption 2 can be satisfied in practice by data augmentation [16]. Assumption 3 is related to the richness of the function class of GNNs $\mathcal{C}$; the parameter $\xi$ can be decreased by increasing the capacity of the GNNs in consideration. To obtain a convex hypothesis class $\hat{\mathcal{C}}$, it suffices to take the convex hull of the function class of GNNs.
Theorem 1 (Near-Zero Duality Gap). Under Assumptions 1, 2, and 3, the constrained graph stability problem (6) has near-zero duality gap,
$$P^* - D^* \leq (2\lambda^* + 1) L \xi \qquad (10)$$
where $\lambda^*$ is the optimal dual variable.
Algorithm 1 Graph Stability Algorithm
 1: Initialize model $\mathcal{H}_0$ and dual variable $\lambda = 0$
 2: for epochs $e = 1, 2, \ldots$ do
 3:    for batch $i$ in epoch $e$ do
 4:       Obtain $N$ samples $\{(\mathbf{x}_i, \mathbf{y}_i)\}_i \sim p(\mathbf{x}, \mathbf{y})$
 5:       Obtain $M$ perturbations $\{\hat{\mathbf{S}}_j\}_j$
 6:       Get primal gradient $\hat{\nabla}_{\mathcal{H}} \mathcal{L}(\mathcal{H}, \lambda)$ (cf. eq. (11))
 7:       Update parameters $\mathcal{H}_{k+1} = \mathcal{H}_k - \eta_P \hat{\nabla}_{\mathcal{H}} \mathcal{L}(\mathcal{H}_k, \lambda)$
 8:    end for
 9:    Obtain $N$ samples $\{(\mathbf{x}_i, \mathbf{y}_i)\}_i \sim p(\mathbf{x}, \mathbf{y})$
10:    Obtain $M$ perturbations $\{\hat{\mathbf{S}}_j\}_j$
11:    Update dual variable $\lambda \leftarrow [\lambda + \eta_D \hat{\nabla}_{\lambda} \mathcal{L}(\mathcal{H}, \lambda)]_+$
12: end for
The importance of Theorem 1 is that it allows us to quantify the penalty that we incur by delving into the dual domain. Note that this penalty decreases as we make our parameterization richer, and thus decrease $\xi$. Also note that the optimal dual variable $\lambda^*$ accounts for the difficulty of finding a feasible solution; we should therefore expect this value to be small given the theoretical guarantees on GNN stability [12].
3.2. Algorithm Construction
In order to solve problem (6), we resort to iteratively solving the dual function $d(\lambda)$, evaluating the constraint slack, and updating the dual variable $\lambda$ accordingly. We assume that the distributions are unknown, but that we have access to samples of both graph signals $(\mathbf{x}_i, \mathbf{y}_i) \sim p(\mathbf{x}, \mathbf{y})$ and perturbed graphs $\hat{\mathbf{S}}_j \sim \Delta$. In a standard learning procedure, to minimize the Lagrangian $\mathcal{L}(\mathcal{H}, \lambda)$ with respect to the parameters $\mathcal{H}$ for a fixed $\lambda$ we can take stochastic gradients as follows,
$$\hat{\nabla}_{\mathcal{H}} \mathcal{L}(\mathcal{H}, \lambda) = \nabla_{\mathcal{H}} \left[ \frac{1 - \lambda}{N} \sum_{i=1}^{N} \ell(\phi(\mathbf{x}_i, \mathbf{S}; \mathcal{H}_k), \mathbf{y}_i) + \frac{\lambda}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \ell(\phi(\mathbf{x}_i, \hat{\mathbf{S}}_j; \mathcal{H}_k), \mathbf{y}_i) \right]. \qquad (11)$$
The main difference with a regularized problem is that the dual variable $\lambda$ is also updated. To update the dual variable $\lambda$, we evaluate the constraint violation as follows,
$$\hat{\nabla}_{\lambda} \mathcal{L}(\mathcal{H}, \lambda) = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} \ell(\phi(\mathbf{x}_i, \hat{\mathbf{S}}_j; \mathcal{H}_k), \mathbf{y}_i) - \frac{1}{N} \sum_{i=1}^{N} \ell(\phi(\mathbf{x}_i, \mathbf{S}; \mathcal{H}_k), \mathbf{y}_i) - C\epsilon. \qquad (12)$$
The intuition behind the dual step is that the dual variable $\lambda$ will increase while the constraint is not being satisfied, adding weight to the stability condition in the minimization of the Lagrangian. Conversely, if the constraint is being satisfied, we will increase the relative weight of the objective function. This means that if the constraint is more restrictive, the optimal dual variable will be larger.

Norm of      | RMSE for 2-Layer GNN             | RMSE for 3-Layer GNN
Perturbation | Unconstrained    Constrained     | Unconstrained    Constrained
0            | 0.8447 (0.1386)  0.8564 (0.0572) | 0.8290 (0.1504)  0.8417 (0.0980)
0.0001       | 0.9083 (0.2160)  0.8316 (0.0692) | 3.3542 (0.5042)  0.8349 (0.1011)
0.001        | 0.9084 (0.2160)  0.8317 (0.0691) | 3.3542 (0.5043)  0.8341 (0.1011)
0.01         | 0.9092 (0.2162)  0.8322 (0.0688) | 3.5410 (0.5044)  0.8343 (0.1011)
0.1          | 0.9170 (0.2173)  0.8371 (0.0667) | 3.3484 (0.5086)  0.8362 (0.1010)
0.2          | 0.9282 (0.2212)  0.8458 (0.0608) | 3.3326 (0.8189)  0.8379 (0.1007)
0.5          | 0.9565 (0.2223)  0.8749 (0.0528) | 3.2070 (0.5920)  0.8468 (0.0932)

Table 1. Evaluation of the RMSE of the trained GNN for 20 epochs on the testing set (unseen data) for different magnitudes of relative perturbations. We consider GNNs of 2 and 3 layers with K = 5 filter taps in both cases and F_1 = 64 and F_2 = 32 features for the first and second layers, respectively. The constrained learning approach is able to keep a comparable performance on the unperturbed evaluation (i.e., norm of perturbation = 0) while it is more stable as the norm of the perturbation increases.
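To make Algorithm 1 concrete, the sketch below alternates the primal step (11) and the projected dual step (12) in PyTorch on a small synthetic problem. It is a minimal illustration rather than the paper's implementation: the single-filter "GNN", the synthetic graph and data, the additive perturbation sampler `perturb`, and the constants are all assumptions made for the example.

```python
import torch

# Illustrative constants: graph size, filter taps, stability constant C,
# perturbation magnitude eps, perturbations per step M, and dual step size eta_D.
N, K, C, eps, M, eta_D = 20, 5, 1.0 / 3, 0.3, 3, 0.1

torch.manual_seed(0)
S = torch.rand(N, N); S = (S + S.T) / 2
S = S / torch.linalg.matrix_norm(S, 2)           # normalized synthetic shift operator
x = torch.randn(N); y = torch.tanh(S @ x)        # one synthetic training pair (x, y)

h = torch.zeros(K, requires_grad=True)           # filter coefficients (stand-in for H)
lam = torch.tensor(0.0)                          # dual variable lambda
opt = torch.optim.Adam([h], lr=5e-3)
loss_fn = torch.nn.SmoothL1Loss()

def gnn(x, S):
    """Stand-in GNN: a single graph filter y = sum_k h_k S^k x (cf. eq. (1))."""
    out, Skx = torch.zeros_like(x), x
    for k in range(K):
        out = out + h[k] * Skx
        Skx = S @ Skx
    return out

def perturb(S):
    """Additive perturbation S_hat = S + E with ||E|| <= eps (cf. eq. (4))."""
    E = torch.randn(N, N)
    return S + eps * E / torch.linalg.matrix_norm(E, 2)

for step in range(500):
    # Primal step: stochastic gradient of the Lagrangian (cf. eqs. (7) and (11));
    # the constant -lambda*C*eps term is dropped since it does not depend on h.
    nominal = loss_fn(gnn(x, S), y)
    perturbed = torch.stack([loss_fn(gnn(x, perturb(S)), y) for _ in range(M)]).mean()
    lagrangian = (1 - lam) * nominal + lam * perturbed
    opt.zero_grad(); lagrangian.backward(); opt.step()
    # Dual step: ascend on the constraint slack (cf. eq. (12)), projected onto lambda >= 0.
    with torch.no_grad():
        slack = perturbed - nominal - C * eps
        lam = torch.clamp(lam + eta_D * slack, min=0.0)

print(float(lam), float(nominal))
```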
Theorem 2 (Convergence). Under Assumptions 1, 2, and 3, if for each dual variable $\lambda_k$ the Lagrangian is minimized up to a precision $\alpha > 0$, i.e., $\mathcal{L}(\mathcal{H}_{\lambda_k}, \lambda_k) \leq \min_{\mathcal{H} \in \mathbb{R}^Q} \mathcal{L}(\mathcal{H}, \lambda_k) + \alpha$, then for a fixed tolerance $\beta > 0$, the iterates generated by Algorithm 1 achieve a neighborhood of the optimum $P^*$ of problem (6) in finite time,
$$P^* + \alpha \geq \mathcal{L}(\mathcal{H}_k, \lambda_k) \geq P^* - (2\lambda^* + 1)L\xi - \alpha - \beta - \frac{\eta_D B^2}{2}.$$
Theorem 2 allows us to claim convergence of Algorithm 1 up to a neighborhood of the optimum of problem (6) that depends on the tolerance $\beta$, the precision $\alpha$, the dual step size $\eta_D$, and the loss bound $B$.
4. EXPERIMENTS
We consider the problem of predicting the rating a movie will be given by a user. We leverage the MovieLens 100k dataset [17], which contains 100,000 integer ratings between 1 and 5 collected from U = 943 users over M = 1682 movies. For this example, we focus only on the movie "Contact". To exploit the underlying graph structure of the problem we build a movie similarity graph, obtained by computing the pairwise correlations between the different movies in the training set [18]. In order to showcase the stability properties of GNNs, we perturb the graph shift operator according to the relative perturbation modulo permutation model [12, Definition 3], $\hat{\mathbf{S}} = \mathbf{S} + \mathbf{E}\mathbf{S} + \mathbf{S}\mathbf{E}$. We consider a uniform distribution for $\mathbf{E}$ such that $\|\mathbf{E}\| \leq \epsilon$.
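As an illustration of this setup, the sketch below builds a correlation-based similarity graph from a ratings matrix and draws one relative perturbation of it. The random ratings matrix, the function names, and the simplified treatment of unrated entries are assumptions made for the example; they are not the paper's preprocessing pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
U, M, eps = 50, 8, 0.3                               # users, movies, perturbation size (toy values)
R = rng.integers(0, 6, size=(U, M)).astype(float)    # ratings in {0, ..., 5}; 0 marks "not rated"

def similarity_graph(R):
    """Movie similarity graph from pairwise correlations between rating columns."""
    S = np.corrcoef(R, rowvar=False)                 # M x M correlation matrix
    np.fill_diagonal(S, 0.0)                         # no self-loops
    return S / np.linalg.norm(S, 2)                  # normalize the shift operator

def relative_perturbation(S, eps):
    """Relative perturbation S_hat = S + ES + SE with ||E|| <= eps ([12, Definition 3])."""
    E = rng.uniform(-1.0, 1.0, size=S.shape)
    E = eps * E / np.linalg.norm(E, 2)
    return S + E @ S + S @ E

S = similarity_graph(R)
S_hat = relative_perturbation(S, eps)
print(np.linalg.norm(S_hat - S, 2))                  # size of the induced absolute perturbation
```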
We split the dataset into 90% for training and 10% for testing, considering 10 independent random splits. For the optimizer we used a 5-sample batch size and ADAM [19] with learning rate 0.005, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and no learning rate decay. We used the smooth $L_1$ loss. For the GNNs, we used ReLU as the nonlinearity, and we considered two GNNs: (i) one layer with $F = 64$ features, and (ii) two layers with $F_1 = 64$ and $F_2 = 32$. In both cases, we used $K = 5$ filter taps per filter. For Algorithm 1 we used dual step size $\eta_D = 0.1$, stability constant $C = 1/3$, and magnitude of perturbation $\epsilon = 0.3$. For the number of perturbations per primal step we used $M = 3$, and to evaluate the constraint slackness we used 20% of the training set.
Table 1 shows the RMSE achieved on the test set when the GNN is trained using Algorithm 1 and when trained unconstrained (cf. (3)). We evaluate GNN performance for different magnitudes of perturbations of the graph. The numerical results shown in Table 1 express the manifestation of the claims that we put forward. First, using Algorithm 1 we are able to attain a performance comparable to the one we would have achieved by training while ignoring the perturbation. As seen in the first row, the evaluation of the trained GNNs produces comparable results for both the 2- and the 3-layer GNN. Second, with our formulation we are able to obtain more stable representations: when the perturbation magnitude increases, the loss deteriorates at a slower rate. This effect is especially noticeable for the 3-layer GNN. It is well studied that GNN stability worsens as the number of layers increases; however, using Algorithm 1 we are able to curtail this undesirable effect.
5. CONCLUSION
In this paper we introduced a constrained learning formulation to improve the stability of GNNs. By explicitly introducing a constraint on the stability of the GNN, we are able to obtain filter coefficients that are more resilient to perturbations of the graph. The benefit of our novel procedure was benchmarked on a recommendation system problem with real-world data. For future work, we will improve our theoretical guarantees in order to ensure stability, and consider other more demanding simulations such as robot swarms.
6. REFERENCES
[1] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang,
Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng
Li, and Maosong Sun, “Graph neural networks: A re-
view of methods and applications,” AI Open, vol. 1, pp.
57–81, 2020.
[2] Fernando Gama, Antonio G Marques, Geert Leus, and
Alejandro Ribeiro, “Convolutional neural network ar-
chitectures for signals supported on graphs,” IEEE
Transactions on Signal Processing, vol. 67, no. 4, pp.
1034–1049, 2018.
[3] Alex Fout, Jonathon Byrd, Basir Shariat, and Asa Ben-
Hur, “Protein interface prediction using graph con-
volutional networks,” in Advances in Neural Infor-
mation Processing Systems, I. Guyon, U. V. Luxburg,
S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and
R. Garnett, Eds. 2017, vol. 30, Curran Associates, Inc.
[4] David K Duvenaud, Dougal Maclaurin, Jorge Ipar-
raguirre, Rafael Bombarell, Timothy Hirzel, Alan
Aspuru-Guzik, and Ryan P Adams, “Convolutional net-
works on graphs for learning molecular fingerprints,”
in Advances in Neural Information Processing Systems,
C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and
R. Garnett, Eds. 2015, vol. 28, Curran Associates, Inc.
[5] Siyuan Qi, Wenguan Wang, Baoxiong Jia, Jianbing
Shen, and Song-Chun Zhu, “Learning human-object in-
teractions by graph parsing neural networks,” in Pro-
ceedings of the European Conference on Computer Vi-
sion (ECCV), 2018, pp. 401–417.
[6] Qingbiao Li, Fernando Gama, Alejandro Ribeiro, and
Amanda Prorok, “Graph neural networks for decen-
tralized multi-robot path planning,” arXiv preprint
arXiv:1912.06095, 2019.
[7] Zhengdao Chen, Soledad Villar, Lei Chen, and Joan
Bruna, “On the equivalence between graph isomor-
phism testing and function approximation with gnns,”
arXiv preprint arXiv:1905.12560, 2019.
[8] Nicolas Keriven and Gabriel Peyré, “Universal invari-
ant and equivariant graph neural networks,” in Advances
in Neural Information Processing Systems (NeurIPS),
2019.
[9] Zhan Gao, Elvin Isufi, and Alejandro Ribeiro, “Stabil-
ity of graph convolutional neural networks to stochastic
perturbations,” Signal Processing, p. 108216, 2021.
[10] Samar Hadou, Charilaos I. Kanatsoulis, and Alejandro
Ribeiro, “Space-time graph neural networks,” 2021.
[11] Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro,
“Graphon neural networks and the transferability of
graph neural networks,” Advances in Neural Informa-
tion Processing Systems, vol. 33, 2020.
[12] Fernando Gama, Joan Bruna, and Alejandro Ribeiro,
“Stability properties of graph neural networks,” IEEE
Transactions on Signal Processing, vol. 68, pp. 5680–
5695, 2020.
[13] Shai Shalev-Shwartz and Shai Ben-David, Understand-
ing machine learning: From theory to algorithms, Cam-
bridge university press, 2014.
[14] Luiz Chamon and Alejandro Ribeiro, “Probably approx-
imately correct constrained learning,” Advances in Neu-
ral Information Processing Systems, vol. 33, 2020.
[15] Stephen Boyd and Lieven Vandenberghe, Convex Opti-
mization, Cambridge University Press, 2009.
[16] Ian Goodfellow, Yoshua Bengio, Aaron Courville, and
Yoshua Bengio, Deep learning, vol. 1, MIT press Cam-
bridge, 2016.
[17] F Maxwell Harper and Joseph A Konstan, “The movie-
lens datasets: History and context,” ACM Transactions
on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, pp.
1–19, 2015.
[18] Weiyu Huang, Antonio G Marques, and Alejandro R
Ribeiro, “Rating prediction via graph signal process-
ing,” IEEE Transactions on Signal Processing, vol. 66,
no. 19, pp. 5066–5081, 2018.
[19] Diederik P. Kingma and Jimmy Ba, “Adam: A method
for stochastic optimization,” CoRR, vol. abs/1412.6980,
2015.