Parametric Graph Representations in the Era of
Foundation Models: A Survey and Position
Dongqi Fu, Liri Fang, Zihao Li,
Hanghang Tong, Vetle I. Torvik, Jingrui He
University of Illinois Urbana-Champaign
{dongqif2,lirif2,zihaoli5,htong,vtorvik,jingrui}@illinois.edu
Figure 1: A Toy Example Illustrating Graph Laws in the Real World: Different types of graphs
follow some shared properties that can be used to parameterize their structure. The left figure
depicts a social network, and the right figure represents flight routes between several major cities
after COVID-19, where flights gradually recover to their pre-pandemic state. Both graphs exhibit two
key properties: (1) Global Shrinking Law: As the graph evolves, the diameter decreases, indicating
a more connected structure. (2) Local Triangle Closure Law: New links are more likely to close
triangles. These behaviors are indicative of typical graph dynamics seen in social networks, transport
systems, and other graph domains. These graph laws have the potential to be quantified and serve
as fundamental principles for developing graph foundation models.
Abstract
Graphs have been widely used in the past decades of big data and AI to model
comprehensive relational data. When analyzing a graph’s statistical properties,
graph laws serve as essential tools for parameterizing its structure. Identifying
meaningful graph laws can significantly enhance the effectiveness of various
applications, such as graph generation and link prediction. Facing the large-scale
foundation model developments nowadays, the study of graph laws reveals new
research potential, e.g., providing multi-modal information for graph neural
representation learning and breaking the domain inconsistency of different graph
data. In this survey, we first review the previous study of graph laws from multiple
perspectives, i.e., macroscope and microscope of graphs, low-order and high-
order graphs, static and dynamic graphs, different observation spaces, and newly
proposed graph parameters. After we review various real-world applications
benefiting from the guidance of graph laws, we conclude the paper with current
challenges and future research directions.
Equal contribution.
Preprint. Preliminary work.
1 Introduction
In the era of big data and AI, graphs are popular data structures for modeling the complex relationships
between entities. Also, graph-based research (e.g., graph mining and graph representation learning)
provides the foundation for many real-world applications, such as recommender systems [22, 46, 49, 51], social network analysis [23, 35], information retrieval [32, 36, 45], anomaly detection [17, 63, 64], natural language processing [7, 50, 65], computer vision [8, 27], AI4Science [15, 18, 59], etc.
Figure 2: Position of Graph Law in Graph Representations.
To model those real-world tasks within graphs, graph representations are indispensable middleware that provides the basis for specific and complex task-oriented computations. Specifically, graph representations can be decomposed into three aspects: (1) graph embedding (i.e., vector representation), (2) graph law (i.e., parametric representation), and (3) graph visualization (i.e., visual representation). First, graph representations can take the form of embedding matrices, i.e., the graph topological information and attributes are encoded into matrices; this direction has been widely studied in the research community and is usually referred to as graph representation learning [20]. Second, graphs can also be represented by plotting them directly for a better human-understandable illustration. For example, one interesting research topic is how to plot graph topological structures in 2D space with less structural distortion; more related work can be found in [16]. Last but not least, graphs can be represented by a few key parameters, such as the Erdős–Rényi random graph G(n, p) [12], where n stands for the number of nodes in the graph, and p stands for the independent edge connection probability. As shown in Figure 2, these three representations can overlap and mutually contribute to each other [16].
1.1 Motivation of this Paper
In the modern graph deep learning community, graph embedding (also referred to as graph representation learning) has attracted unprecedented research attention, ranging from unsupervised methods (e.g., Node2Vec [19]) to semi-supervised methods (e.g., GCN [28]), as well as novel transformer-based neural architectures (e.g., GraphTransformer [13]). However, the graph law (also referred to as graph parametric representation) holds great research potential, yet the corresponding research remains underexplored.
At the beginning, we would like to answer the question "What is a Graph Law or Graph Parametric Representation?". In general, it refers to finding some key parameters and their relations to describe a given graph, where the description is expected to preserve all or part of the properties of that graph. According to previous research [33, 34], establishing a rigorous graph law (or graph parametric representation) relies on two fundamental steps. The first step is to decide which parameters we use to describe the graph, e.g., the node degree. The second step is to observe the graph in a statistical manner and then determine the values of the parameters, the distributions of the parameters, the relations among the parameters, etc. For example, the relation between the probability of a newly-arrived node connecting to an old node (parameter 1) and the degree of that old node (parameter 2) is studied by maximum likelihood estimation (MLE) based on observed real-world graph data [34].
We then discuss, with the following two concrete examples, why studying graph parametric representations is important in today's graph research community.
Figure 3: Organization of Graph Law Introduction.
Foundation of Graph Foundation Models. Similar to building foundation models in other modalities [3, 61, 62], graph data integration is the base that supports various graph AI developments by aggregating graph data from different domains to extract non-trivial, abundant, and useful knowledge. But its realization is very challenging: for example, unlike independent and identically distributed (IID) data, the distributions of graph data from different domains can be quite different and hard to leverage jointly, e.g., the graph size, the attribute dimension, and the attribute meaning of different domain graphs place a barrier for distilling a consensus intelligence [44]. Graph law acts as a potential trigger to break the inconsistency of different domains, such that different graphs can be described in the same statistical language. With a suite of powerful graph laws, the cross-domain graph (or subgraph) representation complexities can be reduced to several shareable parameters, akin to Erdős–Rényi graphs but accounting for heterogeneity and temporality, such that large-scale foundation model training across various graph data becomes promising.
Bring Graphs to Foundation Models. Whether the statistical properties of an input graph are well preserved during the representation learning process has recently become an increasingly interesting research question, and some preliminary theoretical studies [48] will be discussed later in this paper. Further, taking concrete application scenarios as examples: in the climate domain, by modeling geolocations as graphs [31], graph neural networks (GNNs) have been deployed for weather forecasting and achieve strong performance. Although understood by machines, a follow-up question arises as to whether this GNN-encoded knowledge obeys the physical laws or equations of climate, i.e., does the machine understand the climate as humans do? Thus, a possible solution is to find a way to model the graph law in terms of physical rules, encode the law along with the neural representation learning process, and examine the decision variance to verify the hypothesis. In addition to natural science, graph laws also have the potential to bring human activities into the neural representation learning process and boost task performance. For example, one recent study [25] shows that adding extra local topology information into the prompt can help large language models (LLMs) achieve textual node classification tasks with high accuracy. More discussion on graph laws and LLMs like [21, 53] can be found in Section 6.6.
1.2 Organization of this Paper
Graph law is the study of the statistical properties of graphs. In this survey, we introduce graph law studies from the macroscopic and microscopic views, plus multiple angles such as low-order and high-order connections and static and dynamic graphs, as shown in Table 1 and Figure 3.
Macroscopic and Microscopic Views. Macroscopic graph laws describe graph properties from a global view, e.g., what the degree (or eigenvalue) distribution of the entire graph looks like [33]; microscopic laws instead focus on individuals and investigate their behaviors as parts of the entire graph [34].
Low-Order and High-Order Connections. Most graph laws are based on node-level connections (i.e., low-order connections), while some graph law investigations are based on group activities (high-order connections), i.e., motifs in [40, 57], hyperedges in [10, 29], and simplices in [6, 9].
Static and Dynamic Graphs. Compared with static graphs, dynamic graphs allow graph components like the topology structure and node attributes to evolve over time. Correspondingly, some graph laws study how graph parameters change over time and their temporal relations. Note that, in some graph research, the dynamics are created by algorithms, e.g., adding virtual nodes to preserve graph representations [1, 37]; this kind of research is beyond the scope of this paper, and we focus on graph laws with natural time.
Table 1: A summary of parametric representations of graphs. Some laws have multiple aspects and are indexed by numbers in parentheses.

| Input | Law | Parameter | Scope | Order | Temporality | Description |
|---|---|---|---|---|---|---|
| Graphs | Densification Law [33] | Density degree α | Macro | Low | Dynamic | e(t) ∝ n(t)^α, α ∈ [1, 2], e(t) is # edges at t |
| Graphs | Shrinking Law [33] | Effective diameter d | Macro | Low | Dynamic | d_{t+1} < d_t, d decreases as the network grows |
| Graphs | Motif Differing Law (1) [40] | Numbers of similar motifs n | Macro | High | Dynamic | n_1 ≠ n_2 for different domains |
| Graphs | Motif Differing Law (2) [40] | Motif occurring timestamp t | Macro | High | Dynamic | t_1 ≠ t_2 for different motifs |
| Graphs | Egonet Differing Law [6] | Features of egonets X | Macro | High | Static | X_1 ≠ X_2 for different domains |
| Graphs | Simplicial Closure Law [6] | Simplicial closure probability p | Macro | High | Static | p increases with additional edges or tie strength |
| Graphs | Spectral Power Law (1) [14] | Degree, SVD, eigen distributions | Macro | High | Static | These distributions usually follow a power law |
| Graphs | Spectral Power Law (2) [14] | Degree, SVD, eigen distributions | Macro | High | Static | If one follows a power law, the others usually follow |
| Graphs | Edge Attachment Law (1) [34] | Node degree d, edge creation probability p_e(d) | Micro | Low | Dynamic | p_e(d) ∝ d for a node with degree d |
| Graphs | Edge Attachment Law (2) [34] | Node age a(u), edge creation probability p_e(d) | Micro | Low | Dynamic | p_e(d) seems to be non-decreasing with a(u) |
| Graphs | Triangle Closure Law (1) [24] | Triangular connections e_1, e_2, e_3 | Micro | Low | Dynamic | Strong e_3: unlikely that e_1/e_2 will be weakened |
| Graphs | Triangle Closure Law (2) [24] | Triangular connections e_1, e_2, e_3 | Micro | Low | Dynamic | Strong e_1/e_2: likely that they will be weakened |
| Graphs | Local Closure Law [54] | Local closure coefficient H(u) | Micro | Low | Static | Please refer to Section 4 for details |
| Graphs | Spectral Density Law [11] | Density of states µ(λ) | Macro | High | Static | Please refer to Section 4 for details |
| Graphs | Motif Activity Law (1) [57] | Motif type | Micro | High | Dynamic | Motifs do not transit from one type to another |
| Graphs | Motif Activity Law (2) [57] | Motif re-appearance rate | Micro | High | Dynamic | Motifs re-appear with configured rates |
| Hypergraphs | Degree Distribution Law [10] | Node degree, edge link probability | Macro | High | Dynamic | High-degree nodes are likely to form new links |
| Hypergraphs | SVD Distribution Law [10] | Singular value distribution | Macro | High | Static | Singular value distribution is usually heavy-tailed |
| Hypergraphs | Diminishing Overlaps [29] | Density of interactions DoI(H(t)) | Macro | High | Dynamic | Overall hyperedge overlaps decrease over time |
| Hypergraphs | Densification Law [29] | Density degree α | Macro | High | Dynamic | e(t) ∝ n(t)^α, α ≥ 1, e(t) is # hyperedges at t |
| Hypergraphs | Shrinking Law [29] | Hypergraph effective diameter d | Macro | High | Dynamic | d_{t+1} < d_t, d decreases as the network grows |
| Hypergraphs | Edge Interacting Law [9] | Edge interacting rate | Micro | High | Dynamic | Temporally adjacent interactions are highly similar |
| Heterographs | Densification Law [47] | Density degree α, # meta-paths | Macro | Low | Dynamic | e(t) ∝ n(t)^α, α ≥ 1 for some meta-paths |
| Heterographs | Non-densification Law [47] | Density degree α, # meta-paths | Macro | Low | Dynamic | For some meta-paths, e(t) ∝ n(t)^α may not hold |
Moreover, in Section 4, we introduce different observation spaces and newly proposed parameters for which the corresponding laws have yet to be discovered.
Furthermore, in Section 5, we survey different real-world applications that can benefit from the guidance of graph laws, such as graph generation, link prediction, and natural language processing.
Finally, in Section 6, we conclude the survey with current challenges and six future directions for graph law research.
2 Macroscopic Graph Laws
In this section and the next, we introduce graph laws from the macroscopic and microscopic views, respectively. In detail, we introduce the intuition behind researchers proposing or using certain graph statistical properties as parameters, and how they fit the parameter values against real-world observations.

Several classical theories model the growth of graphs. For example, the Barabási–Albert model [4, 5] assumes that graphs follow a uniform growth pattern in terms of the number of nodes, while the Bass model [38] and the Susceptible–Infected model [2] follow Sigmoid growth (more random graph models can be found in [12]). However, it has been shown that these pre-defined growth patterns cannot handle complex real-world network growth very well [30, 56]. To this end, researchers began to fit graph growth on real-world networks directly, in order to discover graph laws.
2.1 Low-Order Macroscopic Laws
Based on fitting nine real-world temporal graphs from four different domains, the authors of [33] found two temporal graph laws, called (1) Densification Laws and (2) Shrinking Diameters. First, the densification law is stated as follows:

e(t) ∝ n(t)^α    (1)

where e(t) denotes the number of edges at time t, n(t) denotes the number of nodes at time t, and α ∈ [1, 2] is an exponent representing the density degree. The second law, shrinking diameters, states that the effective diameter decreases as the network grows, in most cases. Here, the diameter means the node-pair shortest distance, and the effective diameter of the graph means the minimum distance d such that approximately 90% of all connected pairs are reachable by a path of length at most d. Later, in [56], the densification law was confirmed in depth on four different real social networks; that research shows that the number of nodes and the number of edges both grow over time following power-law patterns.
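To make Eq. (1) concrete, here is a minimal sketch (our illustration, not code from [33]) for estimating the exponent α: taking logarithms gives log e(t) ≈ α log n(t) + c, so α is the slope of a least-squares fit in log-log space.

```python
import numpy as np

def fit_densification_alpha(n_t, e_t):
    """Estimate the densification exponent alpha in e(t) ~ n(t)^alpha.

    n_t, e_t: arrays of node and edge counts observed over time.
    Returns (alpha, intercept) of the least-squares fit in log-log space.
    """
    log_n, log_e = np.log(np.asarray(n_t)), np.log(np.asarray(e_t))
    alpha, intercept = np.polyfit(log_n, log_e, deg=1)  # slope = alpha
    return alpha, intercept

# Toy usage: a graph whose edges grow super-linearly in its nodes.
nodes = np.array([100, 200, 400, 800, 1600])
edges = (nodes ** 1.3).astype(int)
alpha, _ = fit_densification_alpha(nodes, edges)
print(f"estimated alpha = {alpha:.2f}")  # close to 1.3
```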
2.2 High-Order Macroscopic Laws
The above discoveries are based on node-level connections (i.e., low-order connections); subsequently, several researchers started investigations based on group activities, for example, motifs [40], simplices [6], and hyperedges [10, 29]. A motif is defined in [40] as a subgraph induced by a sequence of selected temporal edges, where the authors discovered that networks from different domains have significantly different numbers of similar motifs, and that different motifs usually occur at different times. Similar laws are also discovered in [6], where the authors study 19 graph datasets from domains like biology, medicine, social networks, and the web to characterize how high-order structure emerges and differs across domains. They discovered that higher-order Egonet features can discriminate the domain of a graph, and that the probability of simplicial closure events typically increases with additional edges or tie strength.
In hypergraphs, each hyperedge can connect an arbitrary number of nodes, rather than two [10]. The authors of [10] found that real-world static hypergraphs obey the following properties: (1) Giant Connected Components: there is a connected component comprising a large proportion of nodes, and this proportion is significantly larger than that of the second-largest connected component. (2) Heavy-Tailed Degree Distributions: high-degree nodes are more likely to form new links. (3) Small Effective Diameters: most connected pairs can be reached within a small distance. (4) High Clustering Coefficients: the global average of the local clustering coefficient is high. (5) Skewed Singular-Value Distributions: the singular-value distribution is usually heavy-tailed. Later, the evolution of real-world hypergraphs was investigated in [29], and the following laws were discovered.

Diminishing Overlaps: The overall overlaps of hyperedges decrease over time.
Densification: The average degrees increase over time.
Shrinking Diameter: The effective diameters decrease over time.

To be specific, given a hypergraph G(t) = (V(t), E(t)), the density of interactions is stated as follows:

DoI(G(t)) = |{{e_i, e_j} | e_i ∩ e_j ≠ ∅ for e_i, e_j ∈ E(t)}| / |{{e_i, e_j} | e_i, e_j ∈ E(t)}|    (2)

and the densification is stated as follows:

|E(t)| ∝ |V(t)|^s    (3)

where s > 1 stands for the density term.
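As an illustration of Eq. (2), the following is a minimal sketch (ours, using a naive O(|E|²) pairwise loop rather than anything from [29]) that computes the density of interactions for a hypergraph given as a list of node sets:

```python
from itertools import combinations

def density_of_interactions(hyperedges):
    """Fraction of hyperedge pairs that share at least one node (Eq. 2)."""
    pairs = list(combinations(hyperedges, 2))
    if not pairs:
        return 0.0
    overlapping = sum(1 for e_i, e_j in pairs if e_i & e_j)
    return overlapping / len(pairs)

# Toy usage: three hyperedges, two of which overlap on node 2.
H = [{1, 2, 3}, {2, 4}, {5, 6}]
print(density_of_interactions(H))  # 1 overlapping pair out of 3 -> 0.333...
```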
In heterogeneous information networks (where nodes and edges can have multiple types), the power-law distribution is also discovered [47]. For example, for the triplet "author-paper-venue" (i.e., A-P-V), the number of authors is power-law distributed w.r.t. the number of A-P-V instances composed by an author.
3 Microscopic Graph Laws
In contrast to representing the whole distribution of the entire graph, many researchers try to model individual behaviors and investigate how individuals interact with each other, in order to see the evolution pattern microscopically.
3.1 Low-Order Microscopic Laws
In [34], the authors view temporal graphs as a three-fold process, i.e., node arrival (determining how many nodes will be added), edge initiation (how many edges will be added), and edge destination (where each edge will be added). They ignore the deletion of nodes and edges, and they assign variables (models) to parameterize this process.

Edge Attachment with Locality (an inserted edge closing an open triangle): responsible for the edge destination.
Node Lifetime and Time Gap between Emitting Edges: responsible for the edge initiation.
Node Arrival Rate: responsible for the node arrival.
To model the individual behaviors, there are many candidate models to select from. For example, in edge attachment, the probability of a newcomer u connecting to a node v can be proportional to v's current degree, to v's current age, or to a combination of the two. Based on fitting each model to real-world observations under the maximum likelihood estimation (MLE) principle, the authors empirically choose the random-random model for edge attachment with locality: first, node u chooses a neighbor v uniformly at random; then v uniformly at random chooses one of its own neighbors w, and the edge (u, w) closes a triangle. The node lifetime and the time gap between emitting edges are defined as follows:

a(u) = t_{d(u)}(u) − t_1(u)    (4)

where a(u) stands for the age of node u, t_k(u) is the time when node u links its k-th edge, d_t(u) denotes the degree of node u at time t, and d(u) = d_T(u), with T being the final timestamp of the data.

δ_u(d) = t_{d+1}(u) − t_d(u)    (5)

where δ_u(d) records the time gap between the current time and the time when the node emitted its last edge. Finding the node arrival rate is a regression process in [34]; for example, N(t) = exp(0.25t) in the Flickr graph, and N(t) = 3900t² + 76000t − 130000 in the LinkedIn graph.
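A minimal sketch (our illustration under the definitions above, not code from [34]) of the random-random triangle-closing step, on a graph stored as a dict of neighbor sets:

```python
import random

def random_random_closure(adj, u):
    """Pick a two-hop target for u: a uniform neighbor v of u,
    then a uniform neighbor w of v (w != u), closing the wedge u-v-w."""
    neighbors = list(adj[u])
    if not neighbors:
        return None
    v = random.choice(neighbors)
    candidates = [w for w in adj[v] if w != u]
    return random.choice(candidates) if candidates else None

# Toy usage: starting from node 1, v must be 2, so w is 3 or 4.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
print(random_random_closure(adj, 1))
```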
In [41, 52], the selection of edge attachment models flourished, as the authors propose several variants of edge attachment models for preserving graph properties. With respect to the triangle closure phenomenon, several in-depth studies followed. For example, in [24], researchers found that (1) the stronger the third tie (the interaction frequency of the closing edge) is, the less likely the first two ties are to be weakened; and (2) the stronger the first two ties are, the more likely they are to be weakened.
3.2 High-Order Microscopic Laws
The hypergraph ego-network [9] is a structure defined to model the high-order interactions involving an individual node. The star ego-network T(u) is defined as follows:

T(u) = {s ∈ S : u ∈ s}    (6)

where S is the set of all hyperedges (or simplices). In [9], there are also other hypergraph ego-networks, like the radial ego-network R(u) and the contracted ego-network C(u). The relationship between them is as follows:

T(u) ⊆ R(u) ⊆ C(u)    (7)

In [9], the authors observe that contiguous hyperedges (simplices) in an ego-network tend to have relatively large interactions with each other, which suggests that temporally adjacent high-order interactions have high similarity, i.e., the same nodes tend to appear in neighboring simplices.
In [57], the authors model temporal graph growth in terms of motif evolution activities. In brief, the paper investigates how many motifs change, and what the exact motif types are, in each time interval, and fits the arrival rate parameter of each motif type against the whole observed temporal graph.
4 New Observation Spaces and Newly Discovered Graph Parameters
4.1 New Observation Spaces
In [14], the power law is revisited based on the eigendecomposition and the singular value decomposition, to provide guidance on the presence of power laws in the degree distribution, the singular value (of the adjacency matrix) distribution, and the eigenvalue (of the Laplacian matrix) distribution. The authors [14] discovered that (1) the degree distribution, singular value distribution, and eigenvalue distribution follow a power-law distribution in many of the real-world networks they collected; and (2) a significant power-law distribution of degrees usually indicates power-law distributed singular values and power-law distributed eigenvalues with high probability.
4.2 New Parameters
Currently, if not all, most graph law research focuses on the traditional graph properties, like the
number of nodes, number of edges, degrees, diameters, eigenvalues, and singular values. Here, we
provide some recently proposed graph properties, although they have not yet been tested on the scale
for fitting the graph law on real-world networks.
The local closure coefficient [
54
] is defined as the fraction of length-2 paths (wedges) emanating
from the head node (of the wedge) that induce a triangle, i.e., starting from a seed node of a wedge,
how many wedges are closed. According to [
54
], features extracted within the constraints of the local
closure coefficient can improve the link prediction accuracy. The local closure efficient of node
u
is
defined as follows.
H(u) = 2T(u)
Wh(u)
where
W(h)(u)
is the number of wedges where
u
stands for the head of the wedge, and
T(u)
denotes
the number of triangles that contain node u.
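As a sketch (ours, directly following the definition above), the local closure coefficient can be computed from an adjacency structure; since each triangle containing u closes two wedges headed at u, the numerator carries the factor of 2, and W^(h)(u) = Σ_{v ∈ N(u)} (deg(v) − 1).

```python
def local_closure_coefficient(adj, u):
    """H(u) = 2 * T(u) / W_h(u) on an undirected graph (dict of neighbor sets)."""
    # Wedges headed at u: for each neighbor v, any neighbor of v other than u.
    wedges = sum(len(adj[v]) - 1 for v in adj[u])
    if wedges == 0:
        return 0.0
    # Triangles containing u: neighbor pairs of u that are themselves adjacent.
    nbrs = list(adj[u])
    triangles = sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * triangles / wedges

# Toy usage: triangle {1,2,3} plus a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(local_closure_coefficient(adj, 1))  # 3 wedges, 1 triangle -> 2/3
```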
The density of states (or spectral density) [11] is defined as follows:

µ(λ) = (1/N) Σ_{i=1}^{N} δ(λ − λ_i),    ∫ f(λ) µ(λ) dλ = trace(f(H))    (8)

where H denotes any symmetric graph matrix, λ_1, . . . , λ_N denote the eigenvalues of H in ascending order, δ stands for the Dirac delta function, and f is any analytic test function.
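A minimal numerical sketch (our illustration; [11] develops far more scalable moment-based estimators that avoid full eigendecomposition) approximates µ(λ) by a histogram of the eigenvalues of a symmetric graph matrix:

```python
import numpy as np

def spectral_density(H, bins=50):
    """Histogram approximation of the density of states of a symmetric matrix H."""
    eigvals = np.linalg.eigvalsh(H)  # eigenvalues in ascending order
    hist, edges = np.histogram(eigvals, bins=bins, density=True)
    return hist, edges

# Toy usage: adjacency matrix of a 4-cycle (eigenvalues 2, 0, 0, -2).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
hist, edges = spectral_density(A, bins=5)
print(edges)  # bin edges spanning [-2, 2]
```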
5 Law-Guided Research Tasks
The discovered graph laws describe graph properties, which provides guidance for many downstream tasks. Some examples are discussed below.
5.1 Graph Generation
In most if not all graph law studies [10, 29, 33, 34, 41, 56, 57], after a law (i.e., an evolution pattern) is discovered, a follow-up action is to propose a corresponding graph generative model, to test whether a realizable graph generator can generate graphs that preserve the discovered law in terms of graph properties. Also, graph generation tasks have impactful application scenarios like drug design and protein discovery [59].

For example, in [33], the Forest Fire model is proposed, which largely preserves the discovered macroscopic evolution patterns. It proceeds as follows (a minimal code sketch follows the property list below).
First, node v chooses an ambassador node w uniformly at random and establishes a link to w;
Second, node v generates a random value x and selects x links of node w, where in-links are selected r times less often than out-links;
Third, node v forms links to the selected neighbors of w; this step executes recursively (neighbors of neighbors) until the fire dies out.
This proposed Forest Fire model holds the following graph properties most of the time.

Heavy-tailed In-degrees: Highly linked nodes can easily get reached, i.e., "rich get richer".
Communities: A newcomer copies the neighbors of its ambassador.
Heavy-tailed Out-degrees: The recursive nature produces large out-degrees.
Densification Law: A newcomer will have many links near the community of its ambassador.
Shrinking Diameter: This may not always hold.
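As referenced above, here is a minimal sketch of the burning process (our simplification of [33]: undirected edges and a single burn probability p in place of separate forward/backward ratios):

```python
import random

def forest_fire_step(adj, p=0.35):
    """Add one node to an undirected graph (dict of neighbor sets) by burning."""
    new = len(adj)
    ambassador = random.choice(list(adj)) if adj else None
    adj[new] = set()
    if ambassador is None:
        return adj
    frontier, visited = [ambassador], {ambassador}
    while frontier:
        w = frontier.pop()
        adj[new].add(w)
        adj[w].add(new)
        # Burn a geometric number of w's unvisited neighbors, then recurse.
        candidates = [x for x in adj[w] if x not in visited and x != new]
        random.shuffle(candidates)
        n_burn = 0
        while random.random() < p:  # geometric count, mean p / (1 - p)
            n_burn += 1
        for x in candidates[:n_burn]:
            visited.add(x)
            frontier.append(x)
    return adj

# Toy usage: grow a 50-node graph from a single seed edge.
g = {0: {1}, 1: {0}}
for _ in range(48):
    forest_fire_step(g)
print(sum(len(v) for v in g.values()) // 2, "edges")
```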
In [34], the authors combine the microscopic edge destination model, edge initiation model, and node arrival rate to model a real-world temporal network's growth. The parameters of these three models are fitted against a partial observation, i.e., G_{T/2}, the first half of the entire evolving graph. The three models then produce the remaining part of G_T. Finally, the generated graph is compared with the ground truth G_T, to see whether the growth pattern is fully or nearly fully captured by these microscopic models. The procedure is stated as follows (a simplified code sketch follows the comparison paragraph below).

First, nodes arrive according to the node arrival function obtained from G_{T/2};
Second, an arriving node u samples its lifetime a from the age distribution of G_{T/2};
Third, node u adds its first edge to node v with probability proportional to node v's degree;
Fourth, node u with degree d samples a time gap δ from the distribution of time gaps in G_{T/2};
When a node wakes up, if its lifetime has not yet expired, it creates a two-hop edge using the "random-random" triangle-closing model;
If a node's lifetime has expired, it stops adding edges; otherwise, it repeats from Step 4.
The generated graph is tested by comparison with the ground truth G_T, in terms of the degree distribution, clustering coefficient, and diameter distribution. Taking the Flickr graph as an example, the generated graph is very similar to the ground truth under the aforementioned metrics [34].
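A simplified sketch of this generation loop (ours; the exponential arrival, lifetime, and gap distributions below are stand-in assumptions for the distributions actually fitted from G_{T/2} in [34]):

```python
import heapq
import random

def generate(n_nodes, mean_life=20.0, mean_gap=2.0, arrival_gap=1.0):
    """Toy microscopic generator: preferential first edge + random-random closures."""
    adj = {0: {1}, 1: {0}}  # seed edge
    events = []             # heap of (wake_time, node, death_time)
    t = 0.0
    for u in range(2, n_nodes):
        t += random.expovariate(1.0 / arrival_gap)       # Step 1: node arrival
        life = random.expovariate(1.0 / mean_life)       # Step 2: lifetime
        nodes = list(adj)
        weights = [len(adj[x]) for x in nodes]
        v = random.choices(nodes, weights=weights)[0]    # Step 3: preferential
        adj[u] = {v}
        adj[v].add(u)
        gap = random.expovariate(1.0 / mean_gap)         # Step 4: time gap
        heapq.heappush(events, (t + gap, u, t + life))
        while events and events[0][0] <= t:              # wake-ups due so far
            wake, x, death = heapq.heappop(events)
            if wake > death:
                continue                                 # lifetime expired
            v2 = random.choice(list(adj[x]))             # random-random closure
            cands = [w for w in adj[v2] if w != x and w not in adj[x]]
            if cands:
                w = random.choice(cands)
                adj[x].add(w)
                adj[w].add(x)
            heapq.heappush(events,
                           (wake + random.expovariate(1.0 / mean_gap), x, death))
    return adj

g = generate(200)
print(len(g), sum(len(v) for v in g.values()) // 2)  # node and edge counts
```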
5.2 Link Prediction
To learn node representation vectors for predicting links between node pairs, contributing to latent applications like recommender systems, CAW-N [48] is proposed by inserting causal anonymous walks (CAWs) into the representation learning process. A CAW is a sequence of time-aware adjacent nodes, and the authors claim that the extracted CAW sequences obey the triadic closure law. To be specific, temporal open and closed triangles can be preserved in an extracted CAW sequence W. Further, to realize inductive link prediction, CAW-N replaces the identity of each node in W with relative position information, such that the CAW sequence W is transformed into an anonymous sequence Ŵ. Then, the entire Ŵ is fed into an RNN-like model to obtain the embedding vector of each node; the encoding function is stated as follows:

enc(Ŵ) = RNN({f_1(I_CAW(w_i)) ⊕ f_2(t_{i−1} − t_i)}_{i=0,1,...,|Ŵ|})    (9)

where I_CAW(w_i) is the anonymous identity of node w_i in Ŵ, f_1 is the node embedding function realized by a multi-layer perceptron, f_2 is the time kernel function representing a discrete time as a vector, and ⊕ denotes the concatenation operation. The training loss comes from predicting negative (disconnected) node pairs and positive (connected) node pairs.
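To illustrate the anonymization idea on a single walk, here is a simplified sketch of ours: it replaces node identities by their first-occurrence index (in the spirit of anonymous walks), rather than CAW-N's exact set-based relative-position encoding I_CAW:

```python
def anonymize_walk(walk):
    """Map each node id in a walk to its first-occurrence index, hiding identities."""
    first_seen = {}
    anon = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen)
        anon.append(first_seen[node])
    return anon

# Two walks over different nodes share one anonymized pattern -> inductive features.
print(anonymize_walk(["a", "b", "a", "c"]))  # [0, 1, 0, 2]
print(anonymize_walk(["x", "y", "x", "z"]))  # [0, 1, 0, 2]
```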
There are also related link prediction models guided by static graph laws during the representation learning process, for example, SEAL [58] and HHNE [47].

In the SEAL framework [58], for each target link, SEAL extracts a local enclosing subgraph around it and uses a GNN to learn general graph structure features for link prediction. The related graph parameters include, but are not limited to, the following (a code sketch of the first three follows the list):

Common Neighbors: The number of common neighbors of two nodes.
Jaccard: The Jaccard similarity between the neighbor sets of two nodes.
Preferential Attachment: The product of the cardinalities of the neighbor sets (i.e., the degrees) of two nodes.
Katz Index: A weighted sum over the collection of paths between two nodes.
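A minimal sketch (ours) computing the first three neighborhood heuristics for a candidate node pair; the Katz index would additionally require summing powers of the adjacency matrix with a decay factor:

```python
def link_heuristics(adj, u, v):
    """Common neighbors, Jaccard, and preferential attachment for a node pair."""
    nu, nv = adj[u], adj[v]
    common = len(nu & nv)
    union = len(nu | nv)
    return {
        "common_neighbors": common,
        "jaccard": common / union if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }

# Toy usage: nodes 1 and 4 share only the neighbor 2.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
print(link_heuristics(adj, 1, 4))  # common=1, jaccard=1/3, PA=2
```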
5.3 Natural Language Processing
To obtain a semantic representation vector of each word in a corpus, GloVe [42] was proposed, which has been considered one of the most popular word embedding models. GloVe utilizes a power-law distribution constraint during the representation learning process. Let X_ij denote the number of times that word j occurs in the context of word i; it follows

X_ij = k / (r_ij)^α    (10)

where r_ij denotes the frequency rank of the word pair i and j in the whole corpus, and k and α are constant hyperparameters. The loss function of GloVe is then stated as follows:

J = Σ_{i,j}^{V} f(X_ij) (w_i^⊤ w̃_j + b_i + b̃_j − log X_ij)²    (11)

where w denotes the word vectors and b the bias terms.
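A numpy sketch (ours) of evaluating Eq. (11), using the weighting f(x) = min((x/x_max)^0.75, 1) from the GloVe paper; the random inputs are purely illustrative:

```python
import numpy as np

def glove_loss(X, W, W_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe objective over a co-occurrence matrix X."""
    mask = X > 0                                  # log X_ij only where X_ij > 0
    f = np.minimum((X / x_max) ** alpha, 1.0)     # power-law weighting
    scores = W @ W_ctx.T + b[:, None] + b_ctx[None, :]
    residual = np.where(mask, scores - np.log(np.where(mask, X, 1.0)), 0.0)
    return np.sum(f * mask * residual ** 2)

# Toy usage: 5-word vocabulary, 8-dimensional vectors.
rng = np.random.default_rng(0)
V, d = 5, 8
X = rng.integers(0, 50, size=(V, V)).astype(float)
loss = glove_loss(X, rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                  rng.normal(size=V), rng.normal(size=V))
print(loss)
```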
6 Future Directions
In this section, we would like to list several interesting research directions of graph parametric
representation in modern graph research.
6.1 Graph Laws on Temporal Graphs
Discovering accurate temporal graph laws from real-world networks relies heavily on the number of networks and their sizes (e.g., number of nodes, number of edges, and time duration). However, some of the temporal graph law studies mentioned above consider only 10 to 20 graphs when discovering an evolution pattern. The existence of time-dependent structure and feature information increases the difficulty of collecting real-world temporal graph data. To obtain robust and accurate (temporal) graph laws, we may need a considerably larger amount of (temporal) network data. Luckily, we have seen pioneering work like TGB [26] and TUDataset [39].
6.2 Graph Laws on Heterogeneous Networks
Though many graph laws have been proposed and verified on homogeneous graphs, real-world networks are usually heterogeneous [43] and contain large numbers of interacting, multi-typed components. While the existing work [47] studied only 2 datasets to propose and verify the heterogeneous graph power law, the potential exists for a transition of graph laws from homogeneous to heterogeneous networks, suggesting the presence of additional parameters that capture the richer information within heterogeneous networks. For example, in an academic network, the paper citation subgraph and the author collaboration subgraph may each have their own subgraph laws, which affect other subgraphs' laws. Furthermore, knowledge graphs, as a special group of heterogeneous networks, have not yet attracted much attention from the research community regarding their laws.
6.3 Transferability of Graph Laws
As seen in the earlier parts of this paper, many nascent graph laws are described verbally, without exact mathematical expressions, which hinders the transfer from a graph law to numerical constraints for the representation learning process. One likely reason for this phenomenon is that selecting appropriate models and parameters, and fitting the exact parameter values on large evolving graphs, is computationally demanding.
6.4 Taxonomy of Graph Laws
Given the many graph laws discovered so far, is there any taxonomy or hierarchy among them? For example, suppose graph law A stands in the superclass of graph law B; then when we preserve graph law A during the representation, we have already preserved graph law B. As an analogy, a hierarchy of different computer vision tasks was recently discovered [55], and corresponding research for graph laws seems like a promising direction.
6.5 Domain-Specific Graph Laws
Since graphs serve as general data representations with extreme diversity, it is challenging to find
universal graph laws that fit all graph domains because each domain may be internally different from
another [
60
]. In fact, in many cases, we have prior knowledge about the domain of a graph, which
can be a social network, a protein network, or a transportation network. Thus, it is possible to study
the domain-specific graph laws that work well on only a portion of graphs and then apply the graph
laws only on those graphs.
6.6 Graph Laws with LLMs
Against the background of large language model (LLM) developments, an interesting question attracts much research interest nowadays: can LLMs replace GNNs as the backbone model for graphs? To answer this question, many recent works have made great efforts [21, 25, 53], where the key point is how to represent structural information as input for LLMs.

For example, Instruct-GLM [53] follows the manner of instruction tuning and constructs the template T of a 2-hop connection for a central node v as follows:

T(v, A) = "{v} is connected with {v_2 | v_2 ∈ A_2^v} within two hops."    (12)

where A_k^v represents the list of node v's k-hop neighbors.
As discussed above, topological information (e.g., 1-hop or 2-hop connections) can serve as an external modality that contributes (e.g., through prompting) to the reasoning ability of large language models (LLMs) [25], achieving state-of-the-art performance on low-order tasks like node classification and link prediction.

Therefore, a natural question arises: instead of inputting local topological information to LLMs, how can we bring global topological information to LLMs so that they can understand and make inferences for high-order tasks like graph classification, graph matching, and graph alignment? To the best of our knowledge, the corresponding research remains nascent but has great potential. Finding a proper graph parametric representation in a macroscopic way may be a viable solution for LLMs to comprehend graph-level information.
7 Conclusion
In this survey, we first review the concepts and development of graph parametric representations (i.e., graph laws) from different perspectives, such as the macroscope and microscope, low-order and high-order connections, and static and temporal graphs. We then discuss various real-world application tasks that can benefit from the study of graph parametric representations. Finally, we envision the latent challenges and opportunities of graph parametric representations in modern graph research, with several interesting possible future directions.
References
[1] Ittai Abraham, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Venugopalan Ramasubramanian, and Kunal Talwar. Reconstructing approximate tree metrics. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, pages 43–52, 2007.
[2] Roy M Anderson and Robert M May. Infectious diseases of humans: dynamics and control. Oxford University Press, 1992.
[3] Muhammad Awais, Muzammal Naseer, Salman H. Khan, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, and Fahad Shahbaz Khan. Foundational models defining a new era in vision: A survey and outlook. CoRR, abs/2307.13721, 2023. URL https://doi.org/10.48550/arXiv.2307.13721.
[4] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[5] Albert-László Barabási, Hawoong Jeong, Zoltán Néda, Erzsébet Ravasz, András Schubert, and Tamás Vicsek. Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3-4):590–614, 2002.
[6] Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon M. Kleinberg. Simplicial closure and higher-order link prediction. Proc. Natl. Acad. Sci. USA, 115(48):E11221–E11230, 2018. URL https://doi.org/10.1073/pnas.1800683115.
[7] Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. Graph of thoughts: Solving elaborate problems with large language models. In AAAI 2024, pages 17682–17690. AAAI Press, 2024. URL https://doi.org/10.1609/aaai.v38i16.29720.
[8] Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang, Xiaoguang Han, and Yizhou Yu. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. CoRR, abs/2209.13232, 2022. URL https://doi.org/10.48550/arXiv.2209.13232.
[9] Cazamere Comrie and Jon Kleinberg. Hypergraph ego-networks and their temporal evolution. arXiv preprint arXiv:2112.03498, 2021.
[10] Manh Tuan Do, Se-eun Yoon, Bryan Hooi, and Kijung Shin. Structural patterns and generative models of real-world hypergraphs. In KDD 2020, pages 176–186. ACM, 2020. URL https://doi.org/10.1145/3394486.3403060.
[11] Kun Dong, Austin R. Benson, and David Bindel. Network density of states. In KDD 2019, pages 1152–1161. ACM, 2019. URL https://doi.org/10.1145/3292500.3330891.
[12] Mikhail Drobyshevskiy and Denis Turdakov. Random graph modeling: A survey of the concepts. ACM Comput. Surv., 52(6):131:1–131:36, 2020. URL https://doi.org/10.1145/3369782.
[13] Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. CoRR, abs/2012.09699, 2020. URL https://arxiv.org/abs/2012.09699.
[14] Nicole Eikmeier and David F. Gleich. Revisiting power-law distributions in spectra of real world networks. In KDD 2017, pages 817–826. ACM, 2017. URL https://doi.org/10.1145/3097983.3098128.
[15] Dongqi Fu and Jingrui He. DPPIN: A biological repository of dynamic protein-protein interaction network data. In IEEE Big Data 2022, pages 5269–5277. IEEE, 2022. URL https://doi.org/10.1109/BigData55660.2022.10020904.
[16] Dongqi Fu and Jingrui He. Natural and artificial dynamics in graphs: Concept, progress, and future. Frontiers in Big Data, 5, 2022.
[17] Dongqi Fu, Yikun Ban, Hanghang Tong, Ross Maciejewski, and Jingrui He. DISCO: Comprehensive and explainable disinformation detection. In CIKM 2022, pages 4848–4852. ACM, 2022. URL https://doi.org/10.1145/3511808.3557202.
[18] Dongqi Fu, Yada Zhu, Hanghang Tong, Kommy Weldemariam, Onkar Bhardwaj, and Jingrui He. Generating fine-grained causality in climate time series data for forecasting and anomaly detection. CoRR, abs/2408.04254, 2024. URL https://doi.org/10.48550/arXiv.2408.04254.
[19] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In KDD 2016, pages 855–864. ACM, 2016. URL https://doi.org/10.1145/2939672.2939754.
[20] William L Hamilton. Graph representation learning. Morgan & Claypool Publishers, 2020.
[21] Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: LLM-to-LM interpreter for enhanced text-attributed graph representation learning. In ICLR 2024. OpenReview.net, 2024. URL https://openreview.net/forum?id=RXFVcynVe1.
[22] Xinrui He, Shuo Liu, Jacky Keung, and Jingrui He. Co-clustering for federated recommender system. In WWW 2024, pages 3821–3832. ACM, 2024. URL https://doi.org/10.1145/3589334.3645626.
[23] Chuxuan Hu, Qinghai Zhou, and Hanghang Tong. Genius: Subteam replacement with clustering-based graph neural networks. In SDM 2024, pages 10–18. SIAM, 2024. URL https://doi.org/10.1137/1.9781611978032.2.
[24] Hong Huang, Yuxiao Dong, Jie Tang, Hongxia Yang, Nitesh V. Chawla, and Xiaoming Fu. Will triadic closure strengthen ties in social networks? ACM Trans. Knowl. Discov. Data, 12(3):30:1–30:25, 2018. URL https://doi.org/10.1145/3154399.
[25] Jin Huang, Xingjian Zhang, Qiaozhu Mei, and Jiaqi Ma. Can LLMs effectively leverage graph structural information: When and why. CoRR, abs/2309.16595, 2023. URL https://doi.org/10.48550/arXiv.2309.16595.
[26] Shenyang Huang, Farimah Poursafaei, Jacob Danovitch, Matthias Fey, Weihua Hu, Emanuele Rossi, Jure Leskovec, Michael M. Bronstein, Guillaume Rabusseau, and Reihaneh Rabbany. Temporal graph benchmark for machine learning on temporal graphs. CoRR, abs/2307.01026, 2023. URL https://doi.org/10.48550/arXiv.2307.01026.
[27] Licheng Jiao, Jie Chen, Fang Liu, Shuyuan Yang, Chao You, Xu Liu, Lingling Li, and Biao Hou. Graph representation learning meets computer vision: A survey. IEEE Trans. Artif. Intell., 4(1):2–22, 2023. URL https://doi.org/10.1109/TAI.2022.3194869.
[28] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR 2017. OpenReview.net, 2017. URL https://openreview.net/forum?id=SJU4ayYgl.
[29] Yunbum Kook, Jihoon Ko, and Kijung Shin. Evolution of real-world hypergraphs: Patterns and models without oracles. In ICDM 2020, pages 272–281. IEEE, 2020. URL https://doi.org/10.1109/ICDM50108.2020.00036.
[30] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue B. Moon. What is twitter, a social network or a news media? In WWW 2010, pages 591–600. ACM, 2010. URL https://doi.org/10.1145/1772690.1772751.
[31] Rémi Lam, Alvaro Sanchez-Gonzalez, Matthew Willson, Peter Wirnsberger, Meire Fortunato, Alexander Pritzel, Suman V. Ravuri, Timo Ewalds, Ferran Alet, Zach Eaton-Rosen, Weihua Hu, Alexander Merose, Stephan Hoyer, George Holland, Jacklynn Stott, Oriol Vinyals, Shakir Mohamed, and Peter W. Battaglia. Graphcast: Learning skillful medium-range global weather forecasting. CoRR, abs/2212.12794, 2022. URL https://doi.org/10.48550/arXiv.2212.12794.
[32] Lin Lan, Pinghui Wang, Rui Shi, Tingqing Liu, Juxiang Zeng, Feiyang Sun, Yang Ren, Jing Tao, and Xiaohong Guan. Grand: A fast and accurate graph retrieval framework via knowledge distillation. In SIGIR 2024, pages 1639–1648. ACM, 2024. URL https://doi.org/10.1145/3626772.3657773.
[33] Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD 2005, pages 177–187. ACM, 2005. URL https://doi.org/10.1145/1081870.1081893.
[34] Jure Leskovec, Lars Backstrom, Ravi Kumar, and Andrew Tomkins. Microscopic evolution of social networks. In KDD 2008, pages 462–470. ACM, 2008. URL https://doi.org/10.1145/1401890.1401948.
[35] Zihao Li, Dongqi Fu, and Jingrui He. Everything evolves in personalized pagerank. In WWW 2023, pages 3342–3352. ACM, 2023. URL https://doi.org/10.1145/3543507.3583474.
[36] Zihao Li, Yuyi Ao, and Jingrui He. Sphere: Expressive and interpretable knowledge graph embedding for set retrieval. In SIGIR 2024, pages 2629–2634. ACM, 2024. URL https://doi.org/10.1145/3626772.3657910.
[37] Xin Liu, Jiayang Cheng, Yangqiu Song, and Xin Jiang. Boosting graph structure learning with dummy nodes. In ICML 2022, volume 162 of Proceedings of Machine Learning Research, pages 13704–13716. PMLR, 2022. URL https://proceedings.mlr.press/v162/liu22d.html.
[38] Vijay Mahajan, Eitan Muller, and Frank M Bass. New product diffusion models in marketing: A review and directions for research. Journal of Marketing, 54(1):1–26, 1990.
[39] Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. TUDataset: A collection of benchmark datasets for learning with graphs. In ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020), 2020. URL www.graphlearning.io.
[40] Ashwin Paranjape, Austin R. Benson, and Jure Leskovec. Motifs in temporal networks. In WSDM 2017, pages 601–610. ACM, 2017. URL https://doi.org/10.1145/3018661.3018731.
[41] Himchan Park and Min-Soo Kim. Evograph: An effective and efficient graph upscaling method for preserving graph properties. In KDD 2018, pages 2051–2059. ACM, 2018. URL https://doi.org/10.1145/3219819.3220123.
[42] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In EMNLP 2014, pages 1532–1543. ACL, 2014. URL https://doi.org/10.3115/v1/d14-1162.
[43] Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and Philip S. Yu. A survey of heterogeneous information network analysis. IEEE Trans. Knowl. Data Eng., 29(1):17–37, 2017. URL https://doi.org/10.1109/TKDE.2016.2598561.
[44] Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, and Jihong Guan. All in one: Multi-task prompting for graph neural networks. In KDD 2023, pages 2120–2131. ACM, 2023. URL https://doi.org/10.1145/3580305.3599256.
[45] Yanran Tang, Ruihong Qiu, Hongzhi Yin, Xue Li, and Zi Huang. Caselink: Inductive graph learning for legal case retrieval. In SIGIR 2024, pages 2199–2209. ACM, 2024. URL https://doi.org/10.1145/3626772.3657693.
[46] Shoujin Wang, Liang Hu, Yan Wang, Xiangnan He, Quan Z. Sheng, Mehmet A. Orgun, Longbing Cao, Francesco Ricci, and Philip S. Yu. Graph learning based recommender systems: A review. In IJCAI 2021, pages 4644–4652. ijcai.org, 2021. URL https://doi.org/10.24963/ijcai.2021/630.
[47] Xiao Wang, Yiding Zhang, and Chuan Shi. Hyperbolic heterogeneous information network embedding. In AAAI 2019, pages 5337–5344. AAAI Press, 2019. URL https://doi.org/10.1609/aaai.v33i01.33015337.
[48] Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, and Pan Li. Inductive representation learning in temporal networks via causal anonymous walks. In ICLR 2021. OpenReview.net, 2021. URL https://openreview.net/forum?id=KYPz4YsCPj.
[49] Tianxin Wei and Jingrui He. Comprehensive fair meta-learned recommender system. In KDD 2022, pages 1989–1999. ACM, 2022. URL https://doi.org/10.1145/3534678.3539269.
[50] Lingfei Wu, Yu Chen, Kai Shen, Xiaojie Guo, Hanning Gao, Shucheng Li, Jian Pei, and Bo Long. Graph neural networks for natural language processing: A survey. CoRR, abs/2106.06090, 2021. URL https://arxiv.org/abs/2106.06090.
[51] Deqing Yang, Jingrui He, Huazheng Qin, Yanghua Xiao, and Wei Wang. A graph-based recommendation across heterogeneous domains. In CIKM 2015, pages 463–472. ACM, 2015. URL https://doi.org/10.1145/2806416.2806523.
[52] Yang Yang, Yuxiao Dong, and Nitesh V. Chawla. Microscopic evolution of social networks by triad position profile. CoRR, abs/1310.1525, 2013. URL http://arxiv.org/abs/1310.1525.
[53] Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang. Language is all a graph needs. In Findings of the Association for Computational Linguistics: EACL 2024, pages 1955–1973. Association for Computational Linguistics, 2024. URL https://aclanthology.org/2024.findings-eacl.132.
[54] Hao Yin, Austin R. Benson, and Jure Leskovec. The local closure coefficient: A new perspective on network clustering. In WSDM 2019, pages 303–311. ACM, 2019. URL https://doi.org/10.1145/3289600.3290991.
[55]
Amir Roshan Zamir, Alexander Sax, William B. Shen, Leonidas J. Guibas, Jitendra Malik, and
Silvio Savarese. Taskonomy: Disentangling task transfer learning. In 2018 IEEE Conference
on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June
18-22, 2018, pages 3712–3722. Computer Vision Foundation / IEEE Computer Society, 2018.
doi: 10.1109/CVPR.2018.00391. URL
http://openaccess.thecvf.com/content_cvpr_
2018/html/Zamir_Taskonomy_Disentangling_Task_CVPR_2018_paper.html.9
[56]
Chengxi Zang, Peng Cui, Christos Faloutsos, and Wenwu Zhu. On power law growth of social
networks. IEEE Trans. Knowl. Data Eng., 30(9):1727–1740, 2018. doi: 10.1109/TKDE.2018.
2801844. URL https://doi.org/10.1109/TKDE.2018.2801844.4,5,7
[57]
Giselle Zeno, Timothy La Fond, and Jennifer Neville. Dynamic network modeling from
motif-activity. In Amal El Fallah Seghrouchni, Gita Sukthankar, Tie-Yan Liu, and Maarten
van Steen, editors, Companion of The 2020 Web Conference 2020, Taipei, Taiwan, April
20-24, 2020, pages 390–397. ACM / IW3C2, 2020. doi: 10.1145/3366424.3383301. URL
https://doi.org/10.1145/3366424.3383301.3,4,6,7
[58]
Muhan Zhang and Yixin Chen. Link prediction based on graph neural networks. In Samy
Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and
Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual
Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8,
2018, Montréal, Canada, pages 5171–5181, 2018. URL
https://proceedings.neurips.
cc/paper/2018/hash/53f0d7c537d99b3824f0f99d62ea2428-Abstract.html.8
[59]
Xuan Zhang and et al. Artificial intelligence for science in quantum, atomistic, and continuum
systems. CoRR, abs/2307.08423, 2023. 2,7
[60]
Ziwei Zhang, Haoyang Li, Zeyang Zhang, Yijian Qin, Xin Wang, and Wenwu Zhu. Large graph
models: A perspective. CoRR, abs/2308.14522, 2023. doi: 10.48550/ARXIV.2308.14522. URL
https://doi.org/10.48550/arXiv.2308.14522.10
[61]
Lecheng Zheng, Baoyu Jing, Zihao Li, Hanghang Tong, and Jingrui He. Heterogeneous
contrastive learning for foundation models and beyond. In Ricardo Baeza-Yates and Francesco
Bonchi, editors, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, KDD 2024, Barcelona, Spain, August 25-29, 2024, pages 6666–6676.
ACM, 2024. doi: 10.1145/3637528.3671454. URL
https://doi.org/10.1145/3637528.
3671454.3
[62] Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, and Lichao Sun. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. CoRR, abs/2302.09419, 2023. doi: 10.48550/ARXIV.2302.09419. URL https://doi.org/10.48550/arXiv.2302.09419.
[63] Dawei Zhou, Kangyang Wang, Nan Cao, and Jingrui He. Rare category detection on time-evolving graphs. In Charu C. Aggarwal, Zhi-Hua Zhou, Alexander Tuzhilin, Hui Xiong, and Xindong Wu, editors, 2015 IEEE International Conference on Data Mining, ICDM 2015, Atlantic City, NJ, USA, November 14-17, 2015, pages 1135–1140. IEEE Computer Society, 2015. doi: 10.1109/ICDM.2015.120. URL https://doi.org/10.1109/ICDM.2015.120.
[64] Dawei Zhou, Jingrui He, Hongxia Yang, and Wei Fan. SPARC: Self-paced network representation for few-shot rare category characterization. In Yike Guo and Faisal Farooq, editors, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, pages 2807–2816. ACM, 2018. doi: 10.1145/3219819.3219968. URL https://doi.org/10.1145/3219819.3219968.
[65] Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, and Dongmei Zhang. PromptIntern: Saving inference costs by internalizing recurrent prompt during large language model fine-tuning. CoRR, abs/2407.02211, 2024. doi: 10.48550/ARXIV.2407.02211. URL https://doi.org/10.48550/arXiv.2407.02211.
This book combines mathematical models with extensive use of epidemiological and other data, to achieve a better understanding of the overall dynamics of populations of pathogens or parasites and their human hosts. The authors thus provide an analytical framework for evaluating public health strategies aimed at controlling or eradicating particular infections. With rising concern for programmes of primary health care against such diseases as measles, malaria, river blindness, sleeping sickness, and schistosomiasis in developing countries, and the advent of HIV/AIDS and other `emerging viruses', such a framework is increasingly important. Throughout, the mathematics is used as a tool for thinking clearly about fundamental and applied problems relating to infectious diseases. The book is divided into two major parts, one dealing with microparasites (viruses, bacteria, and protozoans) and the other with macroparasites (helminths and parasitic arthropods). Each part begins with simple models, developed in a biologically intuitive way, and then goes on to develop more complicated and realistic models as tools for public health planning. A major contribution by two of the leaders in the field, this book synthesizes previous work in this rapidly growing area with much new material, combining work scattered between the ecological and medical literature.