
Characterizing Polarization in Social Networks using the Signed Relational Latent Distance Model

Nikolaos Nakis, Abdulkadir Çelikkanat, Louis Boucherie, Christian Djurhuus, Felix Burmester, Daniel Mathias Holmelund, Monika Frolcová, Morten Mørup

Technical University of Denmark

Abstract

Graph representation learning has become a prominent tool for the characterization and understanding of the structure of networks in general and social networks in particular. Typically, these representation learning approaches embed the networks into a low-dimensional space in which the role of each individual can be characterized in terms of their latent position. A major current concern in social networks is the emergence of polarization and filter bubbles promoting a mindset of "us-versus-them" that may be defined by extreme positions believed to ultimately lead to political violence and the erosion of democracy. Such polarized networks are typically characterized in terms of signed links reflecting likes and dislikes. We propose the Signed Latent Distance Model (SLDM), utilizing for the first time the Skellam distribution as a likelihood function for signed networks. We further extend the modeling to the characterization of distinct extreme positions by constraining the embedding space to polytopes, forming the Signed Latent relational dIstance Model (SLIM). On four real social signed networks of polarization, we demonstrate that the models extract low-dimensional characterizations that well predict friendships and animosity, while SLIM provides interpretable visualizations defined by extreme positions when restricting the embedding space to polytopes.

Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s).

1 INTRODUCTION

For several decades, the origin and influence of political polarization have been issues receiving considerable attention both within scholarly research and the public media (Hetherington, 2009). Several studies have demonstrated an increasing partisan polarization among the political elites, some of which rely on network science approaches, for instance, using co-voting similarity networks and modularity to model and explain distinct aspects of the data (Moody and Mucha, 2013). Whereas polarization has been described in terms of communities and their boundary properties (Guerra et al., 2013), latent distance modeling has also been used to extract bipolar structures (Barberá et al., 2015).

Ideological polarization is the distance between policy preferences, typically of elites taking extreme stands on issues, whereas polarization in electoral behavior is denoted affective polarization. When these extremes are portrayed as existential in the media, they typically form an "us-versus-them" mindset (Dagnes, 2019). From a social network perspective, the process of polarization has been described to occur when "homophily and influence become self-reinforcing when the attraction to those who are similar and differentiation from those who are dissimilar entail greater openness to influence. The result is network autocorrelation—the tendency for people to resemble their network neighbors" (DellaPosta et al., 2015).

To better capture ideological polarization, we turn to signed networks. Signed networks reflect complex social polarization better than unsigned networks because they capture positive, negative, and neutral relationships between entities. The study of signed networks goes back to the '50s and was motivated by friendly and hostile social relationships (Harary, 1953). Since then, they have been used to study networks of Twitter users (Keuchenius et al., 2021) and US Congress members (Thomas et al., 2006), two examples of polarized social networks (Garimella and Weber, 2017; Neal, 2020).

arXiv:2301.09507v3 [stat.ML] 3 Mar 2023

In this paper, we focus on polarization as extreme positions and argue that the multi-polarity of "us-versus-them" reinforced by homophily and influence can be characterized by a latent position model (i.e., the latent distance model (Hoff et al., 2002)) of networks confined to a constrained social space formed by a polytope, what we denote a sociotope. As such, the corners of the sociotope define distinct aspects (i.e., poles) formed by polarized networks' tendencies to self-reinforce homophily, with positive ties driving those who are similar close together, as opposed to those that are negatively tied being repelled. This can be revealed in terms of the important multiple poles of a social network defining the corners of such a sociotope. Within these corners, positive interactions between nodes place them in close proximity in space, thereby accounting for homophily, while negative interactions "push" nodes far apart (towards opposing poles), yielding the "us-versus-them" effect.

The conceptual idea of polytopes as formed by pure types can be traced back to Plato's forms, which characterize the physical world as a limited projection of the forms, also referred to as ideal categories. Later, Carl Jung introduced the concept of universal archetypes, described as a collective unconscious, which he related to Plato's forms by describing the archetypes as a Jungian version of the Platonic forms (Williamson, 1985). Applying the theoretical concept of archetypes to political and ideological polarization, the archetypes could be interpreted as genuine ideologies, while ideological advocates can be expressed as mixtures of distinct ideologies.

Archetypal Analysis (AA) is a prominent framework for extracting polytopes in tabular data. AA was originally proposed by Cutler and Breiman (1994) as an unsupervised learning method that favors distinct aspects, archetypes, of the data, in which observations are characterized by convex combinations (i.e., mixtures) of these archetypes, as opposed to clustering procedures extracting prototypical observations (Mørup and Kai Hansen, 2010). AA has previously been used to model societal conflicts in Europe (Beugelsdijk et al., 2022). However, given that AA was proposed for tabular data, the applications are currently restricted to non-relational data. Thus, whereas the characterization of data in terms of distinct aspects and polytopes has a long history, such representation learning approaches have not previously been considered in the context of network analysis for the extraction of polarization by several extremes.

In recent years, representation learning on signed graphs has gathered substantial attention, with applications such as signed link prediction (Chiang et al., 2011) and community detection (Tzeng et al., 2020). Initial works extended the prominent random-walk framework (Perozzi et al., 2014; Grover and Leskovec, 2016) to the analysis of signed networks. SIDE (Kim et al., 2018b) exploits truncated random walks on the signed graph, with interaction signs for each node pair inferred based on balance theory (Cartwright and Harary, 1956). Balance theory is a socio-psychological theory admitting four rules: "The friend of my friend is my friend," "The friend of my enemy is my enemy," "The enemy of my friend is my enemy," and "The enemy of my enemy is my friend." POLE (Huang et al., 2022) also utilizes balance-theory-based signed random walks to construct an auto-covariance similarity, which is used to obtain the embedding space. Neural networks have also been adopted for the analysis of signed networks. Both SINE (Wang et al., 2017) and SIGNET (Islam et al., 2018) combine balance theory and multi-layer neural networks to learn the network embeddings, while SIGNET uses targeted node sampling to provide scalable inference. In addition, graph neural networks have also been studied in the context of signed graphs. More specifically, SIGAT (Huang et al., 2019) and SDGNN (Huang et al., 2021) combine balance and status theory with graph attention to learn signed network embeddings. Status theory is another important socio-psychological theory for directed relationships where, for a source and a target node, a positive directed connection assumes a higher status of the target, i.e., status(target) > status(source), while the inequality is reversed for a negative connection. Lastly, SLF (Xu et al., 2019) learns multiple latent factors of the signed network, modeling positive, negative, and neutral relationships, as well as the absence of a relationship between node pairs.

A prominent approach to graph representation learning is the Latent Distance Model (LDM) (Hoff et al., 2002), in which the tendency of nodes to connect is defined in terms of their proximity in latent space. Notably, the LDM can express the properties of transitivity ("a friend of a friend is a friend") and homophily ("akin nodes tend to have links"). Recently, it has been shown that LDMs can account for the structure of networks in ultra-low dimensions (Nakis et al., 2022, 2023; Çelikkanat et al., 2022). It has further been demonstrated that an LDM of one dimension can be used to extract bipolar network properties (Barberá et al., 2015).

For the modeling of signed networks for the characterization of polarization, we first present the Signed Latent Distance Model (SLDM). The model utilizes a likelihood function for weighted signed links based on the Skellam distribution (Skellam, 1946). The Skellam distribution is the discrete probability distribution of the difference between two independent Poisson random variables. It was introduced by John Gordon Skellam to model the dynamics of populations (Skellam, 1946). Since then, it has been used in medicine to model treatment measurements (Karlis and Ntzoufras, 2006), in sports results (Karlis and Ntzoufras, 2008), as well as in econometric studies (Barndorff-Nielsen et al., 2010). Furthermore, we introduce the Signed relational Latent dIstance Model (SLIM), able to characterize the latent social space in terms of extreme positions forming polytopes, inspired by archetypal analysis and enabling archetypal analysis for relational data, i.e., relational AA (RAA). We apply SLDM and SLIM to four real signed networks believed to reflect polarization and demonstrate how SLIM uncovers prominent distinct positions (poles). To the best of our knowledge, this is the first work to model signed weighted networks using the Skellam distribution and the first time AA has been extended to relational data by leveraging latent position modeling approaches for the characterization of polytopes in social networks. The implementation is available at: github.com/Nicknakis/SLIM_RAA.

2 PROPOSED METHODOLOGY

Let G = (V, Y) be a signed graph, where V = {1, ..., N} denotes the set of vertices and Y : V² → X ⊆ R is a map indicating the weight of node pairs, such that there is an edge (i, j) ∈ V² if the weight Y(i, j) is different from 0. In other words, E := {(i, j) ∈ V² : Y(i, j) ≠ 0} indicates the set of edges of the network. Since many real networks consist of only integer-valued edges, in this paper we set X to Z, and we call the graph undirected if the pairs (i, j) and (j, i) represent the same link. (The directed case is provided in the supplementary materials.) For simplicity, y_ij denotes each edge weight.

2.1 The Skellam Latent Distance Model (SLDM)

Our main purpose is to learn latent node representations {z_i}_{i∈V} ∈ R^K in a low-dimensional space for a given signed network G = (V, Y) (K ≪ |V|). The edge weights can take any integer value to represent the positive or negative tendencies between the corresponding nodes. We model these signed interactions among the nodes using the Skellam distribution (Skellam, 1946), which can be formulated as the difference of two independent Poisson-distributed random variables (y = N₁ − N₂ ∈ Z) with respect to the rates λ⁺ and λ⁻:

$$P(y \mid \lambda^{+}, \lambda^{-}) = e^{-(\lambda^{+}+\lambda^{-})} \left(\frac{\lambda^{+}}{\lambda^{-}}\right)^{y/2} I_{|y|}\!\left(2\sqrt{\lambda^{+}\lambda^{-}}\right),$$

where N₁ ∼ Pois(λ⁺) and N₂ ∼ Pois(λ⁻), and I_{|y|} is the modified Bessel function of the first kind and order |y|. To the best of our knowledge, the Skellam distribution has not been adapted before for modeling the network likelihood. More specifically, we propose a novel latent space model utilizing the Skellam distribution by adopting the latent distance model, which was proposed originally for undirected and unsigned binary networks as a logistic regression model (Hoff et al., 2002). It was later extended to multiple generalized linear models (Hoff, 2005), including the Poisson regression model for integer-weighted networks. We can formulate the negative log-likelihood of a latent distance model under the Skellam distribution as:

$$\mathcal{L}(Y) := -\log p(Y \mid \lambda^{+}, \lambda^{-}) = \sum_{i<j} \left[ (\lambda^{+}_{ij} + \lambda^{-}_{ij}) - \frac{y_{ij}}{2} \log\!\left(\frac{\lambda^{+}_{ij}}{\lambda^{-}_{ij}}\right) - \log(I^{*}_{ij}) \right],$$

where $I^{*}_{ij} := I_{|y_{ij}|}\!\left(2\sqrt{\lambda^{+}_{ij}\lambda^{-}_{ij}}\right)$. As can be noticed, the Skellam distribution has two rate parameters, and we use them to learn latent node representations {z_i}_{i∈V} by defining them as follows:

$$\lambda^{+}_{ij} = \exp(\gamma_i + \gamma_j - \lVert z_i - z_j \rVert_2), \quad (1)$$
$$\lambda^{-}_{ij} = \exp(\delta_i + \delta_j + \lVert z_i - z_j \rVert_2), \quad (2)$$

where the set {γ_i, δ_i}_{i∈V} denotes the node-specific random effect terms, and ‖·‖₂ is the Euclidean distance function. More specifically, γ_i, γ_j represent the "social" effect/reach of a node and the tendency to form (as a receiver and as a sender, respectively) positive interactions, expressing positive degree heterogeneity (indicated by + as a superscript of λ). In contrast, δ_i, δ_j provide the "anti-social" effect/reach of a node to form negative connections, and thus model negative degree heterogeneity (indicated by − as a superscript of λ).
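As a sanity check, the closed-form Skellam pmf and the rate parameterization of Eqs. (1)-(2) can be sketched in a few lines of Python. The function and variable names below are ours; `scipy.stats.skellam` serves as an independent reference implementation:

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind


def rates(z_i, z_j, gamma_i, gamma_j, delta_i, delta_j):
    """Skellam rates of Eqs. (1)-(2) for a single dyad (i, j)."""
    d = np.linalg.norm(z_i - z_j)
    lam_pos = np.exp(gamma_i + gamma_j - d)  # positive-tie rate, Eq. (1)
    lam_neg = np.exp(delta_i + delta_j + d)  # negative-tie rate, Eq. (2)
    return lam_pos, lam_neg


def skellam_pmf(y, lam_pos, lam_neg):
    """P(y | lambda+, lambda-) written exactly as in the text."""
    return (np.exp(-(lam_pos + lam_neg))
            * (lam_pos / lam_neg) ** (y / 2)
            * iv(abs(y), 2 * np.sqrt(lam_pos * lam_neg)))


# toy dyad with illustrative parameter values
lp, ln = rates(np.array([0.1, 0.2]), np.array([0.4, -0.3]),
               0.2, 0.1, -0.5, -0.4)
```

The closed-form pmf above matches `scipy.stats.skellam.pmf(y, lp, ln)` for any integer y, positive or negative.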

By imposing standard normally distributed priors elementwise on all model parameters θ = {γ, δ, Z}, i.e., θ_i ∼ N(0, 1), we define a maximum a posteriori (MAP) estimation over the model parameters via the loss function to be minimized (ignoring constant terms):

$$\text{Loss} = \sum_{i<j} \left( \lambda^{+}_{ij} + \lambda^{-}_{ij} - \frac{y_{ij}}{2} \log\!\left(\frac{\lambda^{+}_{ij}}{\lambda^{-}_{ij}}\right) \right) - \sum_{i<j} \log I_{|y_{ij}|}\!\left(2\sqrt{\lambda^{+}_{ij}\lambda^{-}_{ij}}\right) + \frac{\rho}{2}\left( \lVert Z \rVert_F^2 + \lVert \gamma \rVert_F^2 + \lVert \delta \rVert_F^2 \right), \quad (3)$$

where ‖·‖_F denotes the Frobenius norm. In addition, ρ is the regularization strength, with ρ = 1 yielding the adopted normal prior with zero mean and unit variance. Importantly, by setting λ⁺_ij and λ⁻_ij based on Eqs. (1) and (2), the model effectively makes positive (weighted) links attract nodes and negative (weighted) links deter nodes from being in proximity of each other.
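The MAP objective of Eq. (3) admits a compact NumPy sketch (a dense O(N²) toy version with our own helper names, not the released implementation):

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind


def sldm_map_loss(Z, gamma, delta, Y, rho=1.0):
    """MAP loss of Eq. (3) over all dyads i < j (dense sketch)."""
    i, j = np.triu_indices(Z.shape[0], k=1)
    d = np.linalg.norm(Z[i] - Z[j], axis=1)
    lam_pos = np.exp(gamma[i] + gamma[j] - d)  # Eq. (1)
    lam_neg = np.exp(delta[i] + delta[j] + d)  # Eq. (2)
    y = Y[i, j]
    nll = np.sum(lam_pos + lam_neg
                 - (y / 2) * np.log(lam_pos / lam_neg)
                 - np.log(iv(np.abs(y), 2 * np.sqrt(lam_pos * lam_neg))))
    prior = (rho / 2) * (np.sum(Z**2) + np.sum(gamma**2) + np.sum(delta**2))
    return nll + prior


# toy example with random parameters
rng = np.random.default_rng(0)
n = 5
Z = rng.normal(size=(n, 2))
gamma, delta = rng.normal(size=n), rng.normal(size=n) - 1
Y = rng.integers(-2, 3, size=(n, n))
loss = sldm_map_loss(Z, gamma, delta, Y)
```

With ρ = 0 the value coincides with the negative Skellam log-likelihood, which can be cross-checked against `scipy.stats.skellam.logpmf`.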

2.2 Archetypal Analysis

Archetypal Analysis (AA) (Cutler and Breiman, 1994; Mørup and Kai Hansen, 2010) is an approach developed for the modeling of observational data in which the data is expressed in terms of convex combinations of characteristics (i.e., archetypes). The definition of the embedded data points is given as follows:

$$X \approx XCZ \quad \text{s.t.} \quad c_d \in \Delta^N \ \text{and} \ z_j \in \Delta^K, \quad (4)$$

where Δ^P denotes the standard simplex in (P+1) dimensions, such that q ∈ Δ^P requires q_i ≥ 0 and ‖q‖₁ = 1 (i.e., Σ_i q_i = 1). Notably, the archetypes given by the columns of A = XC define the corners of the extracted polytope as convex combinations of the observations, whereas Z defines how each observation is reconstructed as convex combinations of the extracted archetypes.
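The factorization in Eq. (4) can be illustrated with randomly drawn simplex-constrained factors (a sketch only, not a fitted AA solution; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, K = 50, 2, 3
X = rng.normal(size=(P, N))  # data matrix, one observation per column

# Random simplex-constrained factors: columns of C and Z sum to one.
C = rng.random((N, K))
C /= C.sum(axis=0)           # columns c_d in Delta^N
Z = rng.random((K, N))
Z /= Z.sum(axis=0)           # columns z_j in Delta^K

A = X @ C      # archetypes: convex combinations of the observations
X_hat = A @ Z  # each column reconstructed as a convex combination of archetypes
```

Fitting AA then amounts to minimizing ‖X − XCZ‖ over C and Z under these simplex constraints.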

Whereas archetypal analysis constrains the representation to the convex hull of the data, other approaches to model pure/ideal forms have been Minimal Volume (MV) approaches, defined by

$$X \approx AZ \quad \text{s.t.} \quad \operatorname{vol}(A) = v \ \text{and} \ z_j \in \Delta^K, \quad (5)$$

in which vol(A) defines the volume of A. When A is a square matrix, this can be defined by vol(A) = |det(A)|; see also Hart et al. (2015); Zhuang et al. (2019) for a review of such end-member extraction procedures. A strength is that, as opposed to AA, the approach does not require the presence of pure observations; however, a drawback is the need for regularization tuning to define an adequate volume (Zhuang et al., 2019), whereas the exact computation of the volume of general polytopes requires the computation of determinants of the sum of all simplices defining the polytope (Büeler et al., 2000). Importantly, Archetypal Analysis and Minimal Volume extraction procedures have been found to identify latent polytopes defining trade-offs in which vertices of the polytopes represent maximally enriched distinct aspects (archetypes), allowing identification of tasks or prominent roles the vertices of the polytope represent (Shoval et al., 2012; Hart et al., 2015). Due to the computational issues of regularizing high-dimensional volumes and the need for careful tuning of such regularization parameters, we presently focus on polytope extraction as defined through the AA formulation rather than the MV formulation.

2.3 A Generative Model of Polarization

Considering a latent space for the modeling of polarization, we presently extend the Skellam LDM and define polarization as extreme positions (pure forms/archetypes) that optimally represent the social dynamics observed in terms of the induced polytope, what we denote a sociotope, in which each observation is a convex combination of these extremes. In particular, we characterize polarization in terms of extreme positions in a latent space defined as a polytope akin to AA and MV.

In our generative model of polarization, we further suppose that the bias terms introduced in the definitions of the Poisson rates, (λ⁺_ij, λ⁻_ij), are normally distributed. Since the latent representations {z_i}_{i∈V} according to AA and MV lie in the standard simplex Δ^K, we further assume that they follow a Dirichlet distribution. Formally, we can summarize the generative model as follows:

$$\gamma_i \sim \mathcal{N}(\mu_\gamma, \sigma^2_\gamma) \quad \forall i \in \mathcal{V},$$
$$\delta_i \sim \mathcal{N}(\mu_\delta, \sigma^2_\delta) \quad \forall i \in \mathcal{V},$$
$$a_k \sim \mathcal{N}(\mu_A, \sigma^2_A I) \quad \forall k \in \{1, \ldots, K\},$$
$$z_i \sim \text{Dir}(\alpha) \quad \forall i \in \mathcal{V},$$
$$\lambda^{+}_{ij} = \exp(\gamma_i + \gamma_j - \lVert A(z_i - z_j) \rVert_2),$$
$$\lambda^{-}_{ij} = \exp(\delta_i + \delta_j + \lVert A(z_i - z_j) \rVert_2),$$
$$y_{ij} \sim \text{Skellam}(\lambda^{+}_{ij}, \lambda^{-}_{ij}) \quad \forall (i, j) \in \mathcal{V}^2.$$

According to the above generative process, positive (γ) and negative (δ) random effects for the nodes are first drawn, upon which the locations of the extreme positions A (i.e., corners of the polytope, denoted archetypes) are generated. In addition, as the dimensionality of the latent space increases linearly with the number of archetypes, i.e., A is a square matrix, with probability zero will archetypes be placed in the interior of the convex hull of the other archetypes. Subsequently, the node-specific convex combinations Z of the generated archetypes are drawn, and finally, the weighted signed links are generated according to the node-specific biases and the distances between dyads within the polytope, utilizing the Skellam distribution.
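The generative process can be sampled directly with NumPy; a Skellam draw is simply the difference of two independent Poisson draws. The hyperparameter values below are illustrative, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, alpha = 200, 3, 0.1

gamma = rng.normal(-1.5, 0.5, N)                 # positive random effects
delta = rng.normal(-2.5, 0.5, N)                 # negative random effects
A = rng.normal(0.0, 1.0, (K, K))                 # archetype positions (square matrix)
Z = rng.dirichlet(alpha * np.ones(K), size=N).T  # (K, N), columns on the simplex

emb = A @ Z                                      # latent positions within the polytope
D = np.linalg.norm(emb[:, :, None] - emb[:, None, :], axis=0)
lam_pos = np.exp(gamma[:, None] + gamma[None, :] - D)
lam_neg = np.exp(delta[:, None] + delta[None, :] + D)

# Skellam(lam+, lam-) as the difference of two independent Poissons
Y = rng.poisson(lam_pos) - rng.poisson(lam_neg)
Y = np.triu(Y, k=1)  # keep the undirected upper triangle
```

Sweeping the Dirichlet concentration α towards zero pushes memberships towards the corners, reproducing the increasing polarization shown in the artificial-network experiments.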

2.4 The Signed Relational Latent Distance Model

For inference, we exploit how polytopes can be efficiently extracted using archetypal analysis. We therefore define the Signed Latent relational dIstance Model (SLIM) by formulating a relational archetypal analysis approach, endowing the generative model with a parameterization akin to archetypal analysis in order to efficiently extract polytopes from relational data defined by signed weighted networks. Specifically, we formulate the relational AA in the context of the family of LDMs as:

$$\lambda^{+}_{ij} = \exp(\gamma_i + \gamma_j - \lVert A(z_i - z_j) \rVert_2) \quad (6)$$
$$\phantom{\lambda^{+}_{ij}} = \exp(\gamma_i + \gamma_j - \lVert RZC(z_i - z_j) \rVert_2), \quad (7)$$
$$\lambda^{-}_{ij} = \exp(\delta_i + \delta_j + \lVert A(z_i - z_j) \rVert_2) \quad (8)$$
$$\phantom{\lambda^{-}_{ij}} = \exp(\delta_i + \delta_j + \lVert RZC(z_i - z_j) \rVert_2). \quad (9)$$

Notably, in the AA formulation, X = RZ corresponds to observations formed by convex combinations Z of positions given by the columns of R ∈ R^{K×K}. Furthermore, in order to ensure that what is used to define the archetypes A = XC = RZC corresponds to observations using these archetypes in their reconstruction Z, we define C ∈ R^{N×K} as a gated version of Z normalized to the simplex such that c_d ∈ Δ^N, by defining

$$c_{nd} = \frac{(Z^{\top} \circ [\sigma(G)]^{\top})_{nd}}{\sum_{n'} (Z^{\top} \circ [\sigma(G)]^{\top})_{n'd}}, \quad (10)$$

in which ∘ denotes the elementwise (Hadamard) product and σ(G) defines the logistic sigmoid applied elementwise to the matrix G. As a result, the extracted archetypes are ensured to correspond to the nodes assigned to the archetype, whereas the locations of the archetypes can be flexibly placed in space as defined by R. By defining z_i = softmax(z̃_i), we further ensure z_i ∈ Δ^K.
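The gating of Eq. (10) is straightforward to express in NumPy. The sketch below (our variable names, random parameters in place of fitted ones) builds C from Z and the gate matrix G, and forms the archetypes A = RZC:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax_cols(x):
    """Column-wise softmax, so each column lies on the simplex."""
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


rng = np.random.default_rng(0)
K, N = 3, 40
Z = softmax_cols(rng.normal(size=(K, N)))  # z_i in Delta^K (columns)
G = rng.normal(size=(K, N))                # free gating parameters

M = Z.T * sigmoid(G).T                     # (N, K): Z^T o sigma(G)^T, Eq. (10) numerator
C = M / M.sum(axis=0, keepdims=True)       # normalize columns: c_d in Delta^N

R = rng.normal(size=(K, K))                # flexible archetype locations
A = R @ Z @ C                              # archetypes A = RZC, a (K, K) matrix
```

The gate σ(G) lets inference down-weight nodes that should not contribute to a given corner while keeping every column of C a valid convex combination.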

Importantly, the loss function of Eq. (3) is adopted for the relational AA formulation forming the SLIM, with the prior regularization applied to the corners of the extracted polytope A = RZC instead of the latent embeddings Z, imposing a standard elementwise normal distribution as prior, a_{k,k'} ∼ N(0, 1). Furthermore, we impose a uniform Dirichlet prior on the columns of Z, i.e., z_i ∼ Dir(1_K); this only contributes constant terms to the joint distribution and therefore leaves the maximum a posteriori (MAP) optimization unchanged up to constant terms. As a result, the loss function optimized is given by Eq. (3), replacing ‖Z‖²_F with ‖A‖²_F.

Complexity analysis. With SLDM/SLIM being distance models, they scale prohibitively as O(N²), since the pairwise distance matrix between nodes needs to be computed. This does not allow the analysis of large-scale networks. We therefore adopt an unbiased estimation of the log-likelihood through random sampling. More specifically, gradient steps are based on the log-likelihood of the block formed by a set S of network nodes sampled per iteration (with replacement across iterations). This makes inference scalable, defining an O(S²) space and time complexity, with S denoting the sample size. More options for scalable inference of distance models have also been proposed in Nakis et al. (2022); Raftery et al. (2012).
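A minimal sketch of such block subsampling (our own helper names; the rescaling constant shown is one common way to keep a uniformly sampled block estimator unbiased and is our assumption, not a detail stated in the text):

```python
import numpy as np


def sampled_block(N, S, rng):
    """Sample S distinct nodes; a gradient step then uses only the S x S dyad block."""
    return rng.choice(N, size=S, replace=False)


def rescale_to_full(block_value, N, S):
    """Rescale a block sum so its expectation matches the full sum over
    all N(N-1)/2 dyads (uniform-sampling assumption, our sketch)."""
    return block_value * (N * (N - 1)) / (S * (S - 1))


rng = np.random.default_rng(0)
idx = sampled_block(10_000, 3000, rng)  # ~3000 nodes per step, as in the experiments
```

Per step, only the 3000 × 3000 dyad block enters the loss, rather than all 10,000² pairs.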

3 RESULTS AND DISCUSSION

We extensively evaluate the performance of our proposed methods by comparing them to prominent GRL approaches designed for signed networks. All experiments regarding SLDM/SLIM have been conducted on an 8GB NVIDIA RTX 2070 Super GPU. In addition, we adopted the Adam optimizer (Kingma and Ba, 2017) with learning rate lr = 0.05 for 5000 iterations. The sample size for the node set was chosen as approximately 3000 nodes for all networks. The initialization of the SLDM/SLIM frameworks is deterministic and based on the spectral decomposition of the normalized Laplacian (more details are provided in the supplementary).

Artificial networks. We first introduce experiments on artificial networks, as generated by the generative process described in Section 2.3. We create two networks expressing different levels of polarization. Results are presented in Fig. 1. More specifically, sub-figs. 1a and 1e show the ground-truth latent spaces generating the networks with adjacency matrices as shown by sub-figs. 1b and 1f, respectively. The inferred latent spaces of the two networks are provided in sub-figs. 1c and 1g, where it is clear that the model successfully distinguishes the difference in the level of polarization of the two networks. We also verify the generated networks based on the inferred parameters, given by sub-figs. 1d and 1h. We observe that the model successfully generates sparse networks accounting for the positive and negative link imbalance.

Table 1: Network statistics; |V|: # nodes, |Y⁺|: # positive links, |Y⁻|: # negative links.

               |V|      |Y⁺|      |Y⁻|    Density
  Reddit     35,776   128,182     9,639    0.0001
  Twitter    10,885   238,612    12,794    0.0021
  wiki-Elec   7,117    81,277    21,909    0.0020
  wiki-RfA   11,332   117,982    66,839    0.0014

Real networks. We employed four networks of varying sizes and structures. (i) Reddit is constructed based on hyperlinks representing the directed connections between two communities on a social platform (Kumar et al., 2018). (ii) wikiRfA and (iii) wikiElec are election networks covering different time intervals, in which nodes indicate users and directed links show supporting, neutral, and opposing votes for selection as an administrator on the Wikipedia platform (West et al., 2014; Leskovec et al., 2010). Finally, (iv) Twitter is an undirected social network built on the corpus of tweets concerning the highly polarized debate about the reform of the Italian Constitution (Ordozgoiti et al., 2020).

In our experiments, we consider the largest connected component of the networks, and if the original network is temporal, we construct the static network by summing the weights of the links through time. For the experiments performed on undirected graphs, we similarly combine directed links to obtain the undirected version of the networks.

Baselines. We benchmark the performance of our proposed frameworks against five prominent graph representation learning methods designed for the analysis of signed networks: (i) POLE (Huang et al., 2022) learns the network embeddings by decomposing the signed random-walk auto-covariance similarity matrix; (ii) SLF (Xu et al., 2019) learns embeddings that are the concatenation of two latent factors targeting positive and negative relations; (iii) SIGAT (Huang et al., 2019) is a graph neural network approach using graph attention to learn node embeddings; (iv) SIDE (Kim et al., 2018b) is another random-walk-based method for signed networks; (v) SIGNET (Islam et al., 2018) is a multi-layer neural network approach constructing a Hadamard product similarity to accommodate signed proximity in the network's pairwise relations.


[Figure 1 panels: (a) ground truth, (b) (.017, 77, 23), (c) inferred space, (d) (.018, 73, 27); (e) ground truth, (f) (.012, 63, 37), (g) inferred space, (h) (.014, 59, 41).]

Figure 1: Two artificially generated networks with different levels of polarization {z_i ∼ Dir(1) (top row), and z_i ∼ Dir(0.1 · 1) (bottom row)}. Both of size N = 5000 nodes and K = 3 archetypes. The first column shows the first two principal components of the original latent space Z̃ = AZ, the second column the original adjacency matrix, while the parenthesis shows the network statistics as (density, % of positive (blue) links, % of negative (red) links). The third column displays the first two principal components of the inferred latent space, and the fourth column is the SLIM-generated network based on the inferred parameters. All network adjacency matrices are ordered based on z_i, in terms of maximum archetype membership and internally according to the magnitude of the corresponding archetype most used for their reconstruction.

Table 2: Area Under Curve (AUC-ROC) scores for representation size K = 8.

                  WikiElec            WikiRfa             Twitter             Reddit
  Task         p@n   p@z   n@z    p@n   p@z   n@z    p@n   p@z   n@z    p@n   p@z   n@z
  POLE        .809  .896  .853   .904  .921  .767   .965  .902  .922     x     x     x
  SLF         .888  .954  .952   .971  .963  .961   .914  .877  .968   .729  .955  .968
  SIGAT       .874  .775  .754   .944  .766  .792   .998  .875  .963   .707  .682  .712
  SIDE        .728  .866  .895   .869  .861  .908   .799  .843  .910   .653  .830  .892
  SIGNET      .841  .774  .635   .920  .736  .717   .968  .719  .891   .646  .547  .623
  SLIM (ours) .862  .965  .935   .956  .980  .960   .988  .963  .972   .667  .955  .978
  SLDM (ours) .876  .969  .936   .963  .982  .963   .986  .962  .973   .648  .951  .975

Table 3: Area Under Curve (AUC-PR) scores for representation size K = 8.

                  WikiElec            WikiRfa             Twitter             Reddit
  Task         p@n   p@z   n@z    p@n   p@z   n@z    p@n   p@z   n@z    p@n   p@z   n@z
  POLE        .929  .922  .544   .927  .937  .779   .998  .932  .668     x     x     x
  SLF         .964  .926  .787   .983  .922  .881   .994  .870  .740   .966  .956  .850
  SIGAT       .960  .724  .439   .969  .646  .497   .999  .861  .582   .965  .692  .232
  SIDE        .907  .779  .608   .920  .806  .739   .974  .831  .469   .957  .820  .614
  SIGNET      .944  .670  .298   .950  .572  .417   .998  .647  .248   .956  .510  .083
  SLIM (ours) .953  .956  .785   .973  .969  .907   .999  .962  .813   .958  .960  .850
  SLDM (ours) .960  .963  .787   .977  .971  .912   .999  .963  .809   .954  .955  .846


3.1 Link prediction

We evaluate performance on the link prediction task, considering the ability of our model to predict links of disconnected network pairs which should be connected, as well as to infer the sign of these links (positive or negative). For this, we remove/hide 20% of the total network links while preserving connectivity in the residual network. For the testing set, the removed edges are paired with a sample of the same number of node pairs that are not edges of the original network, to create zero instances. To learn the node embeddings, we make use of the residual network.

Predictions and evaluation metrics. For our methods, we fit a logistic regression classifier on the concatenation of the corresponding Skellam rates and log-rates, as χ_ij = [λ⁺_ij, λ⁻_ij, log λ⁺_ij, log λ⁻_ij]. Since our Skellam likelihood formulation relies both on the ratio and the product of the rates, a concatenation can take advantage of a linear function of the rates, as well as their ratio or product, as allowed by the log transformation. For the baselines, we use five binary operators {average, weighted L1, weighted L2, concatenate, Hadamard product} to construct feature vectors. For each of these feature vectors, we fit a logistic regression model (except for the Hadamard product, which is used directly for predictions). Since different operators provide different performances, for the baselines we choose the operator that returns the maximum performance per individual task. As a consequence of the class imbalances and the sparsity present in signed networks, we adopt robust evaluation metrics, namely the area under the receiver operating characteristic (AUC-ROC) and precision-recall (AUC-PR) curves. Lastly, we denote with "x" the performance of a baseline if it was unable to run due to high memory/runtime complexity.
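The feature construction and classifier for our methods can be sketched with scikit-learn; the gamma-distributed rates below are toy stand-ins for the fitted λ⁺_ij, λ⁻_ij, and the labels are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_dyads = 500
lam_pos = rng.gamma(2.0, 1.0, n_dyads)  # stand-ins for fitted Skellam rates
lam_neg = rng.gamma(2.0, 1.0, n_dyads)

# chi_ij = [lam+, lam-, log lam+, log lam-]
X = np.column_stack([lam_pos, lam_neg, np.log(lam_pos), np.log(lam_neg)])
y = (lam_pos > lam_neg).astype(int)     # toy sign labels for illustration

clf = LogisticRegression(max_iter=1000).fit(X, y)
probs = clf.predict_proba(X)[:, 1]
```

Because the log-rates enter linearly, the classifier can exploit the rate ratio (log λ⁺ − log λ⁻), which is exactly the quantity that determines the expected sign of a dyad.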

Link sign prediction. In this setting, we utilize the link test set containing the negative/positive cases of removed connections. We then ask the models to predict the sign of the removed links. We denote the link sign prediction task as p@n. In Table 2 we provide the AUC-ROC scores, while in Table 3 the AUC-PR scores for the undirected case. Here we observe that our proposed models outperform the baselines in most networks, while being competitive against SLF on the Reddit network. This specific baseline is the most competitive across networks, showing high and consistent performance similar to SLIM and SLDM. Comparing SLIM with SLDM, we get mostly on-par results, verifying that constraining the model to a polytope still provides as much expressive capability as the unconstrained model while allowing for accurate extraction of "extreme" positions.

Signed link prediction. The second and more challenging task is to predict removed links against disconnected pairs of the network, as well as to infer the sign of each link correctly. For that, the test set is split into two subsets, positive/disconnected and negative/disconnected. We then evaluate the performance of each model on those subsets. The tasks of signed link prediction between positive and zero samples are denoted as p@z, while negative against zero is n@z. We summarize our results by presenting AUC-ROC and AUC-PR scores in Table 2 and Table 3, respectively. Once more, our models outperform the baselines in most networks and for both versions of signed link prediction. The SLF baseline is again the most competitive baseline, yielding on-par results for Reddit.

Figure 2: wikiElec: Performance of SLIM across dimensions for different tasks: (a) Area-Under-Curve Receiver Operating Characteristic scores, (b) Area-Under-Curve Precision-Recall scores. Both AUC-ROC and AUC-PR scores are almost constant across different dimensions.

Directed networks. Results for directed networks are provided in the supplementary material. Since SLF has higher modeling capacity, it outperforms the simple model formulations of SLDM and SLIM. For that reason, we explore and discuss formulations allowing for more capacity in the SLDM/SLIM models for the directed case (see supplementary material).

Effect of dimensionality. In Figure 2, we provide the performance across dimensions for the different downstream tasks on the wikiElec dataset. We observe that both AUC-ROC and AUC-PR scores are almost constant across different dimensions (note that, as R ∈ R^{K×K}, the dimensionality of SLIM is given by the number of archetypes), showcasing that increasing the models' capacity (in terms of dimensions) does not have a significant effect on the performance of these downstream tasks (similar results were observed for all networks and most of the baselines).

Visualizations. The RAA formulation facilitates the inference of a polytope describing the distinct aspects of networks. Here, we visualize the latent space across K = 8 dimensions for all of the corresponding networks. To facilitate the visualizations, we use Principal Component Analysis (PCA) and project the space onto the first two principal components of the final embedding matrix Z̃ = AZ.

In addition, we provide circular plots where each archetype of the polytope is mapped to a circle every rad_k = 2π/K radians, with K being the number of archetypes. Figure 3 contains three columns, with the first denoting the PCA-induced space, while the second and third columns correspond to the circular plots enriched by the negative (red) and positive (blue) links, respectively.

Figure 3: Inferred polytope visualizations for various networks (rows: WikiElec, WikiRfa, Reddit, Twitter). The first column showcases the K = 8 dimensional sociotope projected on the first two principal components (PCA); the second and third columns provide circular plots of the sociotope enriched with the negative (red) and positive (blue) links, respectively.

We observe how the polytope successfully uncovers extreme positional nodes. More specifically, all networks have at least one archetype which acts as a "dislike" hub and at least one which acts as a "like" hub, meaning that these archetypes concentrate high volumes of negative/positive interactions. For the wiki-RfA and Twitter networks, we observe archetypes of very low degree; this is explained by some exclusively "disliked" nodes being pushed away from the main node population. These can be regarded as "outliers" of the sociotope. Nevertheless, such outliers are discovered since they provide high expressive power for the model.
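One plausible way to build such circular plots is to anchor each archetype at angle rad_k = 2πk/K on the unit circle and place every node as the membership-weighted average of the anchor points. The sketch below illustrates this idea; it is an assumption for illustration, not necessarily the exact construction used for Figure 3:

```python
import numpy as np

def circular_layout(memberships):
    """Map K archetypes to equally spaced angles rad_k = 2*pi*k/K on the unit
    circle and place each node as the membership-weighted average of the
    archetype anchors. `memberships` is K x N with simplex-valued columns."""
    K, _ = memberships.shape
    angles = 2.0 * np.pi * np.arange(K) / K
    anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # K x 2
    return memberships.T @ anchors                                 # N x 2 coordinates

# Nodes fully assigned to one archetype land exactly on that archetype's anchor.
coords = circular_layout(np.eye(4))
```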

Discussion. The Signed Relational Latent Distance Model has been presented for the undirected setting, and we employed the Euclidean distance for both Skellam rates λ+_ij and λ−_ij. The capacity of the current formulation works well for undirected networks. Nevertheless, there are alternative model formulations, and keeping the distance identical for the positive and negative rates constrains the models' expressive capability, especially for the directed/bipartite signed network case. We therefore explore additional model formulations in the supplementary material, such as setting the Skellam rates as λ+_ij = exp(β_i + β_j − ||z_i − w_j||_2) and λ−_ij = exp(γ_i + γ_j − ||u_i − w_j||_2). Under this assumption, a positive directed relationship (i → j) shows that node i "likes" node j, and that it "dislikes" node j if the relationship is negative. The latent embedding w_j is then the receiver position for the "likes" and "dislikes", with embeddings z_i and u_i being the sender positions for positive and negative relationships, respectively. In this case, we introduce three latent embeddings instead of the conventional two for the undirected case. The disparity between the locations z_i and u_i can point out how polarity is formed between the two regions of the latent space (please see the supplementary material for further discussion and results).

Another important design characteristic of the SLDM/SLIM frameworks is the choice of the prior/regularization of the different parameters. So far, we did not tune any regularization strength of the priors and simply adopted a normal distribution on the model parameters and a non-informative uniform Dirichlet prior on Z in the case of SLIM. Tuning the priors with cross-validation is expected to further boost performance.

A prominent characteristic of signed networks is their sparsity or, in other words, the excess of "zero" weights among node pairs. An intriguing direction to account for this is the zero-inflated version of the Skellam distribution (Karlis and Ntzoufras, 2008). Here, essentially, we can define a mixture model responsible for the imbalance between cases (sign-weighted links) and controls (neutral zero links) in the network. Such zero-inflated SLDM/SLIM models can thereby define a generative process that can straightforwardly address different levels of network sparsity.
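A minimal sketch of such a zero-inflated Skellam log-pmf follows; the mixture weight `pi` and the function name are illustrative assumptions, not quantities defined in the paper:

```python
import numpy as np
from scipy.stats import skellam

def zi_skellam_logpmf(y, lam_pos, lam_neg, pi):
    """Log-pmf of a zero-inflated Skellam: with probability pi a structural
    zero is emitted; otherwise y ~ Skellam(lam_pos, lam_neg)."""
    base = np.exp(skellam.logpmf(y, lam_pos, lam_neg))
    return np.log((1.0 - pi) * base + pi * (y == 0))

# The mixture is still a valid pmf: mass beyond |y| = 60 is negligible here,
# so the probabilities over the integers sum to (essentially) one.
total = np.exp(zi_skellam_logpmf(np.arange(-60, 61), 2.0, 3.0, 0.3)).sum()
```

Setting `pi = 0` recovers the plain Skellam likelihood, so the sketch nests the original model.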

Whereas we consider the generalization of SLDM and SLIM to directed networks in the supplementary material, a possible future direction should consider generalizations to bipartite networks, in which we expect the directed generalizations to be applicable (Kim et al., 2018a; Nakis et al., 2022). Furthermore, networks of polarization typically evolve over time. Future work should thus investigate how the proposed modeling framework can be extended to characterize dynamic networks, leveraging existing work on dynamic extensions of latent space modeling approaches, including the diffusion model of Sarkar and Moore (2005) and the approaches reviewed in Kim et al. (2018a).

4 CONCLUSION AND LIMITATIONS

The proposed Skellam Latent Distance Model (SLDM) and Signed Latent Relational Distance Model (SLIM) provide easily interpretable network visualizations with favorable performance in the link prediction tasks for weighted signed networks. In particular, endowing the model with a space constrained to polytopes (forming the SLIM) enabled us to characterize distinct aspects in terms of extreme positions in the social networks, akin to conventional archetypal analysis but for graph-structured data. The Skellam distribution is considerably beneficial in modeling signed networks, whereas the relational extension of AA can be applied to other likelihood specifications, such as LDMs in general. This work thereby provides a foundation for using likelihoods accommodating weighted signed networks and representations akin to AA in general for analyzing networks.

The optimization of the SLDM/SLIM frameworks is a highly non-convex problem and thus relies on the quality of initialization in terms of convergence speed. In this regard, we use a deterministic initialization based on the normalized Laplacian. In addition, we observed that maximum likelihood estimation of the model parameters became unstable when the network contained nodes having only negative interactions. This is a direct consequence of the presence of the distance term (exp(+||·||_2)) for negative interactions, which can lead to overflow during inference. Nevertheless, the adopted MAP estimation was found to be stable across all networks. For real networks, the generative model created an "excess" of negative links, increasing the overall network sparsity. For that reason, a modified SLIM excluding the regularization over the model parameters was introduced, which achieved the correct network sparsity (as shown in the supplementary material). Assuming priors over the model parameters created a bias in the generated networks when compared to the ground-truth network statistics.


Acknowledgements

We would like to express our sincere appreciation and thank the reviewers for their constructive feedback and insightful comments. We gratefully acknowledge the Independent Research Fund Denmark for supporting this work [grant number: 0136-00315B].

References

F. Atay and H. Tunçel Gölpek. On the spectrum of the normalized Laplacian for signed graphs: Interlacing, contraction, and replication. Linear Algebra and its Applications, 442:165–177, 2014. doi: 10.1016/j.laa.2013.08.022.

P. Barberá, J. T. Jost, J. Nagler, J. A. Tucker, and R. Bonneau. Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science, 26(10):1531–1542, 2015.

O. Barndorff-Nielsen, D. Pollard, and N. Shephard. Integer-valued Lévy processes and low latency financial econometrics. Quantitative Finance, 12, 2010.

S. Beugelsdijk, H. van Herk, and R. Maseland. The nature of societal conflict in Europe; an archetypal analysis of the postmodern cosmopolitan, rural traditionalist and urban precariat. JCMS, 2022.

B. Büeler, A. Enge, and K. Fukuda. Exact volume computation for polytopes: a practical study. In Polytopes—Combinatorics and Computation, pages 131–154. Springer, 2000.

D. Cartwright and F. Harary. Structural balance: a generalization of Heider's theory. Psychological Review, 63(5):277–293, 1956.

A. Çelikkanat, N. Nakis, and M. Mørup. Piecewise-velocity model for learning continuous-time dynamic node representations. In The First Learning on Graphs Conference, 2022.

K.-Y. Chiang, N. Natarajan, A. Tewari, and I. S. Dhillon. Exploiting longer cycles for link prediction in signed networks. In CIKM, pages 1157–1162. Association for Computing Machinery, 2011.

A. Cutler and L. Breiman. Archetypal analysis. Technometrics, 36(4):338–347, 1994.

A. Dagnes. Us vs. them: Political polarization and the politicization of everything. In Super Mad at Everything All the Time, pages 119–165. Springer, 2019.

D. DellaPosta, Y. Shi, and M. Macy. Why do liberals drink lattes? American Journal of Sociology, 120(5):1473–1511, 2015.

V. R. K. Garimella and I. Weber. A long-term analysis of polarization on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 11:528–531, 2017.

G. H. Golub and C. F. Van Loan. Matrix Computations (3rd Ed.). Johns Hopkins University Press, 1996. ISBN 0801854148.

A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. In KDD, pages 855–864, 2016.

P. Guerra, W. Meira Jr, C. Cardie, and R. Kleinberg. A measure of polarization on social media networks based on community boundaries. In Proceedings of the International AAAI Conference on Web and Social Media, volume 7, pages 215–224, 2013.

F. Harary. On the notion of balance of a signed graph. Michigan Mathematical Journal, 2(2):143–146, 1953.

Y. Hart, H. Sheftel, J. Hausser, P. Szekely, N. B. Ben-Moshe, Y. Korem, A. Tendler, A. E. Mayo, and U. Alon. Inferring biological tasks using Pareto analysis of high-dimensional data. Nature Methods, 12(3):233–235, 2015.

M. J. Hetherington. Review article: Putting polarization in perspective. British Journal of Political Science, 39(2):413–448, 2009.

P. D. Hoff. Bilinear mixed-effects models for dyadic data. JASA, 100(469):286–295, 2005.

P. D. Hoff, A. E. Raftery, and M. S. Handcock. Latent space approaches to social network analysis. JASA, 97(460):1090–1098, 2002.

J. Huang, H. Shen, L. Hou, and X. Cheng. Signed graph attention networks, 2019. URL https://arxiv.org/abs/1906.10958.

J. Huang, H. Shen, L. Hou, and X. Cheng. SDGNN: Learning node representation for signed directed networks, 2021. URL https://arxiv.org/abs/2101.02390.

Z. Huang, A. Silva, and A. Singh. POLE: Polarized embedding for signed networks. In WSDM, pages 390–400, 2022.

M. R. Islam, B. Aditya Prakash, and N. Ramakrishnan. SIGNet: Scalable embeddings for signed networks. In Advances in Knowledge Discovery and Data Mining, pages 157–169. Springer, 2018.

D. Karlis and I. Ntzoufras. Bayesian analysis of the differences of count data. Statistics in Medicine, 25:1885–1905, 2006.

D. Karlis and I. Ntzoufras. Bayesian modelling of football outcomes: using the Skellam's distribution for the goal difference. IMA Journal of Management Mathematics, 20(2):133–145, 2008.

A. Keuchenius, P. Törnberg, and J. Uitermark. Why it is important to consider negative ties when studying polarized debates: A signed network analysis of a Dutch cultural controversy on Twitter. PLOS ONE, 2021.

B. Kim, K. H. Lee, L. Xue, and X. Niu. A review of dynamic network models with latent variables. Statistics Surveys, 12:105, 2018a.

J. Kim, H. Park, J.-E. Lee, and U. Kang. SIDE: Representation learning in signed directed networks. In Proceedings of the 2018 World Wide Web Conference, pages 509–518, 2018b.

D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2017.

S. Kumar, W. L. Hamilton, J. Leskovec, and D. Jurafsky. Community interaction and conflict on the web. In Proceedings of the 2018 World Wide Web Conference, pages 933–943, 2018.

J. Leskovec, D. Huttenlocher, and J. Kleinberg. Predicting positive and negative links in online social networks. In WWW, pages 641–650, 2010.

J. Moody and P. J. Mucha. Portrait of political party polarization. Network Science, 1(1):119–121, 2013. doi: 10.1017/nws.2012.3.

M. Mørup and L. Kai Hansen. Archetypal analysis for machine learning. In 2010 IEEE International Workshop on Machine Learning for Signal Processing, pages 172–177, 2010. doi: 10.1109/MLSP.2010.5589222.

N. Nakis, A. Çelikkanat, S. L. Jørgensen, and M. Mørup. A hierarchical block distance model for ultra low-dimensional graph representations. 2022. URL https://arxiv.org/abs/2204.05885.

N. Nakis, A. Çelikkanat, and M. Mørup. HM-LDM: A hybrid-membership latent distance model. In Complex Networks and Their Applications XI, pages 350–363. Springer, 2023. ISBN 978-3-031-21127-0.

Z. P. Neal. A sign of the times? Weak and strong polarization in the U.S. Congress, 1973–2016. Social Networks, 60:103–112, 2020.

B. Ordozgoiti, A. Matakos, and A. Gionis. Finding large balanced subgraphs in signed networks. In Proceedings of The Web Conference 2020, pages 1378–1388, 2020.

B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. CoRR, abs/1403.6652, 2014.

A. E. Raftery, X. Niu, P. D. Hoff, and K. Y. Yeung. Fast inference for the latent space network model using a case-control approximate likelihood. Journal of Computational and Graphical Statistics, 21(4):901–919, 2012.

P. Sarkar and A. Moore. Dynamic social network analysis using latent space models. In NeurIPS, volume 18, 2005.

O. Shoval, H. Sheftel, G. Shinar, Y. Hart, O. Ramote, A. Mayo, E. Dekel, K. Kavanagh, and U. Alon. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science, 336(6085):1157–1160, 2012.

J. G. Skellam. The frequency distribution of the difference between two Poisson variates belonging to different populations. Journal of the Royal Statistical Society, Series A (General), 109(3):296, 1946.

M. Thomas, B. Pang, and L. Lee. Get out the vote: Determining support or opposition from congressional floor-debate transcripts. CoRR, abs/cs/0607062, 2006.

R.-C. Tzeng, B. Ordozgoiti, and A. Gionis. Discovering conflicting groups in signed networks. In NeurIPS, 2020.

S. Wang, J. Tang, C. Aggarwal, Y. Chang, and H. Liu. Signed network embedding in social media, pages 327–335. 2017.

R. West, H. S. Paskov, J. Leskovec, and C. Potts. Exploiting social network structure for person-to-person sentiment analysis. TACL, 2:297–310, 2014.

E. Williamson. Plato's "eidos" and the archetypes of Jung and Frye. Interpretations, 16(1):94–104, 1985. ISSN 0196903X.

P. Xu, J. Wu, W. Hu, and B. Du. Link prediction with signed latent factors in signed social networks. In KDD, pages 1046–1054, 2019.

L. Zhuang, C.-H. Lin, M. A. Figueiredo, and J. M. Bioucas-Dias. Regularization parameter selection in minimum volume hyperspectral unmixing. IEEE Transactions on Geoscience and Remote Sensing, 57(12):9858–9877, 2019.


A Directed Case Model Formulations

In this section, we describe how our proposed frameworks can be extended to the study of directed networks, and we

further explore additional model formulations allowing for more capacity and expressive power.

A.1 The Skellam Latent Distance Model for the Directed Case (LDM)

Our main purpose here is to learn two latent node representations {z_i}_{i∈V} ∈ R^K and {w_i}_{i∈V} ∈ R^K in a low-dimensional space for a given directed signed network G = (V, Y) (K ≪ |V|). The two sets of latent embeddings correspond to modeling directed relationships i → j between nodes, with z_i the source node and w_j the target node, and vice versa for an oppositely directed relationship i ← j. Similar to the main paper, we can formulate the negative log-likelihood of a latent distance model under the Skellam distribution as:

L(Y) := -\log p(Y \mid \lambda^+, \lambda^-) = \sum_{i,j} \left[ (\lambda^+_{ij} + \lambda^-_{ij}) - \frac{y_{ij}}{2} \log\left(\frac{\lambda^+_{ij}}{\lambda^-_{ij}}\right) - \log I_{|y_{ij}|}\left(2\sqrt{\lambda^+_{ij}\lambda^-_{ij}}\right) \right].

For the directed case, the Skellam distribution again has two rate parameters, and we use them to learn latent node representations {z_i}_{i∈V} and {w_j}_{j∈V} ∈ R^K by defining them as follows:

\lambda^+_{ij} = \exp(\beta_i + \gamma_j - \|z_i - w_j\|_2),   (11)
\lambda^-_{ij} = \exp(\delta_i + \epsilon_j + \|z_i - w_j\|_2),   (12)

where the set {\beta_i, \gamma_i, \delta_i, \epsilon_i}_{i∈V} denotes the node-specific random-effect terms, and \|\cdot\|_2 is the Euclidean distance function. More specifically, the sender \beta_i and receiver \gamma_j random effects represent the "social" reach of a node and the tendency to form positive interactions, expressing positive degree heterogeneity (indicated by + as a superscript of \lambda). In contrast, \delta_i and \epsilon_j provide the "anti-social" sender and receiver effects of a node forming negative connections, and thus model negative degree heterogeneity (indicated by − as a superscript of \lambda).
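For reference, the Skellam negative log-likelihood above can be evaluated stably with SciPy's exponentially scaled Bessel function; this is a sketch, not our actual implementation:

```python
import numpy as np
from scipy.stats import skellam
from scipy.special import ive  # exponentially scaled I_v: ive(v, x) = iv(v, x) * exp(-x)

def skellam_nll(y, lam_pos, lam_neg):
    """Negative log-likelihood of signed weights y under Skellam(lam_pos, lam_neg),
    following the term structure of L(Y) above."""
    x = 2.0 * np.sqrt(lam_pos * lam_neg)
    log_bessel = np.log(ive(np.abs(y), x)) + x  # numerically stable log I_|y|(x)
    return np.sum((lam_pos + lam_neg)
                  - 0.5 * y * np.log(lam_pos / lam_neg)
                  - log_bessel)

# Cross-check against SciPy's reference Skellam log-pmf:
y = np.array([-2, 0, 3])
lp = np.array([1.5, 2.0, 0.7])
ln = np.array([0.5, 1.0, 2.0])
reference = -skellam.logpmf(y, lp, ln).sum()
```

The scaled Bessel function `ive` avoids the overflow that evaluating I_|y|(x) directly would cause for large arguments.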

By imposing (as in the undirected case) standard normally distributed priors elementwise on all model parameters \theta = \{\beta, \gamma, \delta, \epsilon, Z, W\}, i.e., \theta_i \sim N(0, 1), we define a maximum a posteriori (MAP) estimation over the model parameters via the loss function to be minimized (ignoring constant terms):

Loss = \sum_{i,j} \left[ \lambda^+_{ij} + \lambda^-_{ij} - \frac{y_{ij}}{2} \log\left(\frac{\lambda^+_{ij}}{\lambda^-_{ij}}\right) \right] - \sum_{i,j} \log I_{|y_{ij}|}\left(2\sqrt{\lambda^+_{ij}\lambda^-_{ij}}\right) + \frac{\rho}{2}\left( \|Z\|_F^2 + \|W\|_F^2 + \|\gamma\|_F^2 + \|\beta\|_F^2 + \|\delta\|_F^2 + \|\epsilon\|_F^2 \right),   (13)

where \|\cdot\|_F denotes the Frobenius norm. In addition, \rho is the regularization strength, with \rho = 1 yielding the adopted normal prior with zero mean and unit variance.
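A compact sketch of the MAP objective of Eq. (13) follows, using SciPy's Skellam log-pmf in place of an explicit Bessel evaluation; the variable names and toy sizes are illustrative assumptions:

```python
import numpy as np
from scipy.stats import skellam

def map_loss(Z, W, beta, gamma, delta, eps, Y, rho=1.0):
    """Directed MAP objective of Eq. (13): Skellam negative log-likelihood over
    all dyads plus (rho/2) times the squared Frobenius norms of the parameters
    (self-loops are not excluded here, for brevity)."""
    D = np.linalg.norm(Z[:, None, :] - W[None, :, :], axis=-1)  # ||z_i - w_j||_2
    lam_pos = np.exp(beta[:, None] + gamma[None, :] - D)        # Eq. (11)
    lam_neg = np.exp(delta[:, None] + eps[None, :] + D)         # Eq. (12)
    nll = -skellam.logpmf(Y, lam_pos, lam_neg).sum()
    reg = 0.5 * rho * sum(np.sum(p ** 2) for p in (Z, W, beta, gamma, delta, eps))
    return nll + reg

rng = np.random.default_rng(0)
N, K = 4, 2
Z, W = rng.normal(size=(N, K)), rng.normal(size=(N, K))
beta, gamma, delta, eps = rng.normal(size=(4, N))
Y = rng.integers(-2, 3, size=(N, N))
loss = map_loss(Z, W, beta, gamma, delta, eps, Y)
```

In practice the paper optimizes this objective with stochastic gradient methods (Adam); the dense dyad loop above is only meant to make the term structure explicit.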

A.2 The Signed Relational Latent Distance Model for Directed Networks

We formulate the relational AA in the context of the family of LDMs and for directed networks as:

\lambda^+_{ij} = \exp(\beta_i + \gamma_j - \|A(z_i - w_j)\|_2)   (14)
             = \exp(\beta_i + \gamma_j - \|R[Z; W]C(z_i - w_j)\|_2),   (15)
\lambda^-_{ij} = \exp(\delta_i + \epsilon_j + \|A(z_i - w_j)\|_2)   (16)
             = \exp(\delta_i + \epsilon_j + \|R[Z; W]C(z_i - w_j)\|_2).   (17)

Notably, in the AA formulation, X = R[Z; W] corresponds to observations formed by the concatenation of the convex combinations Z and W of positions given by the columns of R ∈ R^{K×K}. Furthermore, in order to ensure that what is used to define the archetypes A = XC = R[Z; W]C corresponds to observations using these archetypes in their reconstruction [Z; W], we define C ∈ R^{2N×K} as a gated version of [Z; W] normalized to the simplex such that c_d ∈ Δ^{2N}, by defining

c_{nd} = \frac{([Z; W]^\top \circ [\sigma(G)]^\top)_{nd}}{\sum_{n'}([Z; W]^\top \circ [\sigma(G)]^\top)_{n'd}}   (18)

Nakis, C¸ elikkanat, Boucherie, Burmester, Djurhuus, Holmelund, Frolcov´

a, Mørup

in which \circ denotes the elementwise (Hadamard) product and \sigma(G) defines the logistic sigmoid applied elementwise to the matrix G. As a result, the extracted archetypes are ensured to correspond to the nodes assigned to the archetype, whereas the locations of the archetypes can be flexibly placed in space as defined by R. By defining z_i = softmax(\tilde{z}_i) and w_i = softmax(\tilde{w}_i), we further ensure z_i, w_i ∈ Δ^K.
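The construction of C and A in Eq. (18) can be sketched as follows; shapes and names are assumptions for illustration:

```python
import numpy as np

def softmax(M, axis=0):
    """Column-wise softmax, so each column lies on the simplex."""
    e = np.exp(M - M.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def archetype_matrix(Z, W, R, G):
    """A = R [Z; W] C, with C the gated, simplex-normalized version of [Z; W]
    from Eq. (18). Shapes: Z, W are K x N, R is K x K, G is K x 2N."""
    X = np.concatenate([Z, W], axis=1)            # K x 2N
    num = X.T * (1.0 / (1.0 + np.exp(-G.T)))      # (2N x K) Hadamard with sigmoid gate
    C = num / num.sum(axis=0, keepdims=True)      # each column c_d on the simplex
    return R @ X @ C, C                           # A is K x K

rng = np.random.default_rng(1)
K, N = 3, 5
Z = softmax(rng.normal(size=(K, N)))  # columns z_i = softmax(z~_i)
W = softmax(rng.normal(size=(K, N)))
R = rng.normal(size=(K, K))
G = rng.normal(size=(K, 2 * N))
A, C = archetype_matrix(Z, W, R, G)
```

Because the softmaxed Z and W are nonnegative, normalizing the gated matrix column-wise indeed places every c_d on the simplex.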

As in the undirected case, the loss function of Eq. (13) is adopted for the relational AA formulation forming the SLIM, with the prior regularization applied to the corners of the extracted polytope A = R[Z; W]C instead of the latent embeddings Z, W, imposing a standard elementwise normal distribution as prior, a_{k,k'} \sim N(0, 1). Furthermore, we impose a uniform Dirichlet prior on the columns of Z, W, i.e., z_i, w_i \sim Dir(1_K); this only contributes constant terms to the joint distribution. As a result, the loss function is given by Eq. (13), replacing \|Z\|_F^2 and \|W\|_F^2 with \|A\|_F^2 for the maximum a posteriori (MAP) optimization.

A.3 Model Extensions for Additional Capacity

In the main paper, we briefly introduced an additional formulation for the rates of the Skellam distribution as adopted by our models. In this case (and for directed networks), the rates are:

\lambda^+_{ij} = \exp(\beta_i + \gamma_j - \|z_i - w_j\|_2) and \lambda^-_{ij} = \exp(\delta_i + \epsilon_j - \|u_i - w_j\|_2).   (19)

In this proposition, we have adopted three latent embeddings instead of the two previously described for the directed case. The disparity between the locations z_i and u_i can point out how polarity is formed between the two regions of the latent space. This model specification introduces an additional regularization term for the third embedding matrix U in the loss function of Equation (13). For the RAA case, we thereby define X = R[Z; U; W], i.e., the concatenation of all three latent positions, with C ∈ R^{3N×K}.

A.4 Directed Case: Results and Performance

In Table 4 and Table 5, we provide the results for the directed networks against various prominent baselines. Note that POLE is not defined for the directed case, while SIDE failed to create embeddings for one-degree nodes. For our frameworks, we use two additional variations of SLDM and SLIM. The first are SLDM REG=0.01 and SLIM REG=0.01, where we have used a regularization strength of ρ = 0.01 in Equation (13); this shows how performance is affected by less regularized parameters. In addition, we also provide results for SLDM-EXPR and SLIM-EXPR, which denote the more expressive model described in Subsection A.3. The results showcase our models' capability to outperform the baselines or provide competitive performance. Comparing the different SLDM and SLIM variations, we observe that performance is boosted mostly by loosening the regularization of the vanilla methods; the most important trait appears to be the regularization strength of the model rather than the expressive capability that extra parameters provide. Lastly, Figure 4 provides the same visualizations as in the main paper but for the directed networks.

Table 4: Area Under Curve (AUC-ROC) scores for varying representation sizes (directed). The symbol '-' denotes that the corresponding model is not able to run on directed networks, while 'x' denotes that the model returned errors.

                      |      WikiElec      |      WikiRfa       |       Reddit
Task                  |  p@n   p@z   n@z   |  p@n   p@z   n@z   |  p@n   p@z   n@z
POLE                  |   -     -     -    |   -     -     -    |   -     -     -
SLF                   | .938  .971  .980   | .991  .980  .985   | .823  .974  .984
SiGAT                 | .921  .750  .871   | .988  .772  .927   | .982  .713  .980
SIDE                  |   x     x     x    |   x     x     x    |   x     x     x
SIGNet                | .929  .907  .835   | .991  .921  .873   | .881  .757  .719
SLIM (ours)           | .910  .981  .963   | .984  .989  .981   | .713  .973  .982
SLDM (ours)           | .914  .977  .966   | .983  .987  .978   | .657  .937  .964
SLIM REG=0.01 (ours)  | .927  .989  .980   | .992  .994  .990   | .827  .982  .989
SLDM REG=0.01 (ours)  | .940  .989  .980   | .984  .987  .976   | .774  .982  .986
SLIM-EXPR (ours)      | .922  .984  .977   | .987  .988  .982   | .706  .930  .949
SLDM-EXPR (ours)      | .915  .987  .985   | .989  .994  .992   | .657  .965  .965


Table 5: Area Under Curve (AUC-PR) scores for varying representation sizes (directed). The symbol '-' denotes that the corresponding model is not able to run on directed networks, while 'x' denotes that the model returned errors.

                      |      WikiElec      |      WikiRfa       |       Reddit
Task                  |  p@n   p@z   n@z   |  p@n   p@z   n@z   |  p@n   p@z   n@z
POLE                  |   -     -     -    |   -     -     -    |   -     -     -
SLF                   | .981  .949  .890   | .995  .954  .951   | .978  .972  .919
SiGAT                 | .977  .689  .562   | .993  .685  .714   | .998  .727  .659
SIDE                  |   x     x     x    |   x     x     x    |   x     x     x
SIGNet                | .979  .831  .577   | .995  .840  .671   | .988  .675  .233
SLIM (ours)           | .971  .974  .852   | .989  .981  .951   | .962  .971  .874
SLDM (ours)           | .972  .967  .862   | .988  .978  .939   | .952  .948  .861
SLIM REG=0.01 (ours)  | .976  .983  .910   | .995  .988  .973   | .980  .982  .918
SLDM REG=0.01 (ours)  | .981  .983  .912   | .991  .976  .930   | .972  .981  .911
SLIM-EXPR (ours)      | .976  .978  .914   | .992  .980  .953   | .958  .938  .823
SLDM-EXPR (ours)      | .973  .981  .936   | .993  .987  .979   | .949  .966  .871

B Initialization

For the SLDM model, we used the eigendecomposition of the normalized Laplacian for signed networks (Atay and Tunçel Gölpek, 2014). Solving the eigenproblem for a few eigenvalues can be done efficiently through the Lanczos method (Golub and Van Loan, 1996), due to the high sparsity of real large-scale networks.

For SLIM, we would like to initialize the matrix A based on the convex hull of the spectral decomposition of the normalized Laplacian. This is very costly, since finding the convex hull has a complexity that grows exponentially with the dimensionality of the space. For that purpose, we use the furthest-sum algorithm (Mørup and Kai Hansen, 2010) to discover guaranteed distinct aspects of the spectral space. Lastly, since we are unable to directly initialize A, we use the points discovered by furthest sum to initialize R, while also tuning G to pick up the correct points in the latent space.
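A sketch of this initialization pipeline is given below; the Laplacian uses absolute-value degrees in the spirit of Atay and Tunçel Gölpek (2014), and the furthest-sum routine shown is a simplified greedy variant, not necessarily the exact algorithm of Mørup and Kai Hansen (2010):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def normalized_signed_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}, with degrees d_i = sum_j |A_ij|."""
    d = np.abs(A).sum(axis=1)
    s = 1.0 / np.sqrt(d)
    return np.eye(A.shape[0]) - s[:, None] * A * s[None, :]

def furthest_sum(X, K, start=0):
    """Greedily select K mutually distant rows of X to seed distinct archetypes."""
    chosen = [start]
    dist_sum = np.linalg.norm(X - X[start], axis=1)
    for _ in range(K - 1):
        dist_sum[chosen] = -np.inf  # never re-pick an already selected point
        nxt = int(np.argmax(dist_sum))
        chosen.append(nxt)
        dist_sum = dist_sum + np.linalg.norm(X - X[nxt], axis=1)
    return chosen

# Toy signed graph: a 6-cycle with one negative edge.
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0
A[0, 1] = A[1, 0] = -1.0
L = normalized_signed_laplacian(A)
vals, vecs = eigsh(L, k=2, which="SM")  # Lanczos for the smallest eigenpairs
```

For large sparse networks, `A` would be a `scipy.sparse` matrix and the Laplacian kept sparse, which is what makes the Lanczos iterations cheap.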

C Bessel Function Approximation

We need to compute the modified Bessel function of the first kind and of order y for the implementation of our proposed approach, which is defined by

I_y(x) = \left(\frac{x}{2}\right)^y \sum_{k=0}^{\infty} \frac{(x^2/4)^k}{k!\,\Gamma(y + k + 1)}.

We approximate the actual value by only considering the first 50 terms of the infinite sum. Since we have small orders of y and small values of x, the series components converge to zero quickly. We observed that taking the first 50 components does not affect the performance/accuracy of the model.
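The truncated series can be sketched and checked against SciPy's reference implementation as follows (the function name is illustrative):

```python
import numpy as np
from math import lgamma
from scipy.special import iv  # SciPy's reference modified Bessel I_v(x)

def bessel_i_truncated(y, x, terms=50):
    """I_y(x) via the series above, truncated after `terms` components.
    Each term is computed in log-space for numerical stability."""
    total = 0.0
    for k in range(terms):
        log_term = (y + 2 * k) * np.log(x / 2.0) - lgamma(k + 1) - lgamma(y + k + 1)
        total += np.exp(log_term)
    return total
```

For the small orders y and small arguments x arising in our setting, 50 terms already agree with the exact value to machine precision.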

D Generating Based on Real Networks

Here, we test how the model generates networks based on real ones. We use wikiElec to train a K = 8 dimensional SLIM model, and we then generate a network based on the inferred parameters. Results are shown in Fig. 5, where we observe that the generated network successfully captures the main structure of the original but generates more non-zero elements and more negative links, thereby decreasing the sparsity and increasing the percentage of negative links compared to the ground truth. Modifying SLIM to exclude the regularization over the model parameters achieves the correct network sparsity, as shown in Fig. 6, with only a 2% increase in the inferred percentage of negative links compared to the ground truth. Adding priors to the model creates a bias in the network generation. Lastly, the unregularized SLIM boosted performance in the link prediction tasks by 1% to 5% for the wikiElec network. Nevertheless, priors over the model parameters stabilize the inference when "extreme" negative nodes (nodes with only negative links, which can also be considered outliers) exist in the network.


Figure 4: Inferred polytope visualizations for various directed networks (rows: WikiElec, WikiRfa, Reddit). The first column showcases the K = 8 dimensional sociotope projected on the first two principal components (PCA) of the combined (source, target) embeddings [Z; W]; the second and third columns provide circular plots of the sociotope enriched with the negative (red) and positive (blue) links, respectively.

E Effect of Sampling Size

In the main paper, the sample size was set to the maximum number of nodes (∼3000) that our 8GB GPU could fit in memory. Here, we provide a study of how different sample sizes affect the performance of the SLIM model. In Fig. 7, we provide the performance across different tasks for the wikiElec dataset, considering sampling sizes of {10%, 20%, 30%, 40%, 50%}. We observe almost constant performance across the different sampling sizes, with only small differences. As we decrease the sampling size to 10% and 20%, we observe somewhat larger decreases in p@n task performance. This is because we keep the total number of training iterations fixed (5000) for all cases. Overall, smaller sampling sizes require additional iterations to converge to the performance of the model with larger sampling sizes.

F Effect of Learning Rate

The learning rate for SLDM and SLIM was set to lr = 0.05. In Fig. 8, we provide the performance across different tasks for the wikiElec dataset, considering three different learning rates, lr ∈ {0.01, 0.05, 0.1}. We observe that the performance can be considered constant across the different learning rates, showing small sensitivity to the choice of this hyperparameter.


(a) Ground Truth: (.003, 78%, 22%)  (b) Generated: (.006, 63%, 37%)

Figure 5: wikiElec ground-truth (left) and generated (right) adjacency matrices, the latter based on the inferred parameters. The parentheses show the network statistics as (density, % of positive (blue) links, % of negative (red) links). All network adjacency matrices are ordered based on z_i, in terms of maximum archetype membership and internally according to the magnitude of the corresponding archetype most used for their reconstruction.

(a) Ground Truth: (.003, 78%, 22%)  (b) Generated: (.003, 76%, 24%)

Figure 6: wikiElec ground-truth (left) and generated (right) adjacency matrices, the latter based on the parameters inferred by a SLIM without regularization priors over the parameters. The parentheses show the network statistics as (density, % of positive (blue) links, % of negative (red) links). All network adjacency matrices are ordered based on z_i, in terms of maximum archetype membership and internally according to the magnitude of the corresponding archetype most used for their reconstruction.


Figure 7: wikiElec: Performance of SLIM across sample sizes for different tasks, (a) Area-Under-Curve Receiver Operating Characteristic scores, (b) Area-Under-Curve Precision-Recall scores. Both AUC-ROC and AUC-PR scores are almost constant across the different sample sizes.

Figure 8: wikiElec: Performance of SLIM across learning rates for different tasks, (a) Area-Under-Curve Receiver Operating Characteristic scores, (b) Area-Under-Curve Precision-Recall scores. Both AUC-ROC and AUC-PR scores are constant across the different learning rates.