Evaluating the psychological plausibility of word2vec and GloVe distributional
semantic models
Ivana Kajić, Chris Eliasmith
Centre for Theoretical Neuroscience, University of Waterloo
Waterloo, ON, Canada N2L 3G1
{i2kajic, celiasmith}@uwaterloo.ca
Abstract
The representation of semantic knowledge poses a central
modelling decision in many models of cognitive phenomena.
However, not all such representations reflect properties ob-
served in human semantic networks. Here, we evaluate the
psychological plausibility of two distributional semantic mod-
els widely used in natural language processing: word2vec and
GloVe. We use these models to construct directed and undi-
rected semantic networks and compare them to networks of hu-
man association norms using a set of graph-theoretic analyses.
Our results show that all such networks display small-world
characteristics, while only undirected networks show similar
degree distributions to those in the human semantic network.
Directed networks also exhibit a hierarchical organization that
is reminiscent of the human semantic network.
Keywords: semantic spaces, distributional semantic models,
free association norms, network analysis
Introduction
The representation of semantic knowledge is instrumental to
many models of linguistic processing in cognitive modelling
and machine learning. In particular, the decision of how to
represent such knowledge entails the selection of a vocabu-
lary and a computational representation of vocabulary items.
Many computational models studying human semantic
memory and related processes have relied on the Uni-
versity of South Florida Free Association Norms (Nelson,
McEvoy, & Schreiber, 2004, USF Norms). Because they provide a
psychologically plausible representation of a semantic net-
work, the USF Norms have been successfully used to repro-
duce human-level performance on tasks such as verbal se-
mantic search (Abbott, Austerweil, & Griffiths, 2015; Kajić
et al., 2017), and recognition memory and recall (Steyvers,
Shiffrin, & Nelson, 2004).
Another common choice for representing semantic
knowledge is to use models derived from co-occurrence and
word frequency data. Such models learn vector representations
of words from large linguistic corpora and are often referred to
as spatial or distributional semantic models (DSMs).
In the domain of natural language processing,
word2vec (Mikolov, Sutskever, Chen, Corrado, & Dean,
2013) and GloVe (Pennington, Socher, & Manning, 2014)
have been two widely used DSMs. They have been shown to
achieve high accuracy on a variety of lexical semantic tasks
such as word analogy and named entity recognition. Their
capacity to perform well on such tasks also makes them
attractive candidates for semantic representations in cognitive
models. Yet, it remains unclear which, if any, aspects of such
representations are psychologically plausible.
This study evaluates the psychological plausibility of GloVe
and word2vec models by analyzing semantic networks con-
structed from those models and comparing them to semantic
networks constructed from the USF Norms.
In particular, we evaluate networks in terms of their small-
world characteristics, degree distributions and hierarchical or-
ganization. We characterize networks that capture properties
of human association networks, and identify differences that
might have important implications for the modelling of human
semantic memory.
Semantic Spaces
Vector-based word representations are generated to capture
statistical regularities observed in natural language. Often, a
high-dimensional co-occurrence matrix is created by count-
ing word occurrences in a set of texts or contexts. A
dimensionality reduction method is then applied to factorize
this matrix into components from which low-dimensional
vectors representing individual words can be reconstructed.
DSMs using this approach are known as count-
based models, with Latent Semantic Analysis (Deerwester,
Dumais, Furnas, Landauer, & Harshman, 1990, LSA) be-
ing one such prominent example. GloVe vectors used in this
work are derived from a count-based model (Pennington et
al., 2014) that is based on methods similar to LSA.
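As a minimal illustration of the count-based approach described above (and not the actual GloVe or LSA training procedure), the following Python sketch builds a word co-occurrence matrix from a toy corpus and factorizes it with SVD to obtain low-dimensional word vectors; the corpus, window size, and dimensionality are placeholders.

```python
# Illustrative sketch of a count-based DSM (LSA-style); not the GloVe algorithm itself.
import numpy as np

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]  # toy corpus (placeholder)
window = 2                                            # symmetric context window

vocab = sorted({w for sent in corpus for w in sent})
index = {w: i for i, w in enumerate(vocab)}

# Count co-occurrences within the window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[index[w], index[sent[j]]] += 1

# Factorize the co-occurrence matrix and keep a low-dimensional representation.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
dim = 3                                # placeholder dimensionality
vectors = U[:, :dim] * S[:dim]         # rows are low-dimensional word vectors
print(vectors.shape)
```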
Although networks created from LSA vectors have been
criticized as unable to reproduce connectivity patterns be-
tween words as observed in the USF Norms (Steyvers &
Tenenbaum, 2005), this has been challenged by more re-
cent work demonstrating that some DSMs indeed produce
degree distributions that resemble that of human association
norms (Utsumi, 2015).
In contrast to count-based models such as LSA, more
recently developed predictive models use iterative training
procedures in complex neural networks to learn word vec-
tors based on the contexts in which those words occur.
Word2vec (Mikolov et al., 2013) is a popular predictive
model that computes vectors from a large corpus of text by
maximizing the probability of a target, which can either be a
single word or a set of context words.
In this work, we use pre-trained GloVe and word2vec vec-
tors. GloVe vectors were trained using the Common Crawl
dataset containing approximately 840 billion word tokens.
Word2vec vectors were trained on the Google News dataset
containing about 100 billion words. In both cases, the result-
ing vectors used here contain 300 dimensions. To allow for
comparable analyses, we restrict the vocabularies of
these two datasets to that of the USF Norms, which contains
5,018 words.
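A minimal sketch of this setup, assuming the gensim library and placeholder file names for the pre-trained Google News vectors and the USF word list; the paper's own preprocessing lives in the repository linked in the next section and may differ in detail.

```python
# Sketch: load pre-trained word2vec vectors and restrict them to the USF vocabulary.
# File paths and the USF word list are placeholders, not the paper's exact pipeline.
from gensim.models import KeyedVectors

w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)   # 300-dimensional vectors

with open("usf_words.txt") as f:            # hypothetical list of the 5,018 USF words
    usf_vocab = [line.strip() for line in f]

# Keep only USF words that have a pre-trained vector.
vocab = [w for w in usf_vocab if w in w2v]
vectors = {w: w2v[w] for w in vocab}
print(len(vocab), "words retained; dimensionality:", w2v.vector_size)
```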
Constructing Semantic Networks
We generate undirected and directed semantic networks
from word2vec and GloVe semantic models, and com-
pare them to corresponding networks constructed from
the USF Norms.¹ The cosine angle is used as a mea-
sure of similarity between two word vectors. We have
also tested the inner angle as a similarity measure, but found
no major differences between the two and therefore re-
port results of analyses based on cosine similarity. The
data processing and analysis code is available online at
https://github.com/ctn-archive/kajic-cogsci2018.
Undirected Networks
We create three undirected networks: one from the human
free association norms (USF Norms), one from the word2vec
vectors (word2vec) and one from the GloVe vectors (glove).
In every network, a node represents a word and an edge be-
tween two nodes corresponds to an associative relationship
between two words. To construct the undirected version of
the USF network, we place an edge between two nodes rep-
resenting words w1 and w2 if an association pair (w1, w2) or
(w2, w1) occurs in the USF word association database.
To construct undirected networks from DSMs, we com-
pute the similarity between all word pairs in the correspond-
ing vocabulary. An edge is placed between two nodes in
a network if the similarity between words represented by
those nodes exceeds a certain threshold τ. To select a thresh-
old for each network, we first test a sequence of uniformly
distributed thresholds (with two decimal places) in different
ranges. Then, we select the threshold that produces a network
with the average node degree ⟨k⟩ that is closest to that
of the USF network. Overall, increasing the threshold had
the effect of shifting the degree distribution from the "right"
(the regime where many nodes have many connections) to the
"left" (very sparse connectivity, with most nodes having only a
few connections). For the word2vec network the threshold is
τ = 0.38 and for the glove network it is τ = 0.53.
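A minimal sketch of this construction, assuming the `vectors` dictionary from the loading sketch above and using networkx; the exact procedure used in the paper is in the repository mentioned below.

```python
# Sketch: build an undirected semantic network by thresholding pairwise cosine similarity.
import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

words = sorted(vectors)                     # `vectors`: word -> embedding (see above)
X = np.vstack([vectors[w] for w in words])
sim = cosine_similarity(X)                  # pairwise cosine similarities

tau = 0.38                                  # threshold reported for the word2vec network
G = nx.Graph()
G.add_nodes_from(words)
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        if sim[i, j] > tau:                 # edge if similarity exceeds the threshold
            G.add_edge(words[i], words[j])

print("average degree:", 2 * G.number_of_edges() / G.number_of_nodes())
```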
The degree distribution of the word2vec network was
strongly correlated (r = 0.74, p < 0.001) with the degree dis-
tribution of the USF network, while a moderate correlation
(r = 0.49, p < 0.001) was observed with the glove network.
Directed Networks
A directed network is a more accurate representation of hu-
man word associations, as it captures the directionality of as-
sociative relationship between a cue word and a target word.
Only 26.4% of all association pairs in the USF Norms are
reciprocal.² To construct the directed version of the USF net-
work, a directed edge is placed from a node w1 to a node w2
only if w2 was an associate of the cue word w1.
¹We will refer to all networks constructed from word2vec and
GloVe models as synthetic semantic networks, to differentiate them
from the experimentally derived USF network.
We adopt two different methods to construct directed net-
works from DSMs: the k-nn method (Steyvers & Tenenbaum,
2005) and the cs-method (Utsumi, 2015). In both cases, the
local neighborhood of a node i is determined by placing out-
going edges to k other nodes that represent words most sim-
ilar to the word represented by the node i. The k-nn method
determines the number k for each node as the number of as-
sociates of that word in the USF Norms, resulting in a net-
work that has the same out-degree distribution as the directed
USF network. The cs-method finds the smallest number k for
which a certain empirical threshold R is exceeded, producing
the same average degree connectivity ⟨k⟩ as in the
directed USF network. We refer to the networks constructed
by the k-nn method as word2vec-knn and glove-knn, and those
constructed with the cs-method as word2vec-cs and glove-cs.
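The k-nn construction can be sketched as follows, reusing `words` and `sim` from the undirected sketch; `usf_out_degree` stands in for a (hypothetical here) mapping from each cue word to its number of associates in the USF Norms.

```python
# Sketch of the k-nn method: each word receives outgoing edges to its k most similar
# words, where k is that word's number of associates in the USF Norms.
import numpy as np
import networkx as nx

usf_out_degree = {w: 13 for w in words}   # placeholder; real values come from the USF Norms

D = nx.DiGraph()
D.add_nodes_from(words)
for i, w in enumerate(words):
    k = usf_out_degree.get(w, 0)
    if k == 0:
        continue
    order = np.argsort(-sim[i])           # indices sorted by decreasing similarity
    neighbours = [j for j in order if j != i][:k]
    for j in neighbours:
        D.add_edge(w, words[j])

print("average out-degree:", D.number_of_edges() / D.number_of_nodes())
```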
We observe strong and significant correlations between
degree distributions of the directed networks word2vec-knn
(r = 0.69), glove-knn (r = 0.83), word2vec-cs (r = 0.75),
glove-cs (r = 0.88; p < 0.001 in all cases), and the directed
USF network.
Network Statistics
Previous research (Steyvers et al., 2004; Utsumi, 2015;
Morais, Olsson, & Schooler, 2013) has identified that hu-
man association networks can be characterized in terms of
their small-world properties: although the networks are very
sparse (i.e., a node in the network is on average connected to
only a small subset of all nodes in the network), they have a
small average shortest path length $L$, $L = \frac{1}{n(n-1)} \sum_{i,j \in G} d(i,j)$,
where $n$ is the number of nodes in the network, $i$ and $j$ are two
different nodes, and $d(i,j)$ is the shortest distance between the
two nodes measured as the number of edges between them.
In addition, the average clustering coefficients of such net-
works are higher than the clustering coefficients of random net-
works of the same size that have the same probability of
a connection between any two nodes. The average cluster-
ing coefficient $C$ for a network with $n$ nodes is computed as
$C = \frac{1}{n} \sum_{i \in G} c_i$, where $c_i$ is the local clustering coefficient of a
node $i$, given by $c_i = \frac{2 t_i}{k_i (k_i - 1)}$, where $t_i$ is the number of triangles in
the neighborhood of the node $i$. A triangle is a connectivity
pattern where a node $i$ is connected to two other nodes $j$ and
$k$, and at the same time the nodes $j$ and $k$ are also connected.
The denominator in the equation for $c_i$ is the number of possi-
ble connections in a neighborhood of a node with the degree
$k_i$: $k_i(k_i-1)/2$. In the context of semantic networks, such
small-world structure is important for the efficient search and
retrieval of items from memory.
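For concreteness, a minimal implementation of the local clustering coefficient defined above, checked against networkx's built-in routine on a stand-in graph:

```python
# Local clustering coefficient c_i = 2 t_i / (k_i (k_i - 1)), with t_i the number of
# triangles through node i; networkx.clustering computes the same quantity.
import networkx as nx

def local_clustering(G, node):
    neighbours = list(G.neighbors(node))
    k = len(neighbours)
    if k < 2:
        return 0.0
    # count edges among the neighbours; each such edge closes a triangle through `node`
    triangles = sum(1 for a in range(k) for b in range(a + 1, k)
                    if G.has_edge(neighbours[a], neighbours[b]))
    return 2.0 * triangles / (k * (k - 1))

H = nx.karate_club_graph()                       # stand-in graph for illustration
assert abs(local_clustering(H, 0) - nx.clustering(H, 0)) < 1e-12
```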
To test whether networks constructed from word2vec and
GloVe models exhibit small-world characteristics, we run dif-
ferent graph-theoretic analyses.
²A reciprocal association is one where the word w1 is an asso-
ciate of the word w2, and vice versa.
Table 1: Graph-theoretic statistics of networks derived from the USF Norms, word2vec and GloVe vectors. The results of undi-
rected networks are presented in the first three rows. Abbreviations: L = the average shortest path length, ⟨k⟩ = the average node
degree, C = the average clustering coefficient, Ck = connectivity, the number of nodes in the largest connected component
(expressed in %), D = the network diameter, m = the number of edges, n = the number of nodes, Lrnd = the average shortest
path length of a randomly connected network of similar size, Crnd = the average clustering coefficient of a random network,
s = the sparsity of the network (expressed in %).

                   L     ⟨k⟩   C      Ck      D   m       n      Lrnd   Crnd   s
  USF undirected   3.04  22.0  0.186  100.00   5  55,236  5,018  3.03   0.004  0.44
  word2vec         4.24  21.3  0.325   99.84  12  52,317  4,902  3.04   0.004  0.44
  glove            4.61  22.1  0.373   98.88  12  51,244  4,632  2.99   0.005  0.48
  USF directed     4.26  12.7  0.187   96.51  10  63,619  5,018  3.62   0.005  0.25
  word2vec-cs      4.81  12.5  0.237   99.28  11  62,328  4,977  3.64   0.005  0.25
  word2vec-knn     4.77  12.7  0.232   99.32  12  63,165  4,977  3.64   0.005  0.26
  glove-cs         5.06  12.3  0.266   97.21  12  61,470  4,988  3.65   0.005  0.25
  glove-knn        5.03  12.7  0.259   97.91  13  63,262  4,988  3.62   0.005  0.25
The sparsity s of a network is computed by dividing the aver-
age node degree ⟨k⟩ by the total number of nodes in the net-
work. Other measures, such as the clustering coefficient C, the
average shortest path length L and the diameter D, have been
computed on the largest connected component of each network.
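A sketch of how such statistics can be obtained with networkx for an undirected network G (e.g., from the construction sketch above), with an Erdős–Rényi graph of matched size as the random baseline; the paper's exact procedures may differ in detail.

```python
# Sketch: Table 1-style statistics on the largest connected component, plus a random baseline.
import networkx as nx

n_all = G.number_of_nodes()
avg_k = 2 * G.number_of_edges() / n_all                  # average node degree <k>
giant = G.subgraph(max(nx.connected_components(G), key=len)).copy()

stats = {
    "L": nx.average_shortest_path_length(giant),
    "<k>": avg_k,
    "C": nx.average_clustering(giant),
    "Ck_%": 100 * giant.number_of_nodes() / n_all,       # size of the largest component
    "D": nx.diameter(giant),
    "s_%": 100 * avg_k / n_all,                          # sparsity, as defined above
}

# Random baseline: same number of nodes, same edge probability.
R = nx.gnp_random_graph(n_all, avg_k / (n_all - 1), seed=1)
R_giant = R.subgraph(max(nx.connected_components(R), key=len))
stats["L_rnd"] = nx.average_shortest_path_length(R_giant)
stats["C_rnd"] = nx.average_clustering(R)
print(stats)
```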
The results of analyses are summarized in Table 1. Our
results for the two USF networks are consistent with the pre-
vious reports (Steyvers et al., 2004; Utsumi, 2015). Due to
the methods used to construct the networks, all synthetic net-
works have sparsity that is comparable to the sparsity of the
human association network. Also, their average shortest path
lengths are consistently higher, but still comparable to those
of the human association networks. However, in undirected
networks, the diameter of synthetic networks is more than
twice as long as that of the USF network, meaning that the
distance between the two farthest words is longer in the syn-
thetic networks than it is in the association network. Further-
more, all synthetic networks also exhibit a degree of cluster-
ing that is higher than that of the USF network. This effect is
more pronounced in the undirected versions of the networks.
Degree Distributions
To obtain the distribution of degrees in a network, we count
the number of nodes with degree k, where k ranges from
one to kmax. The kmax value denotes the highest node degree
and it is different for different networks. The distribution of
in-degrees of the directed association network is known to
follow a truncated power-law distribution $P(k) \propto k^{-\alpha} e^{-\lambda k}$, or,
in some cases, a pure power law $P(k) \propto k^{-\alpha}$ (Utsumi, 2015;
Morais et al., 2013). The power-law predicts that most nodes
in the network have a few connections, while a small number
of nodes, regarded as hubs, have a rich local neighborhood.
In our analyses of degree distributions, we first test the
plausibility of a power-law behavior using the goodness-of-
fit test. Then we test whether other heavy-tailed distributions
provide a better fit using the loglikelihood-ratio (LR) test. To
fit and evaluate different models, we use the Python powerlaw
package (Alstott, Bullmore, & Plenz, 2014).
First, we fit the empirical degree distribution to a power-
law model using the maximum likelihood estimation for the
parameter α. The fit is performed for values of k > kmin,
where kmin was determined such that the Kolmogorov-
Smirnov (KS) distance between the empirical distribution and
the model distribution for values greater than kmin is mini-
mized. Given a model, the goodness-of-fit test uses the KS
distance between the model and the empirical distribution,
as well as the model and thousands of distributions sampled
from the model, to evaluate its plausibility. It produces a
p-value equal to the fraction of sampled distributions whose
KS distance is greater than that of the empirical distribution.
Large p-values denote that sampled distributions are typically
more distant than the empirical distribution, in which case the
model is regarded as a plausible fit to the empirical data.
The LR-test is a comparative test that evaluates which of
the two distributions is more likely to generate samples from
the empirical data based on maximum likelihood functions
of each distribution. The resulting R value is positive if the
first distribution is more likely, and negative otherwise. The
alternative heavy-tailed distributions we tested are: truncated
power-law, (discretized) lognormal and exponential.
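A sketch of this analysis with the powerlaw package, assuming `degrees` holds the degree sequence of one of the networks constructed above; the bootstrap goodness-of-fit procedure is not shown here.

```python
# Sketch: fit a power law to a degree sequence and run loglikelihood-ratio comparisons.
import powerlaw

degrees = [d for _, d in G.degree()]            # degree sequence of a network from above
fit = powerlaw.Fit(degrees, discrete=True)      # kmin chosen by minimizing the KS distance
print("alpha:", fit.power_law.alpha, "kmin:", fit.power_law.xmin)

# R > 0 favours the power law, R < 0 the alternative; p indicates whether the sign is reliable.
for alternative in ["truncated_power_law", "lognormal", "exponential"]:
    R, p = fit.distribution_compare("power_law", alternative)
    print(f"power law vs {alternative}: R = {R:.2f}, p = {p:.2f}")
```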
Results of our analyses are summarized in Table 2. Con-
sistent with previous research, we find that the truncated
power law, rather than a pure power law, is a better descrip-
tion of the distribution of degrees for the directed USF net-
work (Morais et al., 2013; Utsumi, 2015). In addition, our
results indicate that the lognormal distribution is a plausible
model for the directed USF network, as it is not possible to
distinguish between the lognormal and truncated power-law
distributions (R = 0.43, p = 0.67). The pure power law is ex-
cluded as a plausible model for all our networks, as the p-values
for the goodness-of-fit test are all close to 0.
We also find that the truncated power-law is a plausible
model for the undirected USF network and both undirected
versions of the synthetic networks.
Table 2: Goodness-of-fit test for the power-law distribution and loglikelihood-ratio tests evaluating the plausibility of the power
law versus other heavy-tailed distributions. The results of undirected networks are presented in the first three rows. Abbreviations:
KS = Kolmogorov-Smirnov statistic, LR = loglikelihood ratio.

                   Power Law     PL vs. Truncated PL   PL vs. Lognormal   PL vs. Exponential
                   KS      p     LR       p            LR       p         LR       p
  USF undirected   0.014   0.01  -1.80    0.02         -0.91    0.37       7.46    0.00
  glove            0.035   0.00  -7.29    0.00         -5.03    0.00       7.82    0.00
  word2vec         0.064   0.00  -7.72    0.00         -5.31    0.00      -5.89    0.00
  USF directed     0.055   0.00  -2.54    0.00         -2.36    0.02       0.27    0.79
  glove-cs         0.016   0.00  -1.14    0.17         -0.67    0.50       3.52    0.00
  glove-knn        0.020   0.00  -1.05    0.16         -0.82    0.41       1.08    0.28
  word2vec-cs      0.032   0.00  -0.57    0.40         -0.51    0.61       0.29    0.77
  word2vec-knn     0.028   0.00  -0.13    0.82         -0.12    0.90       0.69    0.49
It is important to notice that the exponential distribution, as
well as the lognormal distribution, cannot be ruled out for the
undirected word2vec network. However, due to the high p-values
of the LR tests, it is not possible to reach similar conclusions
for the degree distributions of the directed synthetic networks.
To better understand these numerical results, we plot the
empirical data and model fits on a semi-log scale in Fig-
ure 1. Degree distributions are expressed as complementary
cumulative distribution functions, and fits for the power-law,
truncated power-law, lognormal, and exponential models are
shown. The scarcity of nodes with high degrees (> 80) in cer-
tain variants of the synthetic graphs, such as glove-knn, word2vec-
cs and word2vec-knn, is likely to contribute to the large p-values
in LR tests in Table 2. While the distribution of degrees of
USF networks is bounded from above by the power-law dis-
tribution and from below by the exponential distribution, this
is only somewhat the case for the glove networks and less
so for the word2vec networks, indicating differences between
degree distributions of human and synthetic word networks.
Hierarchical Topology
Human association networks have been shown to exhibit a hi-
erarchical organization resulting from the high modularity of
the network (Utsumi, 2015). Such modules, or clusters, are
highly interconnected groups of nodes that form only a few
connections to nodes that are not part of the group. The pres-
ence of such clusters indicates that there are features shared
among nodes in the network, such as semantic or lexical re-
latedness.
While the average clustering coefficients are reported in
Table 1, to investigate the presence of hierarchical struc-
ture, we consider the relationship between a node degree and
local clustering coefficients $c_i$ (Ravasz & Barabási, 2003).
In networks that exhibit hierarchical organization, the lo-
cal clustering coefficient is dependent on the node degree
and has been observed to follow a scaling law of the form
$C(k) \propto k^{-\gamma}$ (Ravasz & Barabási, 2003). While many hierar-
chical networks have been observed to have γ = 1, hierarchi-
cal structure has also been observed in networks with γ < 1.
To investigate whether the tendency for clustering is depen-
dent on the node degree, we implement methods proposed
by Utsumi (2015). We first compute local clustering coef-
ficients for all nodes in the largest connected component in
each network. Then, we compute the average clustering coef-
ficient for each neighborhood size k and connect those values
to form a line. Finally, we use linear regression in the log-
arithmic space to determine the slope of the regression line
and the correlation coefficient.
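A sketch of this regression for an undirected network G from the earlier sketches (the paper also applies the analysis to the directed networks); slope and correlation are estimated in log-log space with scipy.

```python
# Sketch: average the local clustering coefficient per degree k and regress
# log C(k) on log k; the negative slope estimates gamma in C(k) ~ k^(-gamma).
from collections import defaultdict
import numpy as np
import networkx as nx
from scipy.stats import linregress

giant = G.subgraph(max(nx.connected_components(G), key=len))
clustering = nx.clustering(giant)

by_degree = defaultdict(list)
for node, c in clustering.items():
    by_degree[giant.degree(node)].append(c)

ks = sorted(k for k, cs in by_degree.items() if k > 1 and np.mean(cs) > 0)
mean_c = [np.mean(by_degree[k]) for k in ks]

slope, intercept, r, p, stderr = linregress(np.log10(ks), np.log10(mean_c))
print(f"gamma = {-slope:.2f}, r = {r:.2f}")
```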
The results are shown in Figure 2. First, we confirm that
the directed USF network exhibits hierarchical organization
with γ = 0.75 (r = 0.97). We also found strong negative
correlation between the size of a neighborhood and the av-
erage clustering coefficient for the undirected USF network
(γ = 0.76, r = 0.97). For undirected versions of synthetic
networks, we find a small positive slope γ = 0.05 and a pos-
itive correlation (r = 0.27) for the glove network, and simi-
larly γ = 0.10 (r = 0.49) for the word2vec network. There-
fore, there is no dependency between the local clustering co-
efficients and the node degree in the undirected versions of
synthetic semantic networks.
In contrast, some of the directed semantic networks ex-
hibit higher levels of hierarchical organization. Directed net-
works constructed with the k-nn method have negative slopes
with strong correlations: γ = 0.49 (r = 0.96) for glove-knn
and γ = 0.39 (r = 0.91) for word2vec-knn. The hierarchi-
cal relationship is less apparent in networks constructed with
the cs-method (glove-cs: γ = 0.32, r = 0.88; word2vec-cs:
γ = 0.17, r = 0.54).
Discussion
The goal of the present study is to evaluate the psychological
plausibility of semantic networks constructed from the widely
used word2vec and GloVe distributional semantic models.
To this end, a number of graph-theoretic analyses were per-
formed that compared undirected and directed versions of
networks with semantic networks constructed from human
association norms.
Figure 1: Complementary cumulative distributions and model fits for the USF Norms, glove and word2vec networks. Distribu-
tions and fits for undirected networks are shown in the first three plots in the first row. The x-axis for each network is bounded
by kmin and kmax that are unique for each network (see text for details).
We found that all networks exhibit the small-world prop-
erty, characterized by short path lengths and high clustering
coefficients. In other words, it is possible to efficiently search
in such networks as any two words in a network are only a
few words apart. These results are consistent with previous
studies that demonstrated small-world structure in different
DSMs (Steyvers & Tenenbaum, 2005; Utsumi, 2015).
Degree distribution analyses based on a goodness-of-fit test
revealed that the power-law is not a plausible model in any
of the networks. This finding may not be surprising, con-
sidering that some semantic networks have distributions that
can be described well with alternative heavy-tailed distribu-
tions (Morais et al., 2013; Utsumi, 2015). We contribute to
the existing research by adding that the truncated power-law
is a plausible explanation of the degree distribution for the
undirected USF network. The synthetic undirected networks
were also explained best by the truncated power-law. How-
ever, the lognormal and the exponential distributions are also
plausible fits for the undirected word2vec network.
Truncated power-law behavior could not be inferred for the
directed word2vec and glove networks. Analyzing the tails of
distributions in Fig. 1 provides more insight as to why it is
difficult to obtain a clear fit in those cases. Directed networks
have only very few nodes with a high number of connections.
For example, there are only four nodes with k > 80 for glove-
knn and fewer than ten nodes with k > 55 for both word2vec
networks. What distinguishes different heavy-tailed distribu-
tions are nodes "contained" in the tail of a distribution, and
in this case it is possible that the LR test did not have enough
data to reliably discriminate between different distributions.
We found that directed networks, more specifically those
constructed with the k-nn method, exhibit a moderate level
of hierarchical organization that is reminiscent of cluster-
ing observed in the human association network. The di-
rected glove-cs and glove-knn networks contain nodes with
high connectivity that exhibit fluctuations in clustering coeffi-
cients, as observed with the network created from association
norms. Such nodes are hubs, some of which have higher clus-
tering coefficients since they are embedded in clusters. Hubs
with lower clustering coefficients act as intermediaries in the
network by connecting different modules of the network that
have fewer connections.
Overall, these results indicate that different semantic net-
works constructed from word2vec and GloVe models are ca-
pable of capturing some aspects of human association net-
works. However, for most synthetic datasets there are clear
differences with the empirical networks. The glove-knn net-
work exhibits properties that are most similar to those of the
USF Norms, but future work should address methods that
yield a greater number of nodes with rich neighborhoods.
Figure 2: Scatter plots of local clustering coefficients. Every blue dot represents a local clustering coefficient of a node with the
degree k. The red line connects averages. The first three plots in the first row are obtained from undirected networks.
Acknowledgments
The authors would like to thank Terry Stewart for useful com-
ments and discussions. This work has been supported by
AFOSR, grant number FA9550-17-1-002.
References
Abbott, J. T., Austerweil, J. L., & Griffiths, T. L. (2015). Random walks on semantic networks can resemble optimal foraging. Psychological Review, 122(3), 558.
Alstott, J., Bullmore, E., & Plenz, D. (2014). powerlaw: A Python package for analysis of heavy-tailed distributions. PLoS ONE, 9(1), e85777.
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391.
Kajić, I., Gosmann, J., Komer, B., Orr, R. W., Stewart, T. C., & Eliasmith, C. (2017). A biologically constrained model of semantic memory search. In Proceedings of the 39th Annual Conference of the Cognitive Science Society (pp. 631–636). Austin, TX: Cognitive Science Society.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26 (pp. 3111–3119). Curran Associates, Inc.
Morais, A. S., Olsson, H., & Schooler, L. J. (2013). Mapping the structure of semantic memory. Cognitive Science, 37(1), 125–145.
Nelson, D. L., McEvoy, C. L., & Schreiber, T. A. (2004). The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3), 402–407.
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
Ravasz, E., & Barabási, A.-L. (2003). Hierarchical organization in complex networks. Physical Review E, 67(2), 026112.
Steyvers, M., Shiffrin, R. M., & Nelson, D. L. (2004). Word association spaces for predicting semantic similarity effects in episodic memory. In Experimental Cognitive Psychology and its Applications (pp. 237–249). American Psychological Association.
Steyvers, M., & Tenenbaum, J. B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41–78.
Utsumi, A. (2015). A complex network approach to distributional semantic models. PLoS ONE, 10(8), e0136277.
Many real networks in nature and society share two generic properties: they are scale-free and they display a high degree of clustering. We show that these two features are the consequence of a hierarchical organization, implying that small groups of nodes organize in a hierarchical manner into increasingly large groups, while maintaining a scale-free topology. In hierarchical networks, the degree of clustering characterizing the different groups follows a strict scaling law, which can be used to identify the presence of a hierarchical organization in real networks. We find that several real networks, such as the Worldwideweb, actor network, the Internet at the domain level, and the semantic web obey this scaling law, indicating that hierarchy is a fundamental characteristic of many complex systems.