RESEARCH ARTICLE
Family lexicon: Using language models to
encode memories of personally familiar and
famous people and places in the brain
Andrea Bruera 1,2 *, Massimo Poesio 1

1 Max Planck Institute for Human Cognitive and Brain Sciences, Cognition and Plasticity Research Group, Leipzig, Germany, 2 Queen Mary University of London, London, United Kingdom

* bruera@cbs.mpg.de
Abstract
Knowledge about personally familiar people and places is extremely rich and varied, involv-
ing pieces of semantic information connected in unpredictable ways through past autobio-
graphical memories. In this work, we investigate whether we can capture brain processing
of personally familiar people and places using subject-specific memories, after transforming
them into vectorial semantic representations using language models. First, we asked partici-
pants to provide us with the names of the closest people and places in their lives. Then we
collected open-ended answers to a questionnaire, aimed at capturing various facets of
declarative knowledge. We collected EEG data from the same participants while they were
reading the names and subsequently mentally visualizing their referents. As a control set of
stimuli, we also recorded evoked responses to a matched set of famous people and places.
We then created original semantic representations for the individual entities using language
models. For personally familiar entities, we used the text of the answers to the question-
naire. For famous entities, we employed their Wikipedia page, which reflects shared declar-
ative knowledge about them. Through whole-scalp time-resolved and searchlight encoding
analyses, we found that we could capture how the brain processes one’s closest people and
places using person-specific answers to questionnaires, as well as famous entities. Overall
encoding performance was significant in a large time window (200-800ms). Using spatio-
temporal EEG searchlight, we found that we could predict brain responses significantly bet-
ter than chance earlier (200-500ms) in bilateral temporo-parietal electrodes and later (500-
700ms) in frontal and posterior central electrodes. We also found that XLM, a contextualized
(or large) language model, provided superior encoding scores when compared with a simpler static language model such as word2vec. Overall, these results indicate that language models can capture subject-specific semantic representations as they are processed in the human brain, by exploiting small-scale distributional lexical data.
1 Introduction
Being asked to describe one’s closest friend, or one’s favourite neighbourhood, is no easy request. One will find that just describing physical and personality traits is not
enough. To do justice to that specific person or place, it will be necessary to bring up much more: anecdotes, past events and stories involving them, together with other disparate pieces of information that simply come to mind when talking about familiar entities. Taken together, all of this forms an extremely idiosyncratic mixture, one that nevertheless captures the fundamental uniqueness of that person or place in one’s memory.
In cognitive neuroscience, as a reflection of this, it has been found that knowledge of individual entities, such as people and places, is particularly rich and multifaceted [1-3]. Aside from its episodic (event-specific) and semantic (encyclopedic) components [4-6], it seems to strongly involve another type of knowledge, called ‘personal semantics’ [7]. Personal semantics can be described as a type of knowledge which is abstracted from individual events and occurrences, but not fully so. An example could be knowing what a friend enjoys doing. It is not entirely encyclopedic, as it depends on memories of repeated events that took place in the past. Nor is it completely episodic, as it does not strictly depend on individual instances of that event. In this sense, it is part of personal semantics.
Episodic, semantic and personal knowledge compose declarative, or explicit, memory [6,
7]. By definition, declarative memory is knowledge that can be expressed through natural lan-
guage [8]. For generic concepts, such as a cello, declarative knowledge is widely shared across a
community of speakers. Because of this, it can be easily extracted in a data-driven fashion from
large collections of text (called corpora) such as Wikipedia. This is done by creating vectorial
representations, reflecting distributional properties of their corresponding words in the cor-
pora, through language models—also called distributional semantics models. Such
approaches follow the hunch that the semantic content of a word can be captured by the way
in which this word is used [9]. To take a simplified example, an important part of the meaning
of the word ‘leaf’ can be captured by observing that it co-occurs frequently in natural language
with words like ‘tree’, ‘branch’ or ‘flower’, but much more rarely with words such as ‘citizen’,
‘musician’ or ‘factory’. This, in turn, can be operationalized as the so-called distributional
hypothesis—that words having similar meanings will be found in similar linguistic contexts
[10]. The resulting representations are traditionally called word vectors. Word vectors have
been shown to capture and encompass, in a single high-dimensional vectorial space, multiple traditional semantic dimensions proposed in cognitive science [11-14]. This can explain their effectiveness at modelling semantic processing, both in behaviour [15,16] and in the brain [17-20].
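To make the distributional hypothesis concrete, the toy sketch below (purely illustrative and hypothetical; it assumes a pre-trained gensim word2vec model saved under a made-up file name) compares the similarity of ‘leaf’ to related and unrelated words.

```python
# Illustrative sketch of the distributional hypothesis: words used in similar
# contexts end up with similar vectors. Assumes a hypothetical pre-trained
# gensim word2vec model saved as "word2vec_en.model".
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec_en.model")  # hypothetical file name

for word in ["tree", "branch", "flower", "citizen", "musician", "factory"]:
    # cosine similarity between the vectors for 'leaf' and the other word
    print(f"leaf ~ {word}: {model.wv.similarity('leaf', word):.2f}")
```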
However, when it comes to personally familiar people or places, two main challenges arise.
First, the extraction of semantic representations for concepts based on distributional informa-
tion requires words to be frequent enough in the corpora in order to robustly capture their meaning [21-25]. Despite their fundamental importance in our lives, personally familiar peo-
ple and places (a close friend, one’s favourite neighbourhood) never—or extremely rarely—get
mentioned in large-scale corpora such as Wikipedia. Therefore, it seems impossible, in princi-
ple, to capture their meaning by way of word vectors.
Secondly, even if one were able to sidestep this issue, a more pervasive one would emerge:
namely, that each person has highly idiosyncratic and subjective ways of perceiving and
describing personally familiar people and places. This makes it hard to capture semantic repre-
sentations from recollections of autobiographical memories expressed in natural language, which constitute an exceptionally diverse yet limited linguistic dataset [26-28].
Such limitations posed by language models have had an impact on studies employing them
as models to capture semantic representations in the brain. As a consequence of the need for
sufficient training data, previous work has focused on generic concepts for which a representation could be obtained from large corpora [17,18,29,30]—or, in the case of individual entities, on famous entities which are mentioned with enough frequency in corpora [31]. The only
partial exception taking into consideration subject-specific semantic knowledge is, to our
knowledge, [32], which, however, focused on personal interpretations of generic concepts (e.g.
personal interpretations of ‘dance’ as an event) and not of individual entities. Previous work in
cognitive neuroscience looking at subject-specific knowledge of individual entities has not
used distributional linguistic information, but rather semantic dimensions defined a priori by
the experimenters [33-36].
In this work, we set out to investigate whether we could use short texts containing personal
memories to build unique vectorial semantic representations for personally familiar entities,
such as people and places, that could encode the way in which the brain of each subject pro-
cesses such entities. Our approach follows a framework that has been recently emerging in
neuroscience, aiming at recognizing and effectively accounting for the uniqueness of individu-
als and of their cognitive and neural processes [32,37-40].
In our experiment, first we captured person-specific knowledge in the form of text. We did
so by asking subjects to talk about the most important people and places in their lives (eight
people and eight places; see Fig 1). This allowed us to collect textual data regarding semantic
knowledge (physical and personality traits), episodic memory (the most salient memories
involving a person/place) and personal semantics knowledge (operationalized as words and
topics associated with each person/place). We then used language models to encode each sub-
ject’s text, thus creating subject-specific vector representations of personally familiar entities
from small-scale textual distributional information. In a mirrored fashion, we also created vector representations for eight famous people and eight famous places, as a ‘control’ set of entities, for which it is known that language models are able to create reliable semantic representations, given their high frequency in corpora [31,41,42].
In parallel, we collected electroencephalography (EEG) data while the same subjects were
reading the entities’ names (both famous and familiar). We ran a set of encoding analyses,
looking at where in time (time-resolved encoding using all electrodes) and in space (spatio-
temporal searchlight encoding, looking at clusters of electrodes separately) our subject-specific
vector representations captured brain processing. Names were used instead of faces in order to
avoid non-semantic, low-level visual differences among different categories of stimuli (people
and places, famous and personally familiar).
We compared two language models for Italian, the language in which the experiment was
carried out, from two different families of language models. The first model is word2vec [43],
a so-called static model. In static models, each word is represented as a single, fixed vector
representation. This captures distributional information encoded in the training data only for
word types (i.e. all aggregated mentions of each word), but not tokens (i.e. actual individual
occurrences in context). Static models are the most commonly used models in brain encoding/
decoding studies that involve individual concepts [18,20,29,30,44]. The second model is
XLM-Roberta-large (XLM, [45]), a so-called contextualized language model, or large lan-
guage model (LLM). LLMs are more recent and much more complex than static models, and
they are based on the Transformer architecture [46]. They do not represent word types (as
static models do), but word tokens: given a linguistic context, such as a sentence, the vectors
for the words are created by adapting pre-trained representations to the specific sentence—
thus incorporating both type- and token- level information. LLMs have been used to encode
or decode to/from the brain the meaning of words in context—i.e. words appearing in passages
of text of various complexity, ranging from phrases [47] to sentences [48] and narratives [49-51]. However, this type of model has never been tested, to our knowledge, as a model for per-
sonal, subject-specific semantic representations of extremely specific entities like people and
places.
The results reported in Figs 3 and 4 indicate that, by encoding personal memories with language models, it was possible to create semantic representations of individual entities that captured the way in which they are processed in the brain. Importantly, we were able to create semantic representations for the most important people and places in each subject’s life using only the reduced amount of text available to us through a questionnaire, i.e. small-scale distributional lexical information. Furthermore, this approach also worked for a matched set of famous entities, for which non-personal textual information obtained from Wikipedia was used.

Fig 1. Visualization of the experimental setup. Our stimuli were divided into famous and personally familiar names. For each level of familiarity, sixteen individual entities (eight people and eight places) were present. Famous entities were selected after controlling for familiarity, imageability and name length. Personally familiar names were obtained by asking each participant to provide the names themselves. We also collected short amounts of text describing each entity (Wikipedia pages for famous entities, and answers to a questionnaire for personally familiar people and places). From these, we extracted one semantic vector per entity using a language model. Then we collected the EEG data, and carried out the encoding analyses separately for each participant.
https://doi.org/10.1371/journal.pone.0291099.g001
2 Methods
2.1 Stimuli
2.1.1 Famous entities. As one half of the experimental stimuli, we selected eight famous
people and eight famous places to be used for all participants (see Fig 1, right portion). Before
each EEG experimental session took place, we made sure that subjects had neither personally
known any famous person nor visited a famous place among the stimuli. Only two subjects
had visited one of the famous places selected as experimental stimuli. In those cases we substi-
tuted the names with other place names matched for length, familiarity and imageability. All
ratings, including those for the famous places that acted as substitutions, can be found online
together with the code and the data (see below).
We balanced the stimuli in terms of name length, familiarity and imageability, in order to
avoid significant differences across the two semantic categories. The final set of stimuli was
selected from a larger set of 100 stimuli, for which we obtained familiarity and imageability rat-
ings. The subjects for the rating tasks (carried out separately; 33 subjects for familiarity, 30 for
imageability) were native speakers of Italian, and none of them took part in the EEG experiment. Familiarity was defined in the same way as in [52]—a quantification on a scale from 1 to
5 of the number of cumulative encounters with an individual entity, across time and media.
The average familiarity overall was 3.59 (standard deviation 0.44); average familiarity for peo-
ple was 3.75 (standard deviation 0.45), while that for places was 3.44 (standard deviation 0.36),
and their difference across people and places was not statistically significant (non-parametric
two-sample permutation test: t= 1.42, p= 0.15). Imageability was defined after [53] as the ease
or difficulty of arousing a mental image. Imageability was controlled because, during the
experiment, subjects were asked to read names and picture their referents mentally (see Sec-
tion 2.2). We used the most common scale in imageability rating experiments, going from 1 to
7 [54]. The average imageability across all entities was 4.97 (standard deviation 0.85). Average
imageability for people was 5.17 (standard deviation 0.66), while average imageability for
places was 4.78 (standard deviation 0.96). The difference between the two was not statistically
significant (non-parametric two-sample permutation test: t= 0.88, p= 0.37). Name length was
on average overall 13.5 (standard deviation 3.12); average person name length was 14.1 (stan-
dard deviation 1.83), while average place name length was 13 (standard deviation 3.9), and the
difference across categories was again not significant (non-parametric two-sample permuta-
tion test: t= 0.68, p= 0.47). All ratings are provided together with the code for replication (see
below).
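For illustration, a non-parametric two-sample permutation test of the kind used for these balancing checks could look like the sketch below (a minimal example with made-up ratings and the difference in means as the test statistic, not the authors’ original script).

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(a, b, n_perm=10000):
    """Two-sample permutation test on the difference in means."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_perm

# Made-up familiarity ratings (1-5 scale) for the two categories
people = [3.8, 3.2, 4.1, 3.6, 3.9, 3.5, 4.0, 3.9]
places = [3.4, 3.1, 3.8, 3.3, 3.6, 3.2, 3.7, 3.4]
diff, p = permutation_test(people, places)
print(f"mean difference = {diff:.2f}, p = {p:.3f}")
```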
For each famous individual entity we also collected the text from their Wikipedia page,
under the assumption that such texts are a source of explicit knowledge regarding individual
entities that can be mapped to the brain using their distributional information—an approach
validated in neuroscience in [55] and in Natural Language Processing (NLP) in [41,56,57].
These texts were used, as described in Section 2.4.1, to extract semantic representations using language models. We also manually annotated, for each famous person, their occupation (e.g. politician, musician) and, for each famous place, its type (e.g. city, monument), since this information was used during the experimental task (see Section 2.2).
2.1.2 Personally familiar entities. Before starting the EEG experiment, we asked partici-
pants to provide the names for eight personally familiar people and eight places (see Fig 1, left
portion). As a framework, we followed research on social circles [58]. For people we focused,
in our definition, on members of the so-called ‘support clique’. This circle consists of people with whom one has a positive relationship and is regularly in touch, and from whom one would seek personal advice or help [59]. For places, we tried to match as closely as possible the defini-
tion given for people, since no relevant literature was available. We defined ‘support places’ as places with which one has a special, positive relationship, and to which one would (if possible) return when in a situation of distress. The text of the specific instructions given to the subjects, translated into English, is provided as S1 File.
Additionally, subjects were asked to respond to a questionnaire, whose aim was to capture the main components of declarative knowledge about personally familiar entities—i.e. what is explicitly known about them. We looked at the two components most traditionally associated with explicit knowledge (semantic memory and episodic memory, which differ in being respectively independent of or dependent on specific events [4,6]), as well as at what has been called personal semantics [7], which is a highly personal type of knowledge lying at the intersection of episodic and semantic memory (see Introduction). The questionnaire therefore involved nine questions, divided equally among the three types of knowledge. For semantic memory, we asked participants to describe the type of relationship they shared with the entity, as well as its physical and personality traits; for episodic memory, we asked how each entity was first encountered, as well as for the two most recent salient autobiographical episodes involving that entity; for personal semantics, we asked participants to name up to 10 words that came to mind when thinking about that entity, what they would talk about (for people) or do (for places) if they met or visited that entity, and finally a sentence associated with that person/place. Notice that participants were not only asked to provide names and answers to the questionnaires, but also to provide either the person’s occupation or the type of place (i.e. monument, city, river, etc.), to be used during the experimental paradigm (see Section 2.2).
We recorded the answers using a Zoom H2 stereo digital audio recorder, and we automati-
cally transcribed the texts using Microsoft Azure’s Speech-To-Text service. Before extracting
the semantic representations from the texts using the language models (see Section 2.4), we
checked the automatic transcriptions in order to verify that quality was sufficiently good,
which we found to be the case.
In the case of personally familiar names, we could not control in advance for name length,
since participants came up with the names. Therefore, we implemented in the analyses a pro-
cedure for explicitly removing all variance associated with name length from the EEG data,
described below in Section 2.5.1.
Notice that, for privacy reasons, it is not possible to publicly share the responses to
the questionnaires, nor the subject-specific computational representations extracted from
them. The parts of the data that could be published, together with the code, the ratings and all
results and plots, are publicly available online, on the Open Science Foundation website
(https://osf.io/sjtmn/). Also, notice that the same dataset has been used for a different study,
focused on decoding semantic categories [60].
2.2 Experimental paradigm
Thirty-three right-handed subjects (aged 20 to 31 years; 21 female) took part in the experiment. Sample size was determined following [61], where the authors show that thirty-two subjects is an adequate sample size for event-related potential (ERP) studies involving semantic processes. As the experiment was conducted in Italian, all the subjects were
native Italian speakers. All experimental procedures were approved by the Ethical Committee
of SISSA, Trieste, where the data were collected. The recruitment of subjects and data
collection took place concurrently between June and September 2021. Before the experiment
took place, subjects gave their written informed consent.
Before the EEG experiment, subjects provided names for the personally familiar stimuli, as well as the corresponding occupations or place types. Then, participants took part in 24 experimental EEG runs.
Each name would appear once per run, in randomized order. For each name we thus recorded
twenty-four ERPs, which were averaged after preprocessing and before entering the analyses.
This is routinely done in encoding/decoding studies for EEG in order to improve the signal-
to-noise ratio [62].
Each trial proceeded as follows. First, a fixation cross appeared for 500 ms. Then a name appeared on screen for 500 ms, followed by a fixation cross lasting 1 second. Subjects were instructed to read the name and mentally picture its referent. This was done because, even if the mental imagery task may bias participants to focus on visual semantic features, it has been shown to elicit solid semantic processing and to provide good performance in encoding/decoding [18,31,63]. After the fixation cross disappeared, a binary yes/no question appeared on screen. The question was added exclusively to ensure participants actually engaged in the task. To avoid strategic preparation, questions were randomized between two templates: either a coarse-category question (e.g. ‘is it a person/place?’) or a fine-grained question (e.g. ‘does the name refer to a student?’ or ‘does the name refer to a city?’), a methodology previously used in [31,64]. Questions were balanced between yes/no answers, and subjects could answer using two keys. In the case of fine-grained questions, the occupation or place type was also randomized.
2.3 EEG data collection and preprocessing
The EEG data was collected using a BIOSEMI ActiveTwo system with 128 channels, recording unfiltered signals at a sampling rate of 2048 Hz. We also collected signals from two electro-oculogram (EOG) channels, so as to be able to use them later for artifact rejection with Independent Component Analysis (ICA; details below). For preprocessing, we adapted an automated procedure using MNE (originally standing for Minimum Norm Estimate) [65], previously validated in [66].
We set the montage to the default for the 128-channel BIOSEMI system. A visualization of the montage, together with the codes for the channels, is included in the publicly available code and can be retrieved online (https://www.biosemi.com/pics/cap_128_layout_medium.jpg). Then, following [66], we used the standard ICA-based ocular artifact removal implemented in MNE. We applied a low-pass filter at 80 Hz to the data [67]. Then we epoched the data, downsampled it to 200 Hz, and removed the independent components correlating the most with eye-movement artifacts [66]. Baseline correction consisted of subtracting the average signal recorded between 100 and 0 ms before the appearance of a stimulus. We used the autoreject algorithm [68] to interpolate bad channels, and we set the reference to the average of all electrodes. We then averaged, separately for each subject, all the available trials for each entity, in order to improve the signal-to-noise ratio in the evoked responses [62]. This left us with one evoked response per entity, to be used for the analyses.
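A minimal sketch of such a preprocessing pipeline with MNE-Python is given below. File names, channel names, the event mapping and several parameters (e.g. the number of ICA components) are hypothetical placeholders rather than the authors’ exact settings, and the autoreject step is omitted for brevity.

```python
# Sketch of the preprocessing steps described above, using MNE-Python.
import mne
from mne.preprocessing import ICA

raw = mne.io.read_raw_bdf("subject01.bdf", preload=True)  # hypothetical file
raw.set_channel_types({"EXG1": "eog", "EXG2": "eog"})     # hypothetical EOG channel names
raw.set_montage("biosemi128", on_missing="ignore")

# Low-pass filter at 80 Hz
raw.filter(l_freq=None, h_freq=80.0)

# ICA-based removal of components correlating with the EOG channels
ica = ICA(n_components=20, random_state=42)
ica.fit(raw)
eog_components, _ = ica.find_bads_eog(raw)
ica.exclude = eog_components
ica.apply(raw)

# Epoching with baseline correction (-100 to 0 ms), then downsampling to 200 Hz
events = mne.find_events(raw)
event_id = {"entity_01": 1}  # hypothetical mapping from entity to trigger code
epochs = mne.Epochs(raw, events, event_id=event_id, tmin=-0.1, tmax=1.0,
                    baseline=(-0.1, 0.0), preload=True)
epochs.resample(200)

# Average reference, then one evoked response per entity
epochs.set_eeg_reference("average")
evoked = epochs["entity_01"].average()
```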
The final preprocessing step was removing all the variance associated with word length
from the EEG signal using cross-validated confound regression, which was validated in [69,
70].
2.4 Models
2.4.1 Language models. In order to create semantic representations for famous and per-
sonally familiar entities, we employed the texts described in Section 2.1 as inputs for two types
of language models, one static (word2vec) and one contextualized (XLM-Roberta-large). The
rationale for this procedure is that words appearing in a text revolving around an entity carry a
rich bundle of semantic information with respect to that entity. In other words, we interpret
the texts we collected as entity-specific distributional lexical information: for personally famil-
iar people and places, the answers to the questionnaires; for their famous counterparts, the text
from their Wikipedia pages [41,55,56].
The procedure was matched across models, so as to avoid methodological confounds. For
each individual entity, we split the text into passages of at least 20 words. We set a lower passage-length threshold because, when encoding sentences for downstream use, LLMs have been shown to work better with rather long passages [71]. In order to ensure that text portions were long enough to work well with XLM, we rearranged the text so that sequential passages of at least 20 words were created (i.e. if a sentence was shorter than 20 words, we considered it to be part of the same passage as the following sentence). We chose 20 words as a threshold since, in English, sentence lengths most commonly fall between 10 and 30 words [72].
After having encoded all passages of text using the language models, we retained only the
vectors in the sequences corresponding to content words (i.e. open-class words: nouns, verbs,
adjectives, adverbs) from the corresponding descriptive text (Wikipedia or questionnaire). We
decided to follow [44] and exclude closed-class words (i.e. function words), since they do not carry semantic information. We also reasoned that their presence would put the static language model at a disadvantage, since it has been shown that static models struggle to represent function words properly [43,73,74].
Finally, we obtained a single entity representation in two steps. First, we averaged all the
word vectors retained for each passage, thus obtaining one vector per passage. Then, we aver-
aged all of the vectors for the individual passages of text [32,55,75].
Therefore, at the end of the procedure we had one vector representation for each individual
entity per model, capturing the distributional lexical information contained in our small-scale
textual data.
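The two-step averaging could be implemented along the lines of the sketch below (a simplified illustration with hypothetical inputs; content-word filtering, e.g. via a POS tagger, is assumed to have been done beforehand, and the word2vec model path is a placeholder).

```python
import numpy as np
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec_it.model")  # hypothetical pre-trained Italian model

def entity_vector(content_words_per_passage):
    """One list of content words per passage; returns a single entity vector."""
    passage_vectors = []
    for words in content_words_per_passage:
        vecs = [model.wv[w] for w in words if w in model.wv]
        if vecs:
            passage_vectors.append(np.mean(vecs, axis=0))  # step 1: one vector per passage
    return np.mean(passage_vectors, axis=0)                # step 2: average across passages

# Hypothetical content words from two passages describing one entity
vec = entity_vector([["amica", "università", "chitarra"], ["vacanza", "montagna"]])
```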
We will now briefly describe the language models used. Word2vec is a feedforward neural
network which learns vector representations (word vectors) from large-scale corpora [43].
Therefore, one single vector representation is created for each word type, regardless of homonymy and polysemy phenomena (e.g. the word ‘bat’ is modelled as a single vector, collapsing the animal and baseball-instrument senses into a single representation). Such vector representations have been interpreted as models of cognitive semantic representations [11,23,76] and have been shown to capture lexical processing well [15,16,55]. We pre-trained a word2vec model on the Italian version of Wikipedia (approximately 12 gigabytes in size). We used parameters suggested in the literature [15,77]: skip-gram training regime, a vocabulary of 250000 words, a window size of 10 words and a vector size of 300 dimensions.
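A gensim sketch of this training configuration is shown below; the corpus file is a hypothetical pre-tokenized dump, and max_final_vocab is used here as one way of capping the vocabulary at 250000 word types.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Hypothetical path to a pre-tokenized Italian Wikipedia dump, one sentence per line
corpus = LineSentence("itwiki_tokenized.txt")

model = Word2Vec(
    sentences=corpus,
    sg=1,                    # skip-gram training regime
    vector_size=300,         # 300-dimensional word vectors
    window=10,               # context window of 10 words
    max_final_vocab=250000,  # cap the vocabulary at 250000 word types
    workers=4,
)
model.save("word2vec_it.model")
```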
As a second model we used a contextualized language model [75], also often called a large
language model (LLM). A contextualized language model is a deep neural network based on
the Transformer architecture [46]. Its main difference with respect to a static language model
is that, by design, it is aimed at the representation of words in context—i.e. sentences, para-
graphs or longer passages. In LLMs, there are no ‘static’ word representations: in contrast, rep-
resentations are adapted to each linguistic context used as input. This makes it possible to capture fine-grained shifts in meaning, at both lexical and supra-lexical (e.g. discourse, dialogue) levels. Notice, however, that this is achieved at the cost of a far more sophisticated neural network architecture and a more costly training procedure [78]. Since publicly available LLMs for languages other than English are not on a par, in terms of quality, with their English counterparts [79], and our questionnaires were collected in Italian, we used a state-of-the-art multi-language model, XLM-Roberta-large [45]. Multi-language models are not specialized for
individual languages; however, they can exploit cross-lingual transfer of semantic information during pre-training, a feature that can make them surprisingly effective [80,81]. We chose XLM-Roberta-large because it is a widely used multilingual model with excellent performance on both cognitive [82] and NLP [83] tasks. Importantly, Italian was among the languages contained in the pre-training corpus, thus making it a solid choice to encode text in Italian. It has
24 layers, 560 million parameters, and its vector dimensionality is 1024. It was trained on 2.5
terabytes of text covering 100 languages, including Italian, from a filtered CommonCrawl cor-
pus [84]. We used the pre-trained model provided by Huggingface [85].
Since LLMs are designed to take as input natural language sentences, we used XLM to
encode full sentences—but we averaged only the vectors corresponding to content words (see
Section 2.4.1). Following previous work, we modelled entity representations using so-called
(sentence) representation pooling for the top four layers [41,75]—which consists of averaging
the top four layers for each of the chosen words.
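A sketch of this encoding step with the Hugging Face transformers library is shown below; the passage and the indices of content-word tokens are hypothetical, and subword-to-word alignment is glossed over for brevity.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModel.from_pretrained("xlm-roberta-large", output_hidden_states=True)
model.eval()

passage = "Una frase di esempio su una persona familiare."  # hypothetical passage
inputs = tokenizer(passage, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Average the top four layers: (4, 1, n_tokens, 1024) -> (n_tokens, 1024)
top_four = torch.stack(outputs.hidden_states[-4:]).mean(dim=0).squeeze(0)

# Hypothetical indices of the subword tokens belonging to content words
content_token_idx = [1, 2, 5, 6]
passage_vector = top_four[content_token_idx].mean(dim=0)  # one 1024-dim passage vector
```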
2.4.2 Non-semantic baseline models. We also chose to report time-resolved encoding
scores for two baseline, non-semantic models: name length and orthographic distance. We did
so in order to show that the evoked responses did not contain signal that could be explained by
such low-level variables. In the first model, we represented each individual entity by the length of its name. In the second one, we leveraged the Levenshtein distance, a popular measure of orthographic distance counting the number of character edits (insertions, deletions and substitutions) required to transform one string into another [86]. We thus represented the orthographic properties of each name as a single value: the average Levenshtein distance between the entity’s name and all other names in the set of stimuli.
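The sketch below (plain-Python edit distance, made-up names) illustrates how such a single orthographic value per name could be computed.

```python
import numpy as np

def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Made-up stimulus names: each name is summarized by its average
# Levenshtein distance to all the other names in the stimulus set
names = ["Maria Rossi", "Piazza Grande", "Luca Bianchi"]
baseline = {n: np.mean([levenshtein(n, m) for m in names if m != n]) for n in names}
print(baseline)
```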
2.5 Encoding
2.5.1 Representational Similarity Analysis encoding. For encoding, we used the
approach proposed and validated in [87], which is based on Representational Similarity Analysis (RSA). We illustrate the RSA encoding procedure visually in Fig 2. The main advantage afforded by this methodology is that it ensures excellent performance without the need to fit a model—which would be a concern given the small size of our dataset. RSA encoding is conceptually based on standard RSA [88], which we will briefly summarize here. Given a set
of stimuli and a model that represents them in any quantitative form (numbers, vectors, etc.),
the similarity between the brain responses to a given stimulus and its model representation
can be measured in two steps. The first step is looking at how similar they are to all other repre-
sentations in their own representational space (in the brain and in the model, respectively; the
so-called first-order similarity). The second step consists of measuring directly the similarity
between the two vectors of the pairwise similarities through the so-called second-order simi-
larity. This approach has the advantage of providing a straightforward way to compare the rep-
resentational structure of brains and models (via second-order similarity), something that
would be otherwise difficult, given their difference in dimensionality.
In RSA encoding, as in most multivariate (encoding/decoding) approaches, the data is split iteratively into a train set and a test set [62,89]. For each train/test split, the model predicts evoked responses for the test stimuli, which are compared with the real responses. A similarity metric (i.e. a correlation or distance measure) is used to evaluate how well the model captures brain activity. This procedure is repeated for all train/test splits and the correlations are then averaged, so as to provide an unbiased summary measurement [90].
As a similarity metric we use Spearman correlation between predicted and real evoked
responses. We use Spearman correlation because it is nonparametric, thus making minimal
assumptions about the relation holding between the model predictions and brain data; for this
reason it is the recommended metric for comparing model predictions and brain data (i.e. sec-
ond-order similarity) with RSA [88,91].
Prediction is carried out in the following way. The evoked response to a given item from the test set is predicted as a weighted sum of the evoked responses to the stimuli in the training set. Following the original implementation, the weights to be used for the sum are the pairwise Pearson correlations, computed in the model (in our case, a language model), between the test item and all of the training items. Take for instance a toy training set composed of three individuals, for which both the model representations $\vec{a} = \text{Ana}$, $\vec{b} = \text{Milo}$, $\vec{c} = \text{Pati}$ and their corresponding evoked responses $a_{\mathrm{brain}}$, $b_{\mathrm{brain}}$, $c_{\mathrm{brain}}$ are available. Given a test item $d = \text{Nico}$, its evoked response $d_{\mathrm{brain}}$ is predicted as

$$d_{\mathrm{brain}} = a_{\mathrm{brain}} \, r(\vec{a}, \vec{d}) + b_{\mathrm{brain}} \, r(\vec{b}, \vec{d}) + c_{\mathrm{brain}} \, r(\vec{c}, \vec{d}),$$

where $r$ denotes Pearson correlation.
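A numpy sketch of this prediction step is given below (a minimal illustration with random toy data, not the authors’ original code): the predicted response for a test item is the correlation-weighted sum of the training responses, and the encoding score is the Spearman correlation with the real response.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def predict_evoked(test_model_vec, train_model_vecs, train_evoked):
    """RSA-encoding prediction: weighted sum of the training evoked responses,
    with weights given by Pearson correlations between model vectors."""
    weights = np.array([pearsonr(test_model_vec, v)[0] for v in train_model_vecs])
    # train_evoked: (n_train_items, n_features); weights: (n_train_items,)
    return weights @ train_evoked

# Toy data: 3 training entities, model vectors of dim 5, evoked responses of dim 8
rng = np.random.default_rng(0)
train_vecs = rng.normal(size=(3, 5))
train_evoked = rng.normal(size=(3, 8))
test_vec, true_evoked = rng.normal(size=5), rng.normal(size=8)

predicted = predict_evoked(test_vec, train_vecs, train_evoked)
score = spearmanr(predicted, true_evoked)[0]  # encoding score for this test item
print(score)
```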
2.5.2 Evaluation. For the evaluation we followed standard practice in encoding/decoding
brain studies [62,90]. We carried out analyses separately for each individual subject (i.e. within
subjects), and we then averaged results across subjects. Since for each subject we had limited
data, we used a leave-two-out training/testing regime. This entails running iterated training/testing procedures, each time using one of all possible pairs of stimuli as the test set. At each iter-
ation we recorded the Spearman correlation between each predicted test item and its true
counterpart. The final correlation score was given by the average of all correlations collected
by the leave-two-out train/test iterations.
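Building on the prediction function sketched above, the leave-two-out evaluation could be organized as follows (hypothetical array shapes and variable names).

```python
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

def leave_two_out_score(model_vecs, evoked, predict_fn):
    """Average Spearman correlation over all leave-two-out train/test splits.

    model_vecs: (n_items, n_model_dims) numpy array of entity vectors
    evoked:     (n_items, n_features) numpy array of evoked responses
    predict_fn: a prediction function such as predict_evoked sketched above
    """
    n_items = len(model_vecs)
    scores = []
    for test_pair in combinations(range(n_items), 2):
        train_idx = [i for i in range(n_items) if i not in test_pair]
        for t in test_pair:
            predicted = predict_fn(model_vecs[t], model_vecs[train_idx], evoked[train_idx])
            scores.append(spearmanr(predicted, evoked[t])[0])
    return float(np.mean(scores))
```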
2.5.3 Confound control. We controlled for name length for personally familiar people and places, since it could not be controlled a priori during stimulus selection, as was instead the case for famous names. To this aim, we used a cross-validated confound regression proce-
dure that was previously validated [70] and used with this dataset in a category decoding set-
ting [60]. For each train-test split, we fitted a linear regression model from the confound
variable to the brain data within the train set. Then, only the residuals of the regression (for
both the training and the test set) entered the analyses—effectively removing from the brain
data the variance that could be explained by the confound variable. We used the original Python implementation provided by the authors (https://skbold.readthedocs.io/en/latest/).
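A minimal scikit-learn sketch of cross-validated confound regression is shown below (an illustration of the general idea rather than the skbold implementation used in the paper): the regression from confound to EEG is fitted on the training set only, and residuals replace the data in both sets.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def remove_confound(train_eeg, test_eeg, train_confound, test_confound):
    """Regress a confound (e.g. name length) out of the EEG data.

    EEG arrays: (n_items, n_features); confound arrays: (n_items, 1).
    Fitting on the train set only avoids leakage into the test set."""
    reg = LinearRegression().fit(train_confound, train_eeg)
    return train_eeg - reg.predict(train_confound), test_eeg - reg.predict(test_confound)

# Hypothetical usage with name length as the single confound
rng = np.random.default_rng(0)
train_len, test_len = np.array([[14], [8], [11]]), np.array([[9], [16]])
train_eeg, test_eeg = rng.normal(size=(3, 128)), rng.normal(size=(2, 128))
train_clean, test_clean = remove_confound(train_eeg, test_eeg, train_len, test_len)
```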
Fig 2. Visualization of the RSA encoding procedure. We visualize a toy example, where the whole dataset is composed of three exemplars. The input (part A) are the similarities (e.g. Pearson correlations) between entity representations in the model. The targets (part B) are the corresponding EEG responses for the same entities. Prediction happens in part C. For each entity (the test set, e.g. entity 1), its predicted EEG response corresponds to the sum of the real evoked responses for the other entities (the train set, e.g. entity 2 and entity 3), where each response from the train set is weighted by its similarity in the model with the entity from the test set. Encoding performance is evaluated by computing the correlation between the predicted and the real response.
https://doi.org/10.1371/journal.pone.0291099.g002
2.5.4 Time-resolved encoding. In our time-resolved analyses, we ran the encoding analysis separately for every time point [62], using all of the electrodes as the target for the prediction. This whole-scalp approach allows us to look primarily at how distributed patterns of evoked activity develop across time. By measuring the correlation between the real brain signal and
the one predicted by the model, it is possible to understand when a model captures informa-
tion as it is processed in the brain. Time points where correlations are above chance with statis-
tical significance (using the TFCE method described in Section 2.5.6) indicate reliable
encoding of such information.
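Reusing the leave-two-out scorer sketched above, the time-resolved loop could look like this (hypothetical array shapes).

```python
import numpy as np

def time_resolved_encoding(model_vecs, evoked_data):
    """Run leave-two-out RSA encoding separately at every time point.

    model_vecs:  (n_items, n_model_dims) array of entity vectors
    evoked_data: (n_items, n_electrodes, n_timepoints) array of averaged ERPs
    Returns one encoding score (Spearman correlation) per time point."""
    n_timepoints = evoked_data.shape[2]
    scores = np.zeros(n_timepoints)
    for t in range(n_timepoints):
        # at each time point the prediction target is the whole-scalp pattern
        scores[t] = leave_two_out_score(model_vecs, evoked_data[:, :, t], predict_evoked)
    return scores
```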
2.5.5 Spatio-temporal searchlight. We were also interested in going beyond patterns of activity widely distributed across the whole scalp, and in looking at specific areas of the scalp where a model could explain evoked responses. To this aim, we implemented a searchlight encoding analysis. Searchlight makes it possible to find, in a bottom-up fashion, localized clusters of brain activity associated with an experimental condition, while exploiting the high sensitivity of multivariate analyses [92,93]. In practice, searchlight consists of running the encoding analyses repeatedly across smaller clusters of electrodes on the scalp.
To reduce the computational effort, we followed previous work [94] and used spatio-tem-
poral clusters, where multiple time points are considered for each electrode within the cluster.
As in [95], we employed a temporal radius of 50ms and a spatial radius of 30mm (i.e. a cluster
contains evoked activity for 100ms, for electrodes falling within a circle having a diameter of
60mm). We computed statistical significance tests using the TFCE method described in Sec-
tion 2.5.6. If the p-value for a cluster of electrodes at a given point in time fell below 0.05, we
considered encoding to be significantly above chance.
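The spatio-temporal clusters could be assembled as in the sketch below (a schematic under the stated radii, with hypothetical inputs; electrode positions are assumed to be in metres, as MNE provides them).

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_searchlight_clusters(positions, times, spatial_radius=0.03, temporal_radius=0.05):
    """For every (electrode, time-window centre) pair, return the indices of the
    electrodes within spatial_radius (metres) and of the time samples within
    +/- temporal_radius (seconds) of the window centre."""
    dists = cdist(positions, positions)  # pairwise electrode distances
    clusters = []
    for center_el in range(len(positions)):
        neighbours = np.where(dists[center_el] <= spatial_radius)[0]
        for center_t in times:
            t_idx = np.where(np.abs(times - center_t) <= temporal_radius)[0]
            clusters.append((neighbours, t_idx))
    return clusters

# The encoding analysis is then run once per cluster, restricting the prediction
# target to the selected electrodes and time samples.
```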
2.5.6 Statistical testing and corrections for multiple comparisons. We ran one-tailed permutation statistical tests with the Threshold-Free Cluster Enhancement (TFCE) procedure, as implemented in MNE. This approach has two advantages. First of all, it is non-parametric, thus making minimal assumptions about the underlying data distribution. Secondly, it also controls by design for multiple comparisons, thus countering the risk of false positives due to the high number of statistical tests [62,96,97]. We tested the hypothesis that correlations between real and predicted evoked responses were reliably above chance (chance-level correlation = 0) across subjects, separately for each time point (time-resolved analysis) or spatio-temporal cluster (searchlight), within a time window between 0 and 800ms [64]. We set the significance threshold at 0.05, following conventional practice in the field [98].
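For the time-resolved analysis, the statistical test could be run with MNE as sketched below (a minimal illustration; the array of per-subject correlation time courses is filled with random data purely as a placeholder).

```python
import numpy as np
from mne.stats import permutation_cluster_1samp_test

# scores: (n_subjects, n_timepoints) encoding correlations, random placeholder data
scores = np.random.default_rng(0).normal(loc=0.02, scale=0.05, size=(33, 160))

# One-tailed permutation test against chance (correlation = 0) with TFCE
t_obs, clusters, cluster_pv, h0 = permutation_cluster_1samp_test(
    scores,
    threshold=dict(start=0, step=0.2),  # TFCE
    tail=1,                             # one-tailed: above chance
    n_permutations=1024,
)
# With TFCE, p-values are returned per tested time point
significant_timepoints = np.where(cluster_pv < 0.05)[0]
```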
3 Results
3.1 Comparing language models
In Fig 3 we report the overall encoding results for the time-resolved and searchlight analyses,
respectively using all electrodes and localized clusters of electrodes. Encoding performance
was measured as Spearman correlation between the predicted and the real evoked responses;
the threshold used for statistical significance was p<0.05 after TFCE correction for multiple
comparisons. We also report the results of a direct comparison between XLM and word2vec,
with the aim of finding out if, where and when the LLM can significantly outperform the static
language model. In the following we will focus on data points reaching statistical significance; T and p values for all models and data points can be found together with the code and the data.
In the time-resolved analysis, for XLM, the contextualized language model, correlation scores were significantly above zero in a wide time window, from 250 to 800 ms (peak at 442.5ms, T= 6.318, p<0.001). For the static language model, by contrast, the time window where the predicted responses correlated significantly with the real evoked potentials was shorter, between 300 and 700ms (peak at 482.5ms, T= 5.047, p<0.001). Also, across all time points, correlation was consistently higher for XLM when compared to word2vec.

Fig 3. Overall encoding performance for XLM and word2vec (w2v). In the upper part of the plot we report time-resolved results obtained using all electrodes. The lower part contains searchlight results, which reflect encoding performance for smaller spatio-temporal clusters. In the final row we also report the results of the direct comparison between XLM and word2vec (i.e. the difference between XLM and word2vec). The evaluation metric used was Spearman correlation (on the y-axis for time-resolved analysis, color-coded for searchlight). Time (time-resolved) and space-time (searchlight) points where encoding (or, for the XLM > word2vec comparison, the difference between the two models) is significantly above chance (p<0.05 after TFCE correction) are marked by thicker dots either at the bottom of the plot (time-resolved) or on scalp locations (searchlight). Results reported here are averaged across all entity types.
https://doi.org/10.1371/journal.pone.0291099.g003
Regarding the two baseline non-semantic models (name length and orthography), correla-
tions in the time-resolved analyses were never significantly above chance (peak for name
length: 572.5ms, T= 1.449, p= 0.145; peak for orthography: 432.5ms, T= 1.086, p= 0.403).
This confirms that such confounds were successfully removed from the signal through the
confound removal procedure.
In searchlight, the predictions from XLM showed significant correlations from 200 to
800ms. Clusters could initially (200–300ms) be found in both hemispheres, in temporo-parie-
tal electrodes (left peak: D22, T= 14.307, p<0.001; right peak: B17, T= 14.005, p<0.001).
Encoding with XLM kept providing significant scores in bilateral electrodes up until 700ms
(left peak: D20 between 500–600ms, T= 13.106, p<0.001; right peak: B11 between 500–
600ms, T= 15.846, p<0.001). After that time point, significant clusters were right lateralized,
again in temporo-parietal regions (peak: B17 between 700–800ms, T= 12.403, p<0.001). Cor-
relations in central regions were significant in smaller time windows—frontally, between 300
and 600ms (peak: C28 between 500–600ms, T= 17.452, p<0.001); posteriorly, between 400
and 700ms (peak: A30 between 600–700ms, T= 16.162, p<0.001).
For searchlight encoding with word2vec, clusters where correlation was significantly above
zero emerged from 200–300ms (overall peak: C21 between 500–600ms, T= 11.544, p<0.001).
Significant clusters for the static language model followed a rather similar spatial and temporal
distribution with respect to XLM: bilateral temporo-parietal electrodes between 200 and
800ms (left peak: D24 between 300–400ms, T= 10.745, p<0.001; right peak: B16 between
400–500ms, T= 11.241, p<0.001); fronto-central electrodes between 300 and 600ms (peak:
C21 between 500–600ms, T= 10.544, p<0.001); posterior-central electrodes between 400 and
700ms (peak: A26 between 700–800ms, T= 10.003, p= 0.002).
Aside from such general trends of similarity, significant differences between the two models were revealed by the direct comparisons reported in the final row of Fig 3. XLM performed significantly better in a very large number of clusters between 400–600ms (peak difference:
B18 between 400–500ms, T= 12.655, p<0.001). Also, the encoding scores of XLM were sig-
nificantly higher in central frontal and posterior electrodes between 200–300ms (frontal peak:
C14, T= 10.157, p= 0.002; posterior peak: A23, T= 8.405, p= 0.0146). However, between 300–400ms and 700–800ms the difference showed a much more restricted spatial distribution. This converges with the time-resolved results (see upper portion of Fig 3), where it can be
seen that the difference in performance in those time ranges was smaller than in other time
points. Between 300–400ms, significant differences were mostly localized around the vertex
and the right hemisphere (peak: C1, T= 9.464, p= 0.002). Between 700–800ms, by contrast,
the largest difference was found in a right temporo-parietal cluster (B20, T= 9.424, p= 0.003).
3.2 Encoding scores for specific types of entities
In Fig 4 we detail how the best-performing model, XLM, captures semantic information for
each specific type of individual entity (personally familiar places, famous places, personally
familiar people, famous people). In Fig 5 we report the equivalent set of results for word2vec, thus allowing a better understanding of which categories drive the differences in performance between the models. The goal of this analysis was to understand whether our small-scale distributional information, as encoded by language models, worked equally well for all types of entities, despite their obvious differences.
To report scores separately for each type of entity, we repeated the encoding analyses using as test items only the stimuli belonging to the relevant category. The rationale is that performance on a test item quantifies the extent to which a model is able to capture the specific phenomenon of which the test item is an instance [31,47,99-103].

Fig 4. Detailed evaluation of encoding performance for XLM for all categories and levels of familiarity. In the upper part of the plot we report time-resolved results obtained using all electrodes. The lower part contains searchlight results, which reflect encoding performance for smaller spatio-temporal clusters. The evaluation metric used was Spearman correlation (on the y-axis for time-resolved analysis, color-coded for searchlight). Time (time-resolved) and space-time (searchlight) points where encoding is significantly above chance (p<0.05 after TFCE correction) are marked by thicker dots either at the bottom of the plot (time-resolved) or on scalp locations (searchlight). Results reported here refer to XLM only, and are separated for each entity type.
https://doi.org/10.1371/journal.pone.0291099.g004

Fig 5. Detailed evaluation of encoding performance for word2vec for all categories and levels of familiarity. In the upper part of the plot we report time-resolved results obtained using all electrodes. The lower part contains searchlight results, which reflect encoding performance for smaller spatio-temporal clusters. The evaluation metric used was Spearman correlation (on the y-axis for time-resolved analysis, color-coded for searchlight). Time (time-resolved) and space-time (searchlight) points where encoding is significantly above chance (p<0.05 after TFCE correction) are marked by thicker dots either at the bottom of the plot (time-resolved) or on scalp locations (searchlight). Results reported here refer to word2vec only, and are separated for each entity type.
https://doi.org/10.1371/journal.pone.0291099.g005
Both time-resolved and searchlight results indicate that personally familiar people were the
type of entities best captured by XLM, and personally familiar places the worst. For personally
familiar people, correlation was significantly above zero between 200 and 800ms in a large
bilateral set of electrodes, concentrated in temporo-parietal and fronto-central areas (tem-
poro-parietal peaks: for the right hemisphere, B17 between 400–500ms, T= 16.511, p<0.001;
for the left hemisphere, A8 between 500–600ms, T= 16.061, p<0.001; fronto-central peak:
C26 between 500–600ms, T= 16.612, p<0.001). Interestingly, right-hemisphere temporo-
parietal electrodes provided the highest and most consistent encoding scores overall (as
reported above, around B17). For personally familiar places the only time point in the time-
resolved analysis where statistical significance was reached was 417.5ms (T= 1.982, p= 0.04).
However, searchlight revealed several clusters where encoding was reliably above chance. Such
clusters were first aligned over a bilateral temporo-parietal axis (left peak: A8 between 500–
600ms, T= 7.711, p= 0.0263; right peak: B25 between 400–500ms, T= 7.54, p= 0.03), and
later over a central frontal-to-posterior axis (frontal peak: C17 between 500–600ms, T= 7.26,
p= 0.047; posterior peak: A24 between 500–600ms, T= 7.572, p= 0.029).
Famous entities showed a less dramatic difference across categories: in the time-resolved
setting, evoked responses for famous people could be encoded significantly better than chance
between 350 and 800ms (peak at 557.5ms, T= 4.384, p<0.001), whereas famous places did so
in the 350–650ms range (peak at 437.5ms, T= 4.482, p<0.001).
Searchlight encoding for XLM revealed both differences (before 400ms and after 500ms)
and similarities (between 300–400ms) in spatial patterns across famous people and places. For
famous places, clusters where encoding was significantly better than chance appeared mostly
in temporo-parietal bilateral areas (left peak: D22 between 400–500ms, T= 9.307, p= 0.005;
right peak: B17 between 400–500ms, T= 10.474, p= 0.002). The involvement of frontal elec-
trodes was minor, between 400 and 600ms (peak around C28 between 500–600ms, T= 9.486,
p= 0.004). Finally, after 600ms, a clear right lateralization was found (peak around B16
between 600–700ms, T= 9.737, p= 0.001). For famous people, encoding performance with XLM was statistically significant in bilateral temporo-parietal clusters up until 500ms (left peak: D26 between 400–500ms, T= 13.423, p<0.001; right peak: B24 between 400–500ms, T= 14.544, p<0.001); afterwards, it expanded to central frontal and posterior electrodes (frontal peak: C14 between 500–600ms, T= 14.073, p<0.001; posterior peak: A15 between 500–600ms, T= 13.355, p<0.001), similarly to what happened for personally familiar people.
The results obtained for word2vec with the same approach (Fig 5) were quite different,
revealing which types of entities determined its overall lower performance (see Fig 3). In the
time-resolved setting, encoding performance was above chance only for personally familiar
people (peak at 467.5ms, T= 7.014, p<0.001) and famous places (peak at 417.5ms, T= 2.405,
p= 0.007), and approached significance for personally familiar places (peak at 437.5ms,
T= 1.663, p= 0.095). By contrast, word2vec did not seem able to explain evoked responses for
famous people (peak: 747ms, T= 0.908, p= 0.568). When looking at individual clusters using
searchlight, patterns were similar to XLM for personally familiar entities, but different for
famous entities. In particular, for the former, the encoding performance of word2vec was sig-
nificant in a set of clusters largely corresponding to those emerging from the matched analyses
with XLM. In the case of personally familiar people, this consisted of temporo-parietal and
frontal electrodes (left peak: D23 between 400–500ms, T= 15.568, p<0.001; right peak: B17
between 400–500ms, T= 16.521, p<0.001; frontal peak: C20 between 500–600ms, T= 16.655,
p<0.001); for personally familiar places, bilateral temporo-parietal clusters first (left peak: A8
between 300–400ms, T= 8.032, p= 0.015; right peak: B26 between 400–500ms, T= 7.698,
p= 0.027), and centro-posterior later (peak: A14 between 600–700ms, T= 7.413, p= 0.033).
For famous entities, performance was never above chance for any cluster (peak for famous
people: B7 between 700–800ms, T= 4.138, p= 0.541; peak for famous places: A5 between 300–
400ms, T= 3.01, p= 0.801).
This indicates that word2vec interacted differently, depending on the category, with distrib-
uted and localized patterns of evoked activity: while in the case of famous places distributed
activity was crucial in reaching significance (i.e. encoding was significant only in the time-
resolved, but not in the searchlight analysis), for personally familiar places the opposite was
true.
4 Discussion
4.1 Personal memories as a window into semantic processing of individual
entities
The main contribution of this work is showing that personal memories revolving around a
subject’s closest people and places, collected from participants through a questionnaire, can be
used to create semantic representations with language models that capture how the brain rep-
resents those very same individuals.
Our approach was motivated by the hypothesis that the way in which we talk about the
world reflects our internal cognitive states [104]—and, more specifically for semantics, the idi-
osyncratic perspective that shapes the way in which we see the world and represent concepts
[37]. This hypothesis is supported by previous results showing that textual analyses of a speaker's utterances can be used to uncover a speaker's unique perspective on concepts [32,105], their emotions [106,107], and their personality traits and mental health [108–113].
Time-wise, encoding scores were statistically significant in the 200–800ms range, with peaks at around 400ms and 700ms, the latencies traditionally associated with semantic processing and memory retrieval in EEG studies—specifically, the N400 [114] and the late positive complex (LPC) [115].
Spatial patterns of encoding emerging from the searchlight analysis highlight the role, over-
all, of temporo-parietal bilateral regions of the scalp in the 400–500ms range. Such electrode
clusters are situated roughly above portions of the cortex which have been shown to be crucial
for the processing of individual entities by a vast literature from neuroimaging (summarized
in [116]). This provides new evidence that scalp markers of proper name processing seem to follow a trajectory similar to that of cortical activity (Fig 3, lower portion; see also [64] for
similar decoding maps).
This consistent pattern of results therefore confirms our hypothesis. The distributional lexi-
cal information contained in the questionnaire answers, describing the three main compo-
nents of memories for individual entities—episodic and semantic knowledge, as well as
personal semantics—can be used to encode how the brain represents those individual entities.
4.2 Using language models to encode personally familiar entities
We were able to encode responses to individual entities by exploiting, through language mod-
els, the distributional lexical information contained in the questionnaire answers. Our results
thus indicate that the semantic representations obtained from language models are rich, multi-
faceted models of brain processing (cf. [11,117]) and that, following an appropriate method-
ology, they can be combined to create vectors for entities that do not appear in their pre-train-
ing data [27,32]. The questionnaire was composed of nine questions only, to which
participants had to answer with open-ended natural sentences—which by NLP standards is
extremely small [24].
This result is all the more surprising given not just the small size of the textual data, but also
the fact that participants produced answers freely, without having to stick to categories or
dimensions defined a priori by the experimenter, as was the case in previous neuroimaging studies [33–35].
A key role in making the most of such reduced textual data was played by a dedicated vector
creation methodology (see Section 2.4.1). We created original semantic representations for
personally familiar individual entities from the unconstrained questionnaire answers, building
upon the previously-acquired distributional lexical knowledge contained in language models.
In other words, through language models we could model semantic knowledge of people and
places as idiosyncratic combinations of pieces of generic knowledge (the pre-trained word vec-
tors), shaped and guided by autobiographical experiences (the person-specific words contained
in the answers to the questionnaires).
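To make the general idea more tangible, the sketch below derives an entity vector from a handful of questionnaire answers by mean-pooling contextualized token embeddings; it is a simplified illustration rather than the exact procedure of Section 2.4.1, and the model checkpoint ('xlm-roberta-base'), the pooling strategy and the example sentences are assumptions introduced only for this example.

```python
# Illustrative construction of a subject-specific entity vector from questionnaire answers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

# Hypothetical open-ended answers about one personally familiar person
answers = [
    "We met at university and spent every summer travelling together.",
    "She always cooks dinner on Sundays and loves old jazz records.",
]

with torch.no_grad():
    answer_vectors = []
    for answer in answers:
        inputs = tokenizer(answer, return_tensors="pt", truncation=True)
        hidden = model(**inputs).last_hidden_state[0]   # tokens x hidden dimensions
        answer_vectors.append(hidden.mean(dim=0))       # average over tokens
    entity_vector = torch.stack(answer_vectors).mean(dim=0)  # average over answers

print(entity_vector.shape)  # torch.Size([768]) for xlm-roberta-base
```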
In this respect, XLM, the contextualized language model, showed consistently better performance than word2vec, a static language model, at capturing semantic processing in the brain (Fig 3). Notice that we matched the vector extraction methodology (Section
2.4.1), thus levelling as much as possible the differences between the two models. Two remain-
ing factors differentiate static and contextual models. The first one is related to their size, both
in terms of the amount of distributional information used during training (2.5 terabytes for
XLM, 12 gigabytes for word2vec), and of their number of parameters (560 million parameters
for XLM, 75 million for word2vec—computed as in [118] as the multiplication of the number
of vector dimensions by the vocabulary size; see Section 2.4.1). The second one is that contex-
tualized models are able to adapt their vector representations to specific linguistic contexts,
capturing fine-grained contextual shifts in meaning, whereas static models cannot [75]. These
two features, therefore, seem to be important in order to capture the semantic information
revolving around an individual entity, which is an extremely idiosyncratic and unique mix of
features and memories [1,2]. Notice that, looking at the current literature, it is debated which of these factors—and why—could play a bigger role when it comes to predicting cognitive processes (for evidence in opposite directions, see [119–122]); more work will be required to dis-
entangle the two. Nevertheless, this evidence dovetails with previous results in the literature
indicating that LLMs can improve over static models when predicting generic concepts and
famous individual entities [31,48,123,124].
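As a worked example of the parameter-count convention mentioned above (number of vector dimensions multiplied by vocabulary size), the snippet below uses assumed round figures that recover the order of magnitude quoted for word2vec; the exact vocabulary size of the model used in this study may differ.

```python
# Toy reproduction of the parameter-count convention (dimensions x vocabulary size);
# the figures below are assumed round numbers, not the study's exact values.
w2v_dimensions = 300
w2v_vocabulary = 250_000
print(f"word2vec parameters (approx.): {w2v_dimensions * w2v_vocabulary:,}")  # 75,000,000
```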
Finally, it should be underlined that, despite its overall inferiority, a static model like word2-
vec could nevertheless reach statistically significant encoding performance, both overall and
for some individual types of entities (Figs 3 and 5). This can provide some reassurance about the ability of simpler, lighter static models to efficiently deal with distributional semantic information [125], even when it is as fine-grained as personal memories—an important point for future
studies involving low-resource settings and languages [126].
4.3 Individual entities
Detailed patterns of encoding performance can be compared in terms of scores and spatial
location (Fig 4) across semantic categories (people and places) and levels of familiarity (famous
and personally familiar). This allows us to obtain insights into the way in which the
brain represents individual entities.
First of all, it is important to notice that, as experimental stimuli, we used names, instead of
images—which is by contrast the most common choice when looking at concepts in the brain
[63], possibly because they afford higher encoding/decoding performance [18,64,127,128].
We chose to use names in our experiment because this allows us to elicit semantic processing of people and places in the brain that is not biased towards any sensory modality. For instance, had we used images as stimuli, we would have had two types of non-semantic,
visual confounds. First, images clearly differ in terms of visual properties across people and
places and therefore evoke strongly distinct responses [129,130]. Secondly, using a picture
elicits brain activity associated with that specific instance of the picture and its low-level visual
features [131,132].
Overall, we found that the semantics of people, both personally familiar and famous, could
be captured by XLM, a large language model, using distributional lexical information repre-
senting declarative memories (in its three components—semantic, episodic and personal)—
despite the uniqueness of the semantic information for each individual [13,116]. The perfor-
mance of word2vec, a static language model, was worse than that of XLM. In particular, it seemed to
interact more strongly with the type of entity to be represented, being consistently lower for
famous entities.
Turning to places, a more mixed set of results emerges. For XLM, a particularly striking dif-
ference in terms of encoding performance was found between personally familiar people and
places (Fig 4): performance was barely above chance for places, while it was significant in a
large time window for people. Interestingly, the patterns of encoding were quite similar for
personally familiar people and places also for word2vec (Fig 5).
At least two factors may be hypothesized to be at work. First of all, it is possible that the
questionnaire could not capture place-specific types of information—for instance, sensory and
modality-specific features—that are crucial when it comes to cognitive processing of familiar
places. Secondly, as discussed in [2,60], the identity of personally familiar places may be
harder to process than that of personally familiar people—for various reasons that could be
evolutionary [133], social [134], or related to the availability of semantic features during
retrieval [135]. This would then make it harder to correctly distinguish place-specific, as
opposed to person-specific, signatures in brain activity—a result that converges with the over-
all lower encoding scores for places reported here (Fig 5) and that has previously been found
in the literature [60,135,136]. While we are unable to provide an answer to such questions
with the current experiment, we believe this could be a fruitful direction for future research.
Furthermore, searchlight encoding, which provides a window on focal activity on the scalp,
highlights both commonalities and differences for the various types of individual entities.
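For readers unfamiliar with the procedure, the sketch below illustrates the logic of a spatio-temporal searchlight on toy data: an encoding score is computed separately for each electrode neighbourhood and time window, yielding a map of scores over the scalp and over time. The neighbourhood definition, window length and scoring function are placeholder assumptions standing in for the actual analysis settings.

```python
# Illustrative spatio-temporal searchlight loop on toy EEG data.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_electrodes, n_times = 32, 128, 80
eeg = rng.normal(size=(n_items, n_electrodes, n_times))  # toy evoked responses

# Placeholder neighbourhoods: each electrode grouped with its adjacent indices
# (a real analysis would use the montage's spatial adjacency instead).
neighbourhoods = {e: [n for n in (e - 1, e, e + 1) if 0 <= n < n_electrodes]
                  for e in range(n_electrodes)}
time_windows = [(start, start + 20) for start in range(0, n_times, 20)]

def encoding_score(features):
    # stand-in for the real per-cluster encoding procedure
    return float(features.mean())

scores = np.zeros((n_electrodes, len(time_windows)))
for electrode, neighbours in neighbourhoods.items():
    for w, (t0, t1) in enumerate(time_windows):
        cluster = eeg[:, neighbours, t0:t1].reshape(n_items, -1)
        scores[electrode, w] = encoding_score(cluster)

print(scores.shape)  # one score per electrode x time window
```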
When looking at the commonalities, a core set of spatio-temporal clusters shared across
models, categories and levels of familiarity emerges, indicating general correlates of semantic
processing of individual entities. When using XLM, the best-performing language model,
encoding was significant for all types of individual entities in the range between 300 and 700ms, first in bilateral temporo-parietal electrodes (300–500ms), then in fronto-central and posterior-central electrodes (500–700ms). Similar patterns were found for word2vec when aggregating across all
types of entities. Temporo-parietal bilateral clusters are typically associated with general
semantic processing in the N400 time range [63,114].
By contrast, the importance of frontal and central electrodes could be explained as an
involvement of both Default Mode Network areas, which are activated by episodic memory
and social processing [137–140], and of posterior visual areas relevant for processing mental
imagery [141,142].
Turning to the differences, the earliest time range where correlations were significant (200–
300ms) revealed an interesting pattern: a stronger presence of information related to people as
opposed to places in the fronto-temporal right hemisphere (Fig 4, lower portion). This can be
connected to results coming from the neuroimaging literature, where it has been shown that
the right hemisphere is crucial specifically for person identity processing [135,143–146]. Also,
this seems to converge with previous results showing person-selective early evoked activity
[60], suggesting that not only quantitative, but also qualitative differences exist between the two
(i.e. differences not only in terms of encoding performance, but also of neural processes
involved; cf. above).
On a more abstract, philosophical level, our findings would seem to be compatible with so-
called descriptivism. Descriptivism is a view on proper names and individual entities that pro-
poses that the meaning of a proper name can in fact be equated with a set of sentences describing
that entity (in our case, declarative memories) [147,148]. After all, we have shown that the
words contained in short texts about individual entities can be used to capture relevant aspects
of the way in which brains process their representations. However, one could also argue, as
pointed out by [116], that descriptions (and by extension, declarative knowledge) fail to capture all the semantic information associated with cognitive representations of individual entities, namely by leaving out socio-emotional dimensions.
We believe that it is precisely our choice of modelling memories (i.e. descriptions) through
language models that allows us to address this concern. Language models have been consistently
shown to capture both social and emotional dimensions of word meaning from lexical distri-
butional information [11,13,14]. Therefore, our view is that although social and emotional
semantic dimensions of individual entities can be studied in isolation (cf. [33–35]), they are latent in declarative memories (descriptions) and can therefore be captured in the mixture
of semantic information and dimensions present in language models [117]. In this sense,
descriptions are a richer source of semantic information about individual entities than it may
appear at first sight: they not only convey their propositional content, but also the broader
semantic information hidden in the words that make them up.
5 Conclusion
In this work we have shown that it is possible to capture semantic knowledge about personally
familiar individual entities by extracting semantic representations from subject-specific mem-
ories using language models. Importantly, we also demonstrated that similar performance could be obtained for a matched set of famous entities, thus confirming the soundness of our approach. The results of our multivariate encoding analyses indicate that entity-specific information emerges in a time window usually associated with semantic processing, between 200 and 800ms. This is especially the case in bilateral temporo-parietal regions, a finding that converges with results from neuroimaging studies. Overall, our results exploit cutting-edge models in AI to
provide a window into extremely fine-grained, subject-specific semantic knowledge as it is
processed in the brain. We hope that this will motivate future work aiming at the investigation
of individual uniqueness in semantic processing.
Supporting information
S1 File.
(PDF)
Acknowledgments
We would like to thank Prof. Davide Crepaldi, head of the Language, Learning and Reading lab at SISSA, who provided the facilities for collecting the EEG data while the first author was visiting, and Marjina Bellida for helping out with subject preparation procedures. Finally, we thank Natalia Ginzburg, from whom we borrowed, out of admiration for her wonderful book ‘Lessico Famigliare’, part of the title of this paper.
Author Contributions
Conceptualization: Andrea Bruera, Massimo Poesio.
Data curation: Andrea Bruera.
Formal analysis: Andrea Bruera.
Funding acquisition: Massimo Poesio.
Investigation: Andrea Bruera.
Methodology: Andrea Bruera.
Project administration: Andrea Bruera, Massimo Poesio.
Resources: Massimo Poesio.
Software: Andrea Bruera.
Supervision: Andrea Bruera, Massimo Poesio.
Validation: Andrea Bruera.
Visualization: Andrea Bruera.
Writing – original draft: Andrea Bruera.
Writing – review & editing: Andrea Bruera, Massimo Poesio.
References
1. Cohen G, Burke DM. Memory for proper names: A review. Memory. 1993; 1(4):249–263. https://doi.
org/10.1080/09658219308258237 PMID: 7584271
2. Kaminski J, Bowren M Jr, Manzel K, Tranel D. Neural correlates of recognition and naming of famous
persons and landmarks: A special role for the left anterior temporal lobe. In: Handbook of Clinical Neu-
rology. vol. 187. Elsevier; 2022. p. 303–317.
3. Semenza C. Proper names and personal identity. Handbook of Clinical Neurology. 2022; 187:287–
302. https://doi.org/10.1016/B978-0-12-823493-8.00008-0 PMID: 35964978
4. Tulving E. Episodic and semantic memory. Organization of Memory. 1972; p. 382–403.
5. Tulving E. Episodic memory: From mind to brain. Annual review of psychology. 2002; 53(1):1–25.
https://doi.org/10.1146/annurev.psych.53.100901.135114 PMID: 11752477
6. Yee E, Chrysikou EG, Thompson-Schill SL. Semantic Memory 17. The Oxford Handbook of Cognitive
Neuroscience: Volume 1: Core Topics. 2013; p. 353.
7. Renoult L, Davidson PS, Palombo DJ, Moscovitch M, Levine B. Personal semantics: at the crossroads
of semantic and episodic memory. Trends in cognitive sciences. 2012; 16(11):550–558. https://doi.
org/10.1016/j.tics.2012.09.003 PMID: 23040159
8. Eichenbaum H. Declarative memory: Insights from cognitive neurobiology. Annual review of psychol-
ogy. 1997; 48(1):547–572. https://doi.org/10.1146/annurev.psych.48.1.547 PMID: 9046568
9. Wittgenstein L. Philosophical investigations. Philosophische Untersuchungen. Macmillan; 1953.
10. Harris ZS. Distributional structure. Word. 1954; 10(2-3):146–162. https://doi.org/10.1080/00437956.
1954.11659520
11. Hollis G, Westbury C. The principals of meaning: Extracting semantic dimensions from co-occurrence
models of semantics. Psychonomic bulletin & review. 2016; 23:1744–1756. https://doi.org/10.3758/
s13423-016-1053-2 PMID: 27138012
12. Hollis G, Westbury C, Lefsrud L. Extrapolating human judgments from skip-gram vector representa-
tions of word meaning. Quarterly Journal of Experimental Psychology. 2017; 70(8):1603–1619. https://
doi.org/10.1080/17470218.2016.1195417 PMID: 27251936
13. Utsumi A. Exploring what is encoded in distributional word vectors: A neurobiologically motivated anal-
ysis. Cognitive Science. 2020; 44(6):e12844. https://doi.org/10.1111/cogs.12844 PMID: 32458523
14. Chersoni E, Santus E, Huang CR, Lenci A. Decoding word embeddings with brain-based semantic
features. Computational Linguistics. 2021; 47(3):663–698. https://doi.org/10.1162/coli_a_00412
15. Mandera P, Keuleers E, Brysbaert M. Explaining human performance in psycholinguistic tasks with
models of semantic similarity based on prediction and counting: A review and empirical validation.
Journal of Memory and Language. 2017; 92:57–78. https://doi.org/10.1016/j.jml.2016.04.001
16. Wingfield C, Connell L. Understanding the role of linguistic distributional knowledge in cognition. Lan-
guage, Cognition and Neuroscience. 2022; 37(10):1220–1270. https://doi.org/10.1080/23273798.
2022.2069278
17. Mitchell TM, Shinkareva SV, Carlson A, Chang KM, Malave VL, Mason RA, et al. Predicting human
brain activity associated with the meanings of nouns. science. 2008; 320(5880):1191–1195. https://
doi.org/10.1126/science.1152876 PMID: 18511683
18. Pereira F, Lou B, Pritchett B, Ritter S, Gershman SJ, Kanwisher N, et al. Toward a universal decoder
of linguistic meaning from brain activation. Nature communications. 2018; 9(1):963. https://doi.org/10.
1038/s41467-018-03068-4 PMID: 29511192
19. Goldstein A, Zada Z, Buchnik E, Schain M, Price A, Aubrey B, et al. Shared computational principles
for language processing in humans and deep language models. Nature neuroscience. 2022; 25
(3):369–380. https://doi.org/10.1038/s41593-022-01026-4 PMID: 35260860
20. Hale JT, Campanelli L, Li J, Bhattasali S, Pallier C, Brennan JR. Neurocomputational models of lan-
guage processing. Annual Review of Linguistics. 2022; 8:427–446. https://doi.org/10.1146/annurev-
linguistics-051421-020803
21. Recchia G, Jones MN. More data trumps smarter algorithms: Comparing pointwise mutual information
with latent semantic analysis. Behavior research methods. 2009; 41:647–656. https://doi.org/10.3758/
BRM.41.3.647 PMID: 19587174
22. Sahlgren M, Lenci A. The Effects of Data Size and Frequency Range on Distributional Semantic Mod-
els. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing;
2016. p. 975–980.
23. Günther F, Rinaldi L, Marelli M. Vector-space models of semantic representation from a cognitive per-
spective: A discussion of common misconceptions. Perspectives on Psychological Science. 2019; 14
(6):1006–1033. https://doi.org/10.1177/1745691619861372 PMID: 31505121
24. Ruzzetti ES, Ranaldi L, Mastromattei M, Fallucchi F, Scarpato N, Zanzotto FM. Lacking the Embed-
ding of a Word? Look it up into a Traditional Dictionary. In: Findings of the Association for Computa-
tional Linguistics: ACL 2022; 2022. p. 2651–2662.
25. Yu W, Zhu C, Fang Y, Yu D, Wang S, Xu Y, et al. Dict-BERT: Enhancing Language Model Pre-training
with Dictionary. In: Findings of the Association for Computational Linguistics: ACL 2022; 2022.
p. 1907–1918.
26. Asr FT, Willits JA, Jones MN. Comparing Predictive and Co-occurrence Based Models of Lexical
Semantics Trained on Child-directed Speech. In: 38th Annual Meeting of the Cognitive Science Soci-
ety: Recognizing and Representing Events, CogSci 2016. The Cognitive Science Society; 2016.
p. 1092–1097.
27. Herbelot A, Baroni M. High-risk learning: acquiring new word vectors from tiny data. In: Proceedings of
the 2017 Conference on Empirical Methods in Natural Language Processing; 2017. p. 304–309.
28. Schick T, Schu¨tze H. Rare words: A major problem for contextualized embeddings and how to fix it by
attentive mimicking. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34; 2020.
p. 8766–8774.
29. Fernandino L, Tong JQ, Conant LL, Humphries CJ, Binder JR. Decoding the information structure
underlying the neural representation of concepts. Proceedings of the National Academy of Sciences.
2022; 119(6):e2108091119. https://doi.org/10.1073/pnas.2108091119 PMID: 35115397
30. Carota F, Nili H, Kriegeskorte N, Pulvermu¨ller F. Experientially-grounded and distributional semantic
vectors uncover dissociable representations of conceptual categories. Language, Cognition and Neu-
roscience. 2023; p. 1–25.
31. Bruera A, Poesio M. Exploring the representations of individual entities in the brain combining EEG
and distributional semantics. Frontiers in Artificial Intelligence. 2022; p. 25. https://doi.org/10.3389/frai.
2022.796793 PMID: 35280237
32. Anderson AJ, McDermott K, Rooks B, Heffner KL, Dodell-Feder D, Lin FV. Decoding individual identity
from brain activity elicited in imagining common experiences. Nature communications. 2020; 11
(1):5916. https://doi.org/10.1038/s41467-020-19630-y PMID: 33219210
33. Thornton MA, Weaverdyck ME, Tamir DI. The brain represents people as the mental states they habit-
ually experience. Nature communications. 2019; 10(1):2291. https://doi.org/10.1038/s41467-019-
10309-7 PMID: 31123269
34. Peer M, Hayman M, Tamir B, Arzy S. Brain coding of social network structure. Journal of Neurosci-
ence. 2021; 41(22):4897–4909. https://doi.org/10.1523/JNEUROSCI.2641-20.2021 PMID: 33903220
35. Ron Y, Dafni-Merom A, Saadon-Grosman N, Roseman M, Elias U, Arzy S. Brain system for social cat-
egorization by narrative roles. Journal of Neuroscience. 2022; 42(26):5246–5253. https://doi.org/10.
1523/JNEUROSCI.1436-21.2022 PMID: 35613892
36. Kim HJ, Lux BK, Lee E, Finn ES, Woo CW. Brain decoding of spontaneous thought: Predictive model-
ing of self-relevance and valence using personal narratives. Proceedings of the National Academy of
Sciences. 2024; 121(14):e2401959121. https://doi.org/10.1073/pnas.2401959121 PMID: 38547065
37. Charest I, Kriegeskorte N. The brain of the beholder: honouring individual representational idiosyncra-
sies. Language, Cognition and Neuroscience. 2015; 30(4):367–379. https://doi.org/10.1080/
23273798.2014.1002505
38. De Haas B, Iakovidis AL, Schwarzkopf DS, Gegenfurtner KR. Individual differences in visual salience
vary along semantic dimensions. Proceedings of the National Academy of Sciences. 2019; 116
(24):11687–11692. https://doi.org/10.1073/pnas.1820553116 PMID: 31138705
39. Levine SM, Schwarzbach JV. Individualizing representational similarity analysis. Frontiers in psychia-
try. 2021; 12:729457. https://doi.org/10.3389/fpsyt.2021.729457 PMID: 34707520
40. Johns BT. Determining the relativity of word meanings through the construction of individualized mod-
els of semantic memory. Cognitive Science. 2024; 48(2):e13413. https://doi.org/10.1111/cogs.13413
PMID: 38402448
41. Chen M, Chu Z, Chen Y, Stratos K, Gimpel K. EntEval: A Holistic Evaluation Benchmark for Entity
Representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language
Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-
IJCNLP); 2019. p. 421–433.
42. Westera M, Gupta A, Boleda G, Padó S. Distributional models of category concepts based on names
of category members. Cognitive Science. 2021; 45(9):e13029. https://doi.org/10.1111/cogs.13029
PMID: 34490924
43. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and
phrases and their compositionality. Advances in neural information processing systems. 2013; 26.
44. Anderson AJ, Binder JR, Fernandino L, Humphries CJ, Conant LL, Raizada RD, et al. An integrated
neural decoder of linguistic and experiential meaning. Journal of Neuroscience. 2019; 39(45):8969–
8987. https://doi.org/10.1523/JNEUROSCI.2575-18.2019 PMID: 31570538
45. Conneau A, Khandelwal K, Goyal N, Chaudhary V, Wenzek G, Guzmán F, et al. Unsupervised Cross-
lingual Representation Learning at Scale. In: Proceedings of the 58th Annual Meeting of the Associa-
tion for Computational Linguistics; 2020. p. 8440–8451.
46. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need.
Advances in neural information processing systems. 2017; 30.
47. Bruera A, Tao Y, Anderson A, Çokal D, Haber J, Poesio M. Modeling Brain Representations of Words’
Concreteness in Context Using GPT-2 and Human Ratings. Cognitive Science. 2023; 47(12):e13388.
https://doi.org/10.1111/cogs.13388 PMID: 38103208
48. Schrimpf M, Blank IA, Tuckute G, Kauf C, Hosseini EA, Kanwisher N, et al. The neural architecture of
language: Integrative modeling converges on predictive processing. Proceedings of the National
Academy of Sciences. 2021; 118(45):e2105646118. https://doi.org/10.1073/pnas.2105646118 PMID:
34737231
49. Caucheteux C, Gramfort A, King JR. Evidence of a predictive coding hierarchy in the human brain lis-
tening to speech. Nature human behaviour. 2023; 7(3):430–441. https://doi.org/10.1038/s41562-022-
01516-2 PMID: 36864133
50. Tang J, LeBel A, Jain S, Huth AG. Semantic reconstruction of continuous language from non-invasive
brain recordings. Nature Neuroscience. 2023; 26(5):858–866. https://doi.org/10.1038/s41593-023-
01304-9 PMID: 37127759
51. Goldstein A, Grinstein-Dabush A, Schain M, Wang H, Hong Z, Aubrey B, et al. Alignment of brain
embeddings and artificial contextual embeddings in natural language points to common geometric pat-
terns. Nature communications. 2024; 15(1):2768. https://doi.org/10.1038/s41467-024-46631-y PMID:
38553456
52. Moore V, Valentine T. The Effects Of Age Of Acquisition In Processing Famous Faces And Names:
Exploring The Locus And Proposing A Mechanism. In: Proceedings of the Twenty First Annual Confer-
ence of the Cognitive Science Society. Psychology Press; 2020. p. 416–421.
53. Paivio A, Yuille JC, Madigan SA. Concreteness, imagery, and meaningfulness values for 925 nouns.
Journal of experimental psychology. 1968; 76(1p2):1. https://doi.org/10.1037/h0025327 PMID: 5672258
54. Rofes A, Zakariás L, Ceder K, Lind M, Johansson MB, De Aguiar V, et al. Imageability ratings across
languages. Behavior Research Methods. 2018; 50:1187–1197. https://doi.org/10.3758/s13428-017-
0936-0 PMID: 28707216
55. Morton NW, Zippi EL, Noh SM, Preston AR. Semantic knowledge of famous people and places is rep-
resented in hippocampus and distinct cortical networks. Journal of Neuroscience. 2021; 41(12):2762–
2779. https://doi.org/10.1523/JNEUROSCI.2034-19.2021 PMID: 33547163
56. Runge A, Hovy E. Exploring Neural Entity Representations for Semantic Information. In: Alishahi A,
Belinkov Y, Chrupała G, Hupkes D, Pinter Y, Sajjad H, editors. Proceedings of the Third BlackboxNLP
Workshop on Analyzing and Interpreting Neural Networks for NLP. Online: Association for Computa-
tional Linguistics; 2020. p. 204–216. Available from: https://aclanthology.org/2020.blackboxnlp-1.20.
57. Chen M, Chu Z, Stratos K, Gimpel K. Mining Knowledge for Natural Language Inference from Wikipe-
dia Categories. In: Findings of the Association for Computational Linguistics: EMNLP 2020; 2020.
p. 3500–3511.
58. Zhou WX, Sornette D, Hill RA, Dunbar RI. Discrete hierarchical organization of social group sizes. Pro-
ceedings of the Royal Society B: Biological Sciences. 2005; 272(1561):439–444. https://doi.org/10.
1098/rspb.2004.2970 PMID: 15734699
59. Hill RA, Dunbar RI. Social network size in humans. Human nature. 2003; 14(1):53–72. https://doi.org/
10.1007/s12110-003-1016-y PMID: 26189988
60. Bruera A, Poesio M. EEG searchlight decoding reveals person-and place-specific responses for
semantic category and familiarity. Journal of Cognitive Neuroscience. 2024; p. 1–20. https://doi.org/
10.1162/jocn_a_02125 PMID: 38319891
61. Boudewyn MA, Luck SJ, Farrens JL, Kappenman ES. How many trials does it take to get a significant
ERP effect? It depends. Psychophysiology. 2018; 55(6):e13049. https://doi.org/10.1111/psyp.13049
PMID: 29266241
62. Grootswagers T, Wardle SG, Carlson TA. Decoding dynamic brain patterns from evoked responses: A
tutorial on multivariate pattern analysis applied to time series neuroimaging data. Journal of cognitive
neuroscience. 2017; 29(4):677–697. https://doi.org/10.1162/jocn_a_01068 PMID: 27779910
63. Rybář M, Poli R, Daly I. Decoding of semantic categories of imagined concepts of animals and tools in
fNIRS. Journal of Neural Engineering. 2021; 18(4):046035. https://doi.org/10.1088/1741-2552/abf2e5
PMID: 33780916
64. Leonardelli E, Fait E, Fairhall SL. Temporal dynamics of access to amodal representations of cate-
gory-level conceptual information. Scientific reports. 2019; 9(1):239. https://doi.org/10.1038/s41598-
018-37429-2 PMID: 30659237
65. Gramfort A, Luessi M, Larson E, Engemann DA, Strohmeier D, Brodbeck C, et al. MEG and EEG data
analysis with MNE-Python. Frontiers in neuroscience. 2013; p. 267. https://doi.org/10.3389/fnins.
2013.00267 PMID: 24431986
66. Jas M, Larson E, Engemann DA, Leppäkangas J, Taulu S, Hämäläinen M, et al. A reproducible MEG/
EEG group study with the MNE software: recommendations, quality assessments, and good practices.
Frontiers in neuroscience. 2018; 12:530. https://doi.org/10.3389/fnins.2018.00530 PMID: 30127712
67. Luck SJ. An introduction to the event-related potential technique. MIT press; 2014.
68. Jas M, Engemann DA, Bekhti Y, Raimondo F, Gramfort A. Autoreject: Automated artifact rejection for
MEG and EEG data. NeuroImage. 2017; 159:417–429. https://doi.org/10.1016/j.neuroimage.2017.06.
030 PMID: 28645840
69. Todd MT, Nystrom LE, Cohen JD. Confounds in multivariate pattern analysis: theory and rule repre-
sentation case study. Neuroimage. 2013; 77:157–165. https://doi.org/10.1016/j.neuroimage.2013.03.
039 PMID: 23558095
70. Snoek L, Miletić S, Scholte HS. How to control for confounds in decoding analyses of neuroimaging
data. Neuroimage. 2019; 184:741–760. https://doi.org/10.1016/j.neuroimage.2018.09.074 PMID:
30268846
71. Mass Y, Roitman H, Erera S, Rivlin O, Weiner B, Konopnicki D. A study of bert for non-factoid ques-
tion-answering under passage length constraints. arXiv preprint arXiv:190806780. 2019;.
72. Rudnicka K. Variation of sentence length across time and genre. Diachronic corpora, genre, and lan-
guage change. 2018; p. 220–240.
73. Bernardi R, Boleda G, Fernández R, Paperno D. Distributional semantics in use. In: Proceedings of
the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level
Semantics; 2015. p. 95–101.
74. Kim N, Patel R, Poliak A, Xia P, Wang A, McCoy T, et al. Probing What Different NLP Tasks Teach
Machines about Function Word Comprehension. In: Proceedings of the Eighth Joint Conference on
Lexical and Computational Semantics (*SEM 2019); 2019. p. 235–249.
75. Apidianaki M. From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning
Representation and Interpretation. Computational Linguistics. 2022; p. 1–60.
76. Yee E, Jones MN, McRae K. Semantic memory. The Stevens’ handbook of experimental psychology
and cognitive neuroscience. 2018;3.
77. Baroni M, Dinu G, Kruszewski G. Don’t count, predict! a systematic comparison of context-counting
vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers); 2014. p. 238–247.
78. Bender EM, Gebru T, McMillan-Major A, Shmitchell S. On the Dangers of Stochastic Parrots: Can Lan-
guage Models Be Too Big? In: Proceedings of the 2021 ACM conference on fairness, accountability,
and transparency; 2021. p. 610–623.
79. Vulić I, Ponti EM, Litschko R, Glavaš G, Korhonen A. Probing Pretrained Language Models for Lexical
Semantics. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Pro-
cessing (EMNLP); 2020. p. 7222–7240.
80. Pires T, Schlinger E, Garrette D. How Multilingual is Multilingual BERT? In: Proceedings of the 57th
Annual Meeting of the Association for Computational Linguistics; 2019. p. 4996–5001.
81. Wu S, Dredze M. Are All Languages Created Equal in Multilingual BERT? In: Proceedings of the 5th
Workshop on Representation Learning for NLP; 2020. p. 120–130.
82. Hollenstein N, Pirovano F, Zhang C, Jäger L, Beinborn L. Multilingual Language Models Predict
Human Reading Behavior. In: Proceedings of the 2021 Conference of the North American Chapter of
the Association for Computational Linguistics: Human Language Technologies; 2021. p. 106–123.
83. Periti F, Montanelli S. Lexical Semantic Change through Large Language Models: a Survey. ACM
Computing Surveys. 2024;. https://doi.org/10.1145/3672393
84. Wenzek G, Lachaux MA, Conneau A, Chaudhary V, Guzmán F, Joulin A, et al. CCNet: Extracting high
quality monolingual datasets from web crawl data. arXiv preprint arXiv:191100359. 2019;.
85. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, et al. Huggingface’s transformers: State-
of-the-art natural language processing. arXiv preprint arXiv:191003771. 2019;.
86. Yarkoni T, Balota D, Yap M. Moving beyond Coltheart’s N: A new measure of orthographic similarity.
Psychonomic bulletin & review. 2008; 15(5):971–979. https://doi.org/10.3758/PBR.15.5.971
87. Anderson AJ, Zinszer BD, Raizada RD. Representational similarity encoding for fMRI: Pattern-based
synthesis to predict brain activity using stimulus-model-similarities. NeuroImage. 2016;128:44–53.
https://doi.org/10.1016/j.neuroimage.2015.12.035 PMID: 26732404
88. Kriegeskorte N, Mur M, Bandettini PA. Representational similarity analysis-connecting the branches of
systems neuroscience. Frontiers in systems neuroscience. 2008; p. 4. https://doi.org/10.3389/neuro.
06.004.2008 PMID: 19104670
89. Walther A, Nili H, Ejaz N, Alink A, Kriegeskorte N, Diedrichsen J. Reliability of dissimilarity measures
for multi-voxel pattern analysis. Neuroimage. 2016; 137:188–200. https://doi.org/10.1016/j.
neuroimage.2015.12.012 PMID: 26707889
90. Varoquaux G, Raamana PR, Engemann DA, Hoyos-Idrobo A, Schwartz Y, Thirion B. Assessing and
tuning brain decoders: cross-validation, caveats, and guidelines. NeuroImage. 2017; 145:166–179.
https://doi.org/10.1016/j.neuroimage.2016.10.038 PMID: 27989847
91. Nili H, Wingfield C, Walther A, Su L, Marslen-Wilson W, Kriegeskorte N. A toolbox for representational
similarity analysis. PLoS computational biology. 2014; 10(4):e1003553. https://doi.org/10.1371/
journal.pcbi.1003553 PMID: 24743308
92. Etzel JA, Zacks JM, Braver TS. Searchlight analysis: promise, pitfalls, and potential. Neuroimage.
2013; 78:261–269. https://doi.org/10.1016/j.neuroimage.2013.03.041 PMID: 23558106
93. Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proceedings of
the National Academy of Sciences. 2006; 103(10):3863–3868. https://doi.org/10.1073/pnas.
0600244103 PMID: 16537458
94. Su L, Fonteneau E, Marslen-Wilson W, Kriegeskorte N. Spatiotemporal searchlight representational
similarity analysis in EMEG source space. In: 2012 Second International Workshop on Pattern Recog-
nition in NeuroImaging; 2012.
95. Collins E, Robinson AK, Behrmann M. Distinct neural processes for the perception of familiar versus
unfamiliar faces along the visual hierarchy revealed by EEG. NeuroImage. 2018; 181:120–131.
https://doi.org/10.1016/j.neuroimage.2018.06.080 PMID: 29966716
96. Smith SM, Nichols TE. Threshold-free cluster enhancement: addressing problems of smoothing,
threshold dependence and localisation in cluster inference. Neuroimage. 2009; 44(1):83–98. https://
doi.org/10.1016/j.neuroimage.2008.03.061 PMID: 18501637
97. Latinus M, Nichols T, Rousselet G. Cluster-based computational methods for mass univariate analy-
ses of event-related brain potentials/fields: A simulation study. Journal of neuroscience methods.
2015; 250:85–93. https://doi.org/10.1016/j.jneumeth.2014.08.003 PMID: 25128255
98. Szucs D, Ioannidis JP. Empirical assessment of published effect sizes and power in the recent cogni-
tive neuroscience and psychology literature. PLoS biology. 2017; 15(3):e2000797. https://doi.org/10.
1371/journal.pbio.2000797 PMID: 28253258
99. Kaplan JT, Man K, Greening SG. Multivariate cross-classification: applying machine learning tech-
niques to characterize abstraction in neural representations. Frontiers in human neuroscience. 2015;
9:151. https://doi.org/10.3389/fnhum.2015.00151 PMID: 25859202
100. Lake B, Baroni M. Generalization without systematicity: On the compositional skills of sequence-to-
sequence recurrent networks. In: International conference on machine learning. PMLR; 2018.
p. 2873–2882.
101. Gorman K, Bedrick S. We need to talk about standard splits. In: Proceedings of the 57th annual meet-
ing of the association for computational linguistics; 2019. p. 2786–2791.
102. Elangovan A, He J, Verspoor K. Memorization vs. Generalization: Quantifying Data Leakage in NLP
Performance Evaluation. In: Proceedings of the 16th Conference of the European Chapter of the Asso-
ciation for Computational Linguistics: Main Volume; 2021. p. 1325–1335.
103. Chyzhyk D, Varoquaux G, Milham M, Thirion B. How to remove or control confounds in predictive mod-
els, with applications to brain biomarkers. GigaScience. 2022; 11. https://doi.org/10.1093/gigascience/
giac014 PMID: 35277962
104. Eichstaedt JC, Kern ML, Yaden DB, Giorgi S, Park G, Hagan CA, et al. Closed-and open-vocabulary
approaches to text analysis: A review, quantitative comparison, and recommendations. Psychological
Methods. 2021; 26(4):398. https://doi.org/10.1037/met0000349 PMID: 34726465
105. Herbelot A, QasemiZadeh B. You and me... in a vector space: modelling individual speakers with
distributional semantics. In: Proceedings of the Fifth Joint Conference on Lexical and Computational
Semantics; 2016. p. 179–188.
106. Strapparava C, Mihalcea R. Learning to identify emotions in text. In: Proceedings of the 2008 ACM
symposium on Applied computing; 2008. p. 1556–1560.
107. Sailunaz K, Dhaliwal M, Rokne J, Alhajj R. Emotion detection from text and speech: a survey. Social
Network Analysis and Mining. 2018; 8:1–26. https://doi.org/10.1007/s13278-018-0505-2
108. Pennebaker JW, Graybeal A. Patterns of natural language use: Disclosure, personality, and social
integration. Current Directions in Psychological Science. 2001; 10(3):90–93. https://doi.org/10.1111/
1467-8721.00123
109. Calvo RA, Milne DN, Hussain MS, Christensen H. Natural language processing in mental health appli-
cations using non-clinical texts. Natural Language Engineering. 2017; 23(5):649–685. https://doi.org/
10.1017/S1351324916000383
110. Corcoran CM, Carrillo F, Fernández-Slezak D, Bedi G, Klim C, Javitt DC, et al. Prediction of psychosis
across protocols and risk cohorts using automated language analysis. World Psychiatry. 2018; 17
(1):67–75. https://doi.org/10.1002/wps.20491 PMID: 29352548
111. Mota NB, Weissheimer J, Ribeiro M, De Paiva M, Avilla-Souza J, Simabucuru G, et al. Dreaming dur-
ing the Covid-19 pandemic: Computational assessment of dream reports reveals mental suffering
related to fear of contagion. PloS one. 2020; 15(11):e0242903. https://doi.org/10.1371/journal.pone.
0242903 PMID: 33253274
112. Sarzynska-Wawer J, Wawer A, Pawlak A, Szymanowska J, Stefaniak I, Jarkiewicz M, et al. Detecting
formal thought disorder by deep contextualized word representations. Psychiatry Research. 2021;
304:114135. https://doi.org/10.1016/j.psychres.2021.114135 PMID: 34343877
113. Grimmer J, Roberts ME, Stewart BM. Text as data: A new framework for machine learning and the
social sciences. Princeton University Press; 2022.
114. Kutas M, Federmeier KD. Thirty years and counting: finding meaning in the N400 component of the
event-related brain potential (ERP). Annual review of psychology. 2011; 62:621–647. https://doi.org/
10.1146/annurev.psych.093008.131123 PMID: 20809790
115. Wilding EL, Ranganath C. Electrophysiological Correlates of Episodic Memory Processes. The Oxford
Handbook of Event-Related Potential Components. 2011; p. 373.
116. O’Rourke T, de Diego Balaguer R. Names and their meanings: A dual-process account of proper-
name encoding and retrieval. Neuroscience & Biobehavioral Reviews. 2020; 108:308–321. https://doi.
org/10.1016/j.neubiorev.2019.11.005 PMID: 31734171
117. Grand G, Blank IA, Pereira F, Fedorenko E. Semantic projection recovers rich human knowledge of
multiple object features from word embeddings. Nature human behaviour. 2022; 6(7):975–987. https://
doi.org/10.1038/s41562-022-01316-8 PMID: 35422527
118. Epoch AI. Data on Notable AI Models; 2024. Available from: https://epochai.org/data/notable-ai-
models.
119. De Varda A, Marelli M. Scaling in cognitive modelling: A multilingual approach to human reading times.
In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume
2: Short Papers); 2023. p. 139–149.
120. Oh BD, Schuler W. Why does surprisal from larger transformer-based language models provide a
poorer fit to human reading times? Transactions of the Association for Computational Linguistics.
2023; 11:336–350. https://doi.org/10.1162/tacl_a_00548
121. Antonello R, Vaidya A, Huth A. Scaling laws for language encoding models in fMRI. Advances in Neu-
ral Information Processing Systems. 2024; 36.
122. Hosseini EA, Schrimpf M, Zhang Y, Bowman S, Zaslavsky N, Fedorenko E. Artificial neural network
language models predict human brain responses to language even after a developmentally realistic
amount of training. Neurobiology of Language. 2024; 5(1):43–63. https://doi.org/10.1162/nol_a_00137
PMID: 38645622
123. Jain S, Huth A. Incorporating context into language encoding models for fMRI. Advances in neural
information processing systems. 2018; 31.
124. Jat S, Tang H, Talukdar P, Mitchell T. Relating Simple Sentence Representations in Deep Neural Net-
works and the Brain. In: Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics; 2019. p. 5137–5154.
125. Lenci A, Sahlgren M, Jeuniaux P, Cuba Gyllensten A, Miliani M. A comparative evaluation and analysis
of three generations of Distributional Semantic Models. Language resources and evaluation. 2022; 56
(4):1269–1313. https://doi.org/10.1007/s10579-021-09575-z
126. Blasi DE, Henrich J, Adamou E, Kemmerer D, Majid A. Over-reliance on English hinders cognitive sci-
ence. Trends in cognitive sciences. 2022; 26(12):1153–1170. https://doi.org/10.1016/j.tics.2022.09.
015 PMID: 36253221
127. Simanova I, Van Gerven M, Oostenveld R, Hagoort P. Identifying object categories from event-related
EEG: toward decoding of conceptual representations. PloS one. 2010; 5(12):e14465. https://doi.org/
10.1371/journal.pone.0014465 PMID: 21209937
128. Shinkareva SV, Malave VL, Mason RA, Mitchell TM, Just MA. Commonality of neural representations
of words and pictures. Neuroimage. 2011; 54(3):2418–2425. https://doi.org/10.1016/j.neuroimage.
2010.10.042 PMID: 20974270
129. Rousselet GA, Husk JS, Bennett PJ, Sekuler AB. Time course and robustness of ERP object and face
differences. Journal of vision. 2008; 8(12):3–3. https://doi.org/10.1167/8.12.3 PMID: 18831616
130. Rossion B. Understanding face perception by means of human electrophysiology. Trends in cognitive
sciences. 2014; 18(6):310–318. https://doi.org/10.1016/j.tics.2014.02.013 PMID: 24703600
131. Just MA, Cherkassky VL, Aryal S, Mitchell TM. A neurosemantic theory of concrete noun representa-
tion based on the underlying brain codes. PloS one. 2010; 5(1):e8622. https://doi.org/10.1371/journal.
pone.0008622 PMID: 20084104
132. Simanova I, Hagoort P, Oostenveld R, Van Gerven MA. Modality-independent decoding of semantic
information from the human brain. Cerebral cortex. 2014; 24(2):426–434. https://doi.org/10.1093/
cercor/bhs324 PMID: 23064107
133. Mahon BZ, Caramazza A. What drives the organization of object knowledge in the brain? Trends in
cognitive sciences. 2011; 15(3):97–103. https://doi.org/10.1016/j.tics.2011.01.004 PMID: 21317022
134. Olson IR, McCoy D, Klobusicky E, Ross LA. Social cognition and the anterior temporal lobes: a review
and theoretical framework. Social cognitive and affective neuroscience. 2013; 8(2):123–133. https://
doi.org/10.1093/scan/nss119 PMID: 23051902
135. Desai RH, Tadimeti U, Riccardi N. Proper and common names in the semantic system. Brain Structure
and Function. 2022; p. 1–16. https://doi.org/10.1007/s00429-022-02593-9 PMID: 36372812
136. Ragni F, Lingnau A, Turella L. Decoding category and familiarity information during visual imagery.
NeuroImage. 2021; 241:118428. https://doi.org/10.1016/j.neuroimage.2021.118428 PMID: 34311066
137. Raichle ME. The brain’s default mode network. Annual review of neuroscience. 2015; 38:433–447.
https://doi.org/10.1146/annurev-neuro-071013-014030 PMID: 25938726
138. Campbell A, Louw R, Michniak E, Tanaka JW. Identity-specific neural responses to three categories of
face familiarity (own, friend, stranger) using fast periodic visual stimulation. Neuropsychologia. 2020;
141:107415. https://doi.org/10.1016/j.neuropsychologia.2020.107415 PMID: 32126214
139. Smallwood J, Bernhardt BC, Leech R, Bzdok D, Jefferies E, Margulies DS. The default mode network
in cognition: a topographical perspective. Nature reviews neuroscience. 2021; 22(8):503–513. https://
doi.org/10.1038/s41583-021-00474-4 PMID: 34226715
140. Kaefer K, Stella F, McNaughton BL, Battaglia FP. Replay, the default mode network and the cascaded
memory systems model. Nature Reviews Neuroscience. 2022; 23(10):628–640. https://doi.org/10.
1038/s41583-022-00620-6 PMID: 35970912
141. Reddy L, Tsuchiya N, Serre T. Reading the mind’s eye: decoding category information during mental
imagery. Neuroimage. 2010; 50(2):818–825. https://doi.org/10.1016/j.neuroimage.2009.11.084
PMID: 20004247
142. Xie S, Kaiser D, Cichy RM. Visual imagery and perception share neural representations in the alpha
frequency band. Current Biology. 2020; 30(13):2621–2627. https://doi.org/10.1016/j.cub.2020.04.074
PMID: 32531274
143. Gainotti G. Different patterns of famous people recognition disorders in patients with right and left ante-
rior temporal lesions: a systematic review. Neuropsychologia. 2007; 45(8):1591–1607. https://doi.org/
10.1016/j.neuropsychologia.2006.12.013 PMID: 17275042
144. Gainotti G. Implications of recent findings for current cognitive models of familiar people recognition.
Neuropsychologia. 2015; 77:279–287. https://doi.org/10.1016/j.neuropsychologia.2015.09.002 PMID:
26359717
145. Borghesani V, Narvid J, Battistella G, Shwe W, Watson C, Binney RJ, et al. “Looks familiar, but I do
not know who she is”: The role of the anterior right temporal lobe in famous face recognition. Cortex.
2019; 115:72–85. https://doi.org/10.1016/j.cortex.2019.01.006 PMID: 30772608
146. Pisoni A, Sperandeo PR, Lauro LJR, Papagno C. The role of the left and right anterior temporal poles
in people naming and recognition. Neuroscience. 2020; 440:175–185. https://doi.org/10.1016/j.
neuroscience.2020.05.040 PMID: 32497758
147. Russell B. Knowledge by acquaintance and knowledge by description. In: Proceedings of the Aristote-
lian society. vol. 11. JSTOR; 1910. p. 108–128.
148. Searle JR. Proper names. Mind. 1958; 67(266):166–173. https://doi.org/10.1093/mind/LXVII.266.166
PLOS ONE
Encoding personal memories of people and places in the brain using language models
PLOS ONE | https://doi.org/10.1371/journal.pone.0291099 November 22, 2024 28 / 28
... Electroencephalography (EEG) captures electrical activity from scalp electrodes, offering high temporal resolution for tracking neural response timing to language stimuli. [73] applied EEG to examine brain responses to familiar entities. Electrocorticography (ECoG), which involves placing electrodes on the brain's surface, provides even higher spatial resolution than EEG. ...
... While language models are typically trained on large general corpora, individual experiences shape human language understanding. [73] demonstrated that language models could represent subject-specific semantic knowledge of familiar people and places when trained on personalized data. This capability opens up possibilities for personalized language models that capture both general and individual semantic knowledge, with applications in personalized recommendations and userspecific content generation. ...
... Sources of Bias in Embedding Models [13], [32] Measuring and Quantifying Bias [91], [98], [99] Mitigating Bias [15], [92] Ethical Implications of Biased Embeddings [47] Adaptive Language Modeling and Transfer Learning with Embeddings Transfer Learning [11], [24], [32], [40], [100] Domain Adaptation [37], [101], [102] Cross-Lingual Transfer Learning [28], [30], [60], [103] Adaptive Language Modeling [104], [105] Zero-Shot Learning with Embeddings [67], [106] The Role of Embeddings in Emerging Areas Embodied AI [66], [69], [107] Cognitive Science [32], [73]- [75], [108], [109] Reasoning and Commonsense Knowledge [26], [43], [106], [110], [111] ...
Preprint
Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visits foundational concepts such as the distributional hypothesis and contextual similarity, tracing the evolution from sparse representations like one-hot encoding to dense embeddings including Word2Vec, GloVe, and fastText. We examine both static and contextualized embeddings, underscoring advancements in models such as ELMo, BERT, and GPT and their adaptations for cross-lingual and personalized applications. The discussion extends to sentence and document embeddings, covering aggregation methods and generative topic models, along with the application of embeddings in multimodal domains, including vision, robotics, and cognitive science. Advanced topics such as model compression, interpretability, numerical encoding, and bias mitigation are analyzed, addressing both technical challenges and ethical implications. Additionally, we identify future research directions, emphasizing the need for scalable training techniques, enhanced interpretability, and robust grounding in non-textual modalities. By synthesizing current methodologies and emerging trends, this survey offers researchers and practitioners an in-depth resource to push the boundaries of embedding-based language models.
Conference Paper
Full-text available
Polysemes are words that can have different senses depending on the context of utterance: for instance, 'newspaper' can refer to an organization (as in 'manage the newspaper') or to an object (as in 'open the newspaper'). Contrary to a large body of evidence coming from psy-cholinguistics, polysemy has been traditionally modelled in NLP by assuming that each sense should be given a separate representation in a lexicon (e.g. WordNet). This led to the current situation, where datasets used to evaluate the ability of computational models of semantics miss crucial details about the representation of polysemes, thus limiting the amount of evidence that can be gained from their use. In this paper we propose a framework to approach polysemy as a continuous variation in psycholinguistic properties of a word in context. This approach accommodates different sense interpretations, without postulating clear-cut jumps between senses. First we describe a publicly available English dataset that we collected , where polysemes in context (verb-noun phrases) are annotated for their concreteness and body sensory strength. Then, we evaluate static and contextualized language models in their ability to predict the ratings of each pol-yseme in context, as well as in their ability to capture the distinction among senses, revealing and characterizing in an interpretable way the models' flaws.
Article
Full-text available
Proper names are linguistic expressions referring to unique entities, such as individual people or places. This sets them apart from other words like common nouns, which refer to generic concepts. And yet, despite both being individual entities, one's closest friend and one's favorite city are intuitively associated with very different pieces of knowledge—face, voice, social relationship, autobiographical experiences for the former, and mostly visual and spatial information for the latter. Neuroimaging research has revealed the existence of both domain-general and domain-specific brain correlates of semantic processing of individual entities; however, it remains unclear how such commonalities and similarities operate over a fine-grained temporal scale. In this work, we tackle this question using EEG and multivariate (time-resolved and searchlight) decoding analyses. We look at when and where we can accurately decode the semantic category of a proper name and whether we can find person- or place-specific effects of familiarity, which is a modality-independent dimension and therefore avoids sensorimotor differences inherent among the two categories. Semantic category can be decoded in a time window and with spatial localization typically associated with lexical semantic processing. Regarding familiarity, our results reveal that it is easier to distinguish patterns of familiarity-related evoked activity for people, as opposed to places, in both early and late time windows. Second, we discover that within the early responses, both domain-general (left posterior-lateral) and domain-specific (right fronto-temporal, only for people) neural patterns can be individuated, suggesting the existence of person-specific processes.
Article
Full-text available
Lexical Semantic Change (LSC) is the task of identifying, interpreting, and assessing the possible change over time in the meanings of a target word. Traditionally, LSC has been addressed by linguists and social scientists through manual and time-consuming analyses, which have thus been limited in terms of the volume, genres, and time-frame that can be considered. In recent years, computational approaches based on Natural Language Processing have gained increasing attention to automate LSC as much as possible. Significant advancements have been made by relying on Large Language Models (LLMs), which can handle the multiple usages of the words and better capture the related semantic change. In this article, we survey the approaches based on LLMs for LSC and we propose a classification framework characterized by three dimensions: meaning representation, time-awareness, and learning modality. The framework is exploited to i) review the measures for change assessment, ii) compare the approaches on performance, and iii) discuss the current issues in terms of scalability, interpretability, and robustness. Open challenges and future research directions about the use of LLMs for LSC are finally outlined.
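One widely used family of change measures in this literature can be sketched as follows: collect contextual embeddings of a target word in two time periods, average them into period prototypes, and take the cosine distance between prototypes as a change score. The sentences and the bert-base-uncased model below are toy placeholders standing in for diachronic corpora.

```python
# Hedged sketch of a prototype-distance LSC measure with contextual embeddings.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def occurrence_vectors(sentences, word):
    """One contextual vector per occurrence of `word` (assumed single wordpiece)."""
    vecs = []
    for s in sentences:
        enc = tok(s, return_tensors="pt")
        tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
        with torch.no_grad():
            h = model(**enc).last_hidden_state[0]
        vecs.append(h[tokens.index(word)].numpy())
    return np.stack(vecs)

old_uses = ["the gay crowd danced all night", "a gay and cheerful tune"]
new_uses = ["the gay rights movement grew", "a gay couple moved next door"]

proto_old = occurrence_vectors(old_uses, "gay").mean(axis=0)
proto_new = occurrence_vectors(new_uses, "gay").mean(axis=0)
change = 1 - np.dot(proto_old, proto_new) / (
    np.linalg.norm(proto_old) * np.linalg.norm(proto_new))
print(change)  # larger cosine distance suggests stronger semantic change
```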
Article
Full-text available
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. We hypothesize that language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language. To test this hypothesis, we densely record the neural activity patterns in the inferior frontal gyrus (IFG) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derive a continuous vectorial representation for each word (i.e., a brain embedding) in each patient. Using stringent zero-shot mapping we demonstrate that brain embeddings in the IFG and the DLM contextual embedding space have common geometric patterns. The common geometric patterns allow us to predict the brain embedding in IFG of a given left-out word based solely on its geometrical relationship to other non-overlapping words in the podcast. Furthermore, we show that contextual embeddings capture the geometry of IFG embeddings better than static word embeddings. The continuous brain embedding space exposes a vector-based neural code for natural language processing in the human brain.
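The zero-shot mapping logic can be sketched as follows, on simulated data standing in for the DLM and brain embeddings: a linear map is learned on all words but one, the held-out word's brain embedding is predicted from its contextual embedding alone, and the prediction is scored by how highly the true pattern ranks among all candidate brain embeddings. This is only an illustration of the idea, not the study's pipeline.

```python
# Hedged sketch of zero-shot mapping from DLM embeddings to brain embeddings.
# All data are simulated placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n_words, dlm_dim, brain_dim = 50, 768, 200
dlm = rng.normal(size=(n_words, dlm_dim))                    # contextual word embeddings
true_map = rng.normal(size=(dlm_dim, brain_dim)) * 0.05
brain = dlm @ true_map + rng.normal(size=(n_words, brain_dim))  # simulated neural patterns

def rank_of_true(pred, candidates, true_idx):
    """Rank of the correct brain embedding among all candidates (1 = best)."""
    dists = np.linalg.norm(candidates - pred, axis=1)
    return int(np.argsort(dists).tolist().index(true_idx)) + 1

ranks = []
for train, test in LeaveOneOut().split(dlm):
    reg = Ridge(alpha=10.0).fit(dlm[train], brain[train])
    pred = reg.predict(dlm[test])[0]
    ranks.append(rank_of_true(pred, brain, test[0]))

# Mean rank well below (n_words + 1) / 2 indicates above-chance zero-shot mapping.
print(np.mean(ranks))
```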
Article
Full-text available
The contents and dynamics of spontaneous thought are important factors for personality traits and mental health. However, assessing spontaneous thoughts is challenging due to their unconstrained nature, and directing participants’ attention to report their thoughts may fundamentally alter them. Here, we aimed to decode two key content dimensions of spontaneous thought—self-relevance and valence—directly from brain activity. To train functional MRI-based predictive models, we used individually generated personal stories as stimuli in a story-reading task to mimic narrative-like spontaneous thoughts (n = 49). We then tested these models on multiple test datasets (total n = 199). The default mode, ventral attention, and frontoparietal networks played key roles in the predictions, with the anterior insula and midcingulate cortex contributing to self-relevance prediction and the left temporoparietal junction and dorsomedial prefrontal cortex contributing to valence prediction. Overall, this study presents brain models of internal thoughts and emotions, highlighting the potential for the brain decoding of spontaneous thought.
Article
Full-text available
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models’ ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model to be developmentally plausible in terms of the amount of training data given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we test a GPT-2 model trained on a 9-billion-token dataset to reach state-of-the-art next-word prediction performance, evaluating it against the same fMRI benchmark at different stages during training. Across both approaches, we find that (i) the models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences. Further, (ii) lower perplexity—a measure of next-word prediction performance—is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although some training is necessary for the models’ predictive ability, a developmentally realistic amount of training (∼100 million words) may suffice.
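Perplexity, the next-word prediction measure referred to above, can be computed for a GPT-2 checkpoint with a few lines of the transformers library; the sketch below uses the public gpt2 model and an arbitrary sentence as placeholders.

```python
# Hedged sketch: computing perplexity for a GPT-2 model with transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

text = "The child listened to the story before falling asleep."
enc = tok(text, return_tensors="pt")

with torch.no_grad():
    # With labels provided, the model returns the mean cross-entropy loss over
    # next-token predictions; exponentiating that loss gives perplexity.
    out = model(**enc, labels=enc["input_ids"])

print(torch.exp(out.loss).item())  # lower perplexity = better next-word prediction
```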
Article
Full-text available
The meaning of most words in language depends on their context. Understanding how the human brain extracts contextualized meaning, and identifying where in the brain this takes place, remain important scientific challenges. But technological and computational advances in neuroscience and artificial intelligence now provide unprecedented opportunities to study the human brain in action as language is read and understood. Recent contextualized language models seem to be able to capture homonymic meaning variation (“bat”, in a baseball vs. a vampire context), as well as more nuanced differences of meaning—for example, polysemous words such as “book”, which can be interpreted in distinct but related senses (“explain a book”, information, vs. “open a book”, object) whose differences are fine‐grained. We study these subtle differences in lexical meaning along the concrete/abstract dimension, as they are triggered by verb‐noun semantic composition. We analyze functional magnetic resonance imaging (fMRI) activations elicited by Italian verb phrases containing nouns whose interpretation is affected by the verb to different degrees. By using a contextualized language model and human concreteness ratings, we shed light on where in the brain such fine‐grained meaning variation takes place and how it is coded. Our results show that phrase concreteness judgments and the contextualized model can predict BOLD activation associated with semantic composition within the language network. Importantly, representations derived from a complex, nonlinear composition process consistently outperform simpler composition approaches. This is compatible with a holistic view of semantic composition in the brain, where semantic representations are modified by the process of composition itself. When looking at individual brain areas, we find that encoding performance is statistically significant, although with differing patterns of results, suggesting differential involvement, in the posterior superior temporal sulcus, inferior frontal gyrus and anterior temporal lobe, and in motor areas previously associated with processing of concreteness/abstractness.
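The contrast between simple and complex composition can be illustrated schematically as below (this is not the authors' pipeline, which used Italian verb phrases and different models): an additive phrase vector built from words presented in isolation versus a phrase vector in which the model processes verb and noun together, so that the noun's representation is reshaped by its context. Either representation could then serve as the predictor in a cross-validated encoding model of the fMRI responses.

```python
# Hedged sketch: additive vs. contextualized composition of a verb phrase.
# The model name is an illustrative assumption.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def hidden_states(text):
    """Last-layer hidden states for the input, excluding [CLS]/[SEP]."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        return model(**enc).last_hidden_state[0, 1:-1]

phrase = "open the book"

# Simple composition: average vectors of the words encoded in isolation.
additive = torch.stack(
    [hidden_states(w).mean(dim=0) for w in phrase.split()]).mean(dim=0)

# Nonlinear composition: encode the whole phrase at once, then pool over tokens.
contextual = hidden_states(phrase).mean(dim=0)

print(torch.cosine_similarity(additive, contextual, dim=0).item())
```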
Article
Distributional models of lexical semantics are capable of acquiring sophisticated representations of word meanings. The main theoretical insight provided by these models is that they demonstrate the systematic connection between the knowledge that people acquire and the experience that they have with the natural language environment. However, linguistic experience is inherently variable and differs radically across people due to demographic and cultural variables. Recently, distributional models have been used to examine how word meanings vary across languages and it was found that there is considerable variability in the meanings of words across languages for most semantic categories. The goal of this article is to examine how variable word meanings are across individual language users within a single language. This was accomplished by assembling 500 individual user corpora attained from the online forum Reddit. Each user corpus ranged between 3.8 and 32.3 million words, and a count-based distributional framework was used to extract word meanings for each user. These representations were then used to estimate the semantic alignment of word meanings across individual language users. It was found that there are significant levels of relativity in word meanings across individuals, and these differences are partially explained by other psycholinguistic factors, such as concreteness, semantic diversity, and social aspects of language usage. These results point to word meanings being fundamentally relative and contextually fluid, with this relativity being related to the individualized nature of linguistic experience.
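A toy version of such a count-based pipeline is sketched below, with two tiny invented "user corpora" standing in for the multi-million-word Reddit corpora: co-occurrence counts within a window are reweighted with positive PMI, and alignment between users is estimated by comparing a target word's co-occurrence profile over the shared vocabulary.

```python
# Hedged sketch: count-based distributional representations per user and a simple
# cross-user alignment measure. Corpora are toy sentences, not the actual data.
import numpy as np

def ppmi_matrix(sentences, window=3):
    """Symmetric co-occurrence counts within a window, reweighted with positive PMI."""
    vocab = sorted({w for s in sentences for w in s.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    counts[idx[w], idx[words[j]]] += 1
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    return np.maximum(np.nan_to_num(pmi, neginf=0.0), 0.0), idx

user_a = ["the dog chased the ball", "my dog loves the park"]
user_b = ["a dog found the ball", "our dog likes the park"]

m_a, idx_a = ppmi_matrix(user_a)
m_b, idx_b = ppmi_matrix(user_b)

# Compare the two users' co-occurrence profiles for "dog" over the shared vocabulary.
shared = sorted((set(idx_a) & set(idx_b)) - {"dog"})
prof_a = np.array([m_a[idx_a["dog"], idx_a[w]] for w in shared])
prof_b = np.array([m_b[idx_b["dog"], idx_b[w]] for w in shared])
cosine = prof_a @ prof_b / (np.linalg.norm(prof_a) * np.linalg.norm(prof_b))
print(cosine)  # higher values = more aligned word meanings for "dog" across users
```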