Article

Probing the Representational Structure of Regular Polysemy via Sense Analogy Questions: Insights from Contextual Word Vectors

Abstract

Regular polysemes are sets of ambiguous words that all share the same relationship between their meanings, such as CHICKEN and LOBSTER both referring to an animal or its meat. To probe how a distributional semantic model, here exemplified by bidirectional encoder representations from transformers (BERT), represents regular polysemy, we analyzed whether its embeddings support answering sense analogy questions similar to “is the mapping between CHICKEN (as an animal) and CHICKEN (as a meat) similar to that which maps between LOBSTER (as an animal) to LOBSTER (as a meat)?” We did so using the LRcos model, which combines a logistic regression classifier of different categories (e.g., animal vs. meat) with a measure of cosine similarity. We found that (a) the model was sensitive to the shared structure within a given regular relationship; (b) the shared structure varies across different regular relationships (e.g., animal/meat vs. location/organization), potentially reflective of a “regularity continuum;” (c) some high‐order latent structure is shared across different regular relationships, suggestive of a similar latent structure across different types of relationships; and (d) there is a lack of evidence for the aforementioned effects being explained by meaning overlap. Lastly, we found that both components of the LRcos model made important contributions to accurate responding and that a variation of this method could yield an accuracy boost of 10% in answering sense analogy questions. These findings enrich previous theoretical work on regular polysemy with a computationally explicit theory and methods, and provide evidence for an important organizational principle for the mental lexicon and the broader conceptual knowledge system.
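The LRcos scoring idea described in the abstract can be sketched compactly. Below is a minimal illustration, assuming sense-specific contextual embeddings (e.g., from BERT) have already been extracted; the toy data, variable names, and classifier settings are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of an LRcos-style scorer for sense analogy questions,
# assuming sense-specific contextual embeddings (e.g., from BERT) have
# already been extracted. Toy data and names are illustrative only,
# not the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
dim = 16

# Toy stand-ins for sense embeddings: one row per word sense.
animal_senses = rng.normal(0.0, 1.0, (20, dim))                 # CHICKEN-as-animal, LOBSTER-as-animal, ...
meat_senses = animal_senses + rng.normal(1.0, 0.3, (20, dim))   # the corresponding meat senses

# 1) Train a logistic regression to recognize the target category (meat vs. animal),
#    holding out the test word (index 0).
X = np.vstack([animal_senses[1:], meat_senses[1:]])
y = np.array([0] * (len(animal_senses) - 1) + [1] * (len(meat_senses) - 1))
clf = LogisticRegression(max_iter=1000).fit(X, y)

def lrcos_score(candidate_vec, source_vec):
    """LRcos: probability that the candidate falls in the target category,
    multiplied by its cosine similarity to the source word's embedding."""
    p_target = clf.predict_proba(candidate_vec.reshape(1, -1))[0, 1]
    cos = cosine_similarity(candidate_vec.reshape(1, -1),
                            source_vec.reshape(1, -1))[0, 0]
    return p_target * cos

# 2) Score the held-out "LOBSTER-as-meat" candidate against "LOBSTER-as-animal".
print(lrcos_score(meat_senses[0], animal_senses[0]))
```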

... BERT-based models have enhanced translation accuracy by better understanding the context of polysemous words. However, this approach remains limited as it relies solely on probabilistic distribution in selecting word meanings, without comprehending explicit semantic relationships between words in a sentence (Li & Armstrong, 2024). ...
Article
Full-text available
The persistent challenges of polysemy and ambiguity continue to hinder the semantic accuracy of Neural Machine Translation (NMT), particularly in language pairs with distinct syntactic structures. While transformer-based models such as BERT and GPT have achieved notable progress in capturing contextual word meanings, they still fall short in understanding explicit semantic roles. This study aims to address this limitation by integrating Semantic Role Labeling (SRL) into a Transformer-based NMT framework to enhance semantic comprehension and reduce translation errors. Using a parallel corpus of 100,000 English-Indonesian and English-Japanese sentence pairs, the proposed SRL-enhanced NMT model was trained and evaluated against a baseline Transformer NMT. The integration of SRL enabled the model to annotate semantic roles, such as agent, patient, and instrument, which were fused with encoder representations through semantic-aware attention mechanisms. Experimental results demonstrate that the SRL-integrated model significantly outperformed the standard NMT model, improving BLEU scores by 6.2 points (from 32.5 to 38.7), METEOR scores by 6.3 points (from 58.5 to 64.8), and reducing the TER by 5.8 points (from 45.1 to 39.3). These results were statistically validated using a paired t-test (p < 0.05). Furthermore, qualitative analyses confirmed SRL's effectiveness in resolving lexical ambiguities and syntactic uncertainties. Although SRL integration increased inference time by 12%, the performance trade-off was deemed acceptable for applications requiring higher semantic fidelity. The novelty of this research lies in the architectural fusion of SRL with transformer-based attention layers in NMT, a domain seldom explored in prior studies. Moreover, the model demonstrates robust performance across linguistically divergent language pairs, suggesting its broader applicability. This work contributes to the advancement of semantically aware translation systems and paves the way for future research in unsupervised SRL integration and multilingual scalability.
Article
To meet the challenge of incompleteness within Knowledge Graphs, Knowledge Graph Embedding (KGE) has emerged as the fundamental methodology for predicting the missing link (Link Prediction), by mapping entities and relations as low-dimensional vectors in continuous space. However, current KGE models often struggle with the polysemy issue, where entities exhibit different semantic characteristics depending on the relations in which they participate. This limitation stems from weak interactions between entities and their relation contexts, leading to low expressiveness in modeling complex structures and resulting in inaccurate predictions. To address this, we propose ConQuatE (Contextualized Quaternion Embedding), a model that enhances the representation learning of entities across multiple semantic dimensions by leveraging quaternion rotation to capture diverse relational contexts. Specifically, ConQuatE incorporates contextual cues from various connected relations to enrich the original entity representations. Notably, this is achieved through efficient vector transformations in quaternion space, without any extra information required other than original triples. Experimental results demonstrate that our model outperforms state-of-the-art models for Link Prediction on four widely-recognized datasets: FB15k-237, WN18RR, FB15k and WN18.
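The quaternion-rotation idea that ConQuatE builds on can be illustrated with a generic QuatE-style scorer, in which the relation acts as a unit quaternion that rotates the head entity before comparison with the tail. This is a simplified sketch under those assumptions (one quaternion per entity, toy values), not the authors' full contextualized model.

```python
# A simplified, generic sketch of quaternion-rotation scoring (QuatE-style):
# the relation is a unit quaternion that rotates the head entity, and the
# rotated head is compared to the tail. Toy values, not the authors' model.
import numpy as np

def hamilton(q, p):
    """Hamilton product of two quaternions given as (a, b, c, d) arrays."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

rng = np.random.default_rng(5)
head, tail = rng.normal(size=4), rng.normal(size=4)
relation = rng.normal(size=4)
relation /= np.linalg.norm(relation)          # unit quaternion = pure rotation

score = hamilton(head, relation) @ tail       # higher score = more plausible triple
print(score)
```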
Article
Polysemy has recently emerged as a popular topic in philosophy of language. While much existing research focuses on the relatedness among senses, this article introduces a novel perspective that emphasizes the continuity of sense individuation, sense regularity, and sense productivity. This new perspective has only recently gained traction, largely due to advancements in computational linguistics. It also poses a serious challenge to semantic minimalism, so I present three arguments against minimalism from the continuous perspective that touch on the minimal concept, the distinction from homonymy, and the quasi‐rule‐like nature of polysemy. Last, I provide an account of polysemy that incorporates this continuous perspective.
Article
Full-text available
Over the past two decades, numerous studies have demonstrated how less-predictable (i.e., higher surprisal) words take more time to read. In general, these studies have implicitly assumed the reading process is purely responsive: Readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: Readers could make predictions about a future word and allocate time to process it based on their expectation. In this work, we operationalize this anticipation as a word’s contextual entropy. We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over surprisal on a word’s reading time (RT): In fact, entropy is sometimes better than surprisal in predicting a word’s RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize four cognitive mechanisms through which contextual entropy could impact RTs—three of which we are able to design experiments to analyze. Overall, our results support a view of reading that is not just responsive, but also anticipatory.
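The two predictors being compared, surprisal of the observed word and contextual entropy over the upcoming word, follow directly from a next-word probability distribution. The sketch below uses a toy distribution rather than the output of an actual language model.

```python
# Toy illustration of the two predictors: surprisal of the word that
# actually occurred vs. contextual entropy of the next-word distribution.
# Probabilities are made up; in practice they would come from a language
# model's softmax output.
import numpy as np

vocab = ["dog", "cat", "run", "the"]
p = np.array([0.5, 0.3, 0.15, 0.05])   # P(next word | context), toy values

def surprisal(prob):
    """Responsive measure: -log2 p(observed word | context)."""
    return -np.log2(prob)

def contextual_entropy(dist):
    """Anticipatory measure: expected surprisal over the full distribution."""
    return -np.sum(dist * np.log2(dist))

observed = "cat"
print("surprisal:", surprisal(p[vocab.index(observed)]))
print("entropy:  ", contextual_entropy(p))
```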
Article
Full-text available
In a typical text, readers look much longer at some words than at others, even skipping many altogether. Historically, researchers explained this variation via low-level visual or oculomotor factors, but today it is primarily explained via factors determining a word’s lexical processing ease, such as how well word identity can be predicted from context or discerned from parafoveal preview. While the existence of these effects is well established in controlled experiments, the relative importance of prediction, preview and low-level factors in natural reading remains unclear. Here, we address this question in three large naturalistic reading corpora (n = 104, 1.5 million words), using deep neural networks and Bayesian ideal observers to model linguistic prediction and parafoveal preview from moment to moment in natural reading. Strikingly, neither prediction nor preview was important for explaining word skipping—the vast majority of explained variation was explained by a simple oculomotor model, using just fixation position and word length. For reading times, by contrast, we found strong but independent contributions of prediction and preview, with effect sizes matching those from controlled experiments. Together, these results challenge dominant models of eye movements in reading, and instead support alternative models that describe skipping (but not reading times) as largely autonomous from word identification, and mostly determined by low-level oculomotor information.
Conference Paper
Full-text available
Humans often make creative use of words to express novel senses. A long-standing effort in natural language processing has been focusing on word sense disambiguation (WSD), but little has been explored about how the sense inventory of a word may be extended toward novel meanings. We present a paradigm of word sense extension (WSE) that enables words to spawn new senses toward novel context. We develop a framework that simulates novel word sense extension by first partitioning a polysemous word type into two pseudo-tokens that mark its different senses, and then inferring whether the meaning of a pseudo-token can be extended to convey the sense denoted by the token partitioned from the same word type. Our framework combines cognitive models of chaining with a learning scheme that transforms a language model embedding space to support various types of word sense extension. We evaluate our framework against several competitive baselines and show that it is superior in predicting plausible novel senses for over 7,500 English words. Furthermore, we show that our WSE framework improves performance over a range of transformer-based WSD models in predicting rare word senses with few or zero mentions in the training data.
Article
Full-text available
Most words have multiple meanings, but there are foundationally distinct accounts for this. Categorical theories posit that humans maintain discrete entries for distinct word meanings, as in a dictionary. Continuous ones eschew discrete sense representations, arguing that word meanings are best characterized as trajectories through a continuous state space. Both kinds of approach face empirical challenges. In response, we introduce two novel “hybrid” theories, which reconcile discrete sense representations with a continuous view of word meaning. We then report on two behavioral experiments, pairing them with an analytical approach relying on neural language models to test these competing accounts. The experimental results are best explained by one of the novel hybrid accounts, which posits both distinct sense representations and a continuous meaning space. This hybrid account accommodates both the dynamic, context-dependent nature of word meaning, as well as the behavioral evidence for category-like structure in human lexical knowledge. We further develop and quantify the predictive power of several computational implementations of this hybrid account. These results raise questions for future research on lexical ambiguity, such as why and when discrete sense representations might emerge in the first place. They also connect to more general questions about the role of discrete versus gradient representations in cognitive processes and suggest that at least in this case, the best explanation is one that integrates both factors: Word meaning is both categorical and continuous.
Article
Full-text available
For any research program examining how ambiguous words are processed in broader linguistic contexts, a first step is to establish factors relating to the frequency balance or dominance of those words’ multiple meanings, as well as the similarity of those meanings to one another. Homonyms—words with divergent meanings—are one ambiguous word type commonly utilized in psycholinguistic research. In contrast, although polysemes—words with multiple related senses—are far more common in English, they have been less frequently used as tools for understanding one-to-many word-to-meaning mappings. The current paper details two norming studies of a relatively large number of ambiguous English words. In the first, offline dominance norming is detailed for 547 homonyms and polysemes via a free association task suitable for words across the ambiguity continuum, with a goal of identifying words with more equibiased meanings. The second norming assesses offline meaning similarity for a partial subset of 318 ambiguous words (including homonyms, unambiguous words, and polysemes divided into regular and irregular types) using a novel, continuous rating method reliant on the linguistic phenomenon of zeugma. In addition, we conduct computational analyses on the human similarity norming data using the BERT pretrained neural language model (Devlin et al., 2018, BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv Preprint. arXiv:1810.04805) to evaluate factors that may explain variance beyond that accounted for by dictionary-criteria ambiguity categories. Finally, we make available the summarized item dominance values and similarity ratings in resultant appendices (see supplementary material), as well as individual item and participant norming data, which can be accessed online (https://osf.io/g7fmv/).
Article
Full-text available
Significance: Language is a quintessentially human ability. Research has long probed the functional architecture of language in the mind and brain using diverse neuroimaging, behavioral, and computational modeling approaches. However, adequate neurally-mechanistic accounts of how meaning might be extracted from language are sorely lacking. Here, we report a first step toward addressing this gap by connecting recent artificial neural networks from machine learning to human recordings during language processing. We find that the most powerful models predict neural and behavioral responses across different datasets up to noise levels. Models that perform better at predicting the next word in a sequence also better predict brain measurements—providing computationally explicit evidence that predictive processing fundamentally shapes the language comprehension mechanisms in the brain.
Article
Full-text available
Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT). In our analysis, we first evaluate the quality of the mapping in a retrieval task, then we shed light on the semantic features that are better encoded in each embedding type. Finally, a large set of probing tasks is used to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features and we show that there is a correlation between the embeddings’ performance and how they encode those features. This study sets itself as a step forward in understanding which aspects of meaning are captured by vector spaces, by proposing a new and simple method to carve human-interpretable semantic representations from distributional vectors.
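One common way to learn such a mapping from embeddings to interpretable feature vectors is a regularized linear regression; the abstract does not state the exact method, so the sketch below is only an illustrative assumption (including the toy data and the 65-dimension feature count echoing Binder-style norms).

```python
# A hedged sketch of mapping distributional embeddings onto interpretable
# feature vectors (e.g., Binder-style ratings) with a regularized linear map.
# Ridge regression and the toy data below are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_words, emb_dim, n_features = 200, 50, 65    # 65 echoes the Binder feature count (assumed)

embeddings = rng.normal(size=(n_words, emb_dim))
true_map = rng.normal(size=(emb_dim, n_features))
feature_norms = embeddings @ true_map + rng.normal(scale=0.1, size=(n_words, n_features))

# Fit the embedding-to-features map on training words, evaluate on held-out words.
train, test = slice(0, 150), slice(150, 200)
mapper = Ridge(alpha=1.0).fit(embeddings[train], feature_norms[train])
predicted = mapper.predict(embeddings[test])

# Retrieval-style check: how well do predicted feature vectors match the
# actual feature vectors of held-out words?
corrs = [np.corrcoef(predicted[i], feature_norms[test][i])[0, 1]
         for i in range(predicted.shape[0])]
print("mean held-out correlation:", float(np.mean(corrs)))
```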
Article
Full-text available
Many words carry multiple distinct but related senses. For example, a producer’s name (e.g., Picasso) can be metonymically extended to label products (e.g., Picasso’s paintings) but rarely refer to other associated items (e.g., Picasso’s paintbrushes). We test whether item-based linguistic experience is necessary for children’s acquisition of semantic generalizations. In Experiment 1, we present 4- to 5-year-olds, 8- to 9-year-olds, and adults with scenarios involving novel and conventional artists’ names (e.g., Dax or Picasso) and a metonymic extension of the name that could refer to either artists’ products or tools. We find that the tendency to choose the product as the metonymic referent is present in 4- and 5-year-olds, increases with age, and is stronger for conventional, as opposed to novel, names. In follow-up experiments, we replicate 4- and 5-year-olds’ tendency to choose the product as the metonymic referent, when the metonyms are conventional artists’ names (e.g., Picasso), multi-syllabic novel names (e.g., Zazapa), familiar names (e.g., Smith), and conventional names that are rarely, if ever, used metonymically (e.g., Mandela). We also show that preschoolers do not possess explicit knowledge of conventional artists. Overall, these findings suggest that young children acquire producer-product metonymy without much, if any, prior experience with producers’ names. We discuss the implications of these findings for conceptual and usage-based accounts of the acquisition of semantic generalizations.
Chapter
Full-text available
Systematic (or regular) polysemy is characterized as the phenomenon in which a noun has several distinct but related meanings whereby the same relation holds between the meanings for a series of nouns. The latter is explained by the fact that systematically polysemous nouns are determined by the existence of a multitude of conceptual patterns applying to a certain extent crosslinguistically. It is customary to distinguish between metonymic (or metonymically motivated) polysemy, on the one hand, and inherent (or logical) polysemy, on the other. Metonymic polysemy is described as cases where one of the related senses is primary and the others are metonymically derived from it. By contrast, inherent polysemy involves senses where there are no substantial reasons for assuming that one or another of them is prior. Rather, they are so intimately interconnected with each other that they must be viewed as being part of a complex meaning.
Article
Full-text available
The goal of the present study was to investigate the interaction between different senses of polysemous nouns (metonymies and metaphors) and different meanings of homonyms using the method of event-related potentials (ERPs) and a priming paradigm. Participants read two-word phrases containing ambiguous words and made a sensicality judgment. Phrases with polysemes highlighted their literal sense and were preceded by primes with either the same or different – metonymic or metaphorical – sense. Similarly, phrases with homonyms were primed by phrases with a consistent or inconsistent meaning of the noun. The results demonstrated that polysemous phrases with literal senses preceded by metonymic primes did not differ in ERP responses from the control condition with the same literal primes. In contrast, processing phrases with the literal sense preceded by metaphorical primes resulted in N400 and P600 effects that might reflect a very limited priming effect. The priming effect observed between metonymic and literal senses supports the idea that these senses share a single representation in the mental lexicon. In contrast, the effects observed for polysemes with metaphorical primes characterize lexical access to the word’s target sense and competition between the two word senses. The processing of homonyms preceded by the prime with an inconsistent meaning, although it did not elicit an N400 effect, was accompanied by a P600 effect as compared to the control condition with a consistent meaning of the prime. We suppose that the absence of the N400 effect may result from inhibition of the target meaning by the inconsistent prime, whereas the P600 response might reflect processes of reanalysis, activation, and integration of the target meaning. Our results provide additional evidence for the difference in processing mechanisms between metonymies and metaphors that might have separate representations in the mental lexicon, although they are more related as compared to homonyms.
Article
Full-text available
Most words are ambiguous: Individual word forms (e.g., run) can map onto multiple different interpretations depending on their sentence context (e.g., the athlete/politician/river runs). Models of word-meaning access must therefore explain how listeners and readers can rapidly settle on a single, contextually appropriate meaning for each word that they encounter. I present a new account of word-meaning access that places semantic disambiguation at its core and integrates evidence from a wide variety of experimental approaches to explain this key aspect of language comprehension. The model has three key characteristics. (a) Lexical-semantic knowledge is viewed as a high-dimensional space; familiar word meanings correspond to stable states within this lexical-semantic space. (b) Multiple linguistic and paralinguistic cues can influence the settling process by which the system resolves on one of these familiar meanings. (c) Learning mechanisms play a vital role in facilitating rapid word-meaning access by shaping and maintaining high-quality lexical-semantic knowledge throughout the life span. In contrast to earlier models of word-meaning access, I highlight individual differences in lexical-semantic knowledge: Each person’s lexicon is uniquely structured by specific, idiosyncratic linguistic experiences.
Article
Full-text available
How is semantic information stored in the human mind and brain? Some philosophers and cognitive scientists argue for vectorial representations of concepts, where the meaning of a word is represented as its position in a high-dimensional neural state space. At the intersection of natural language processing and artificial intelligence, a class of very successful distributional word vector models has developed that can account for classic EEG findings of language, i.e., the ease vs. difficulty of integrating a word with its sentence context. However, models of semantics have to account not only for context-based word processing, but should also describe how word meaning is represented. Here, we investigate whether distributional vector representations of word meaning can model brain activity induced by words presented without context. Using EEG activity (event-related brain potentials) collected while participants in two experiments (English, German) read isolated words, we encode and decode word vectors taken from the family of prediction-based word2vec algorithms. We find that, first, the position of a word in vector space allows the prediction of the pattern of corresponding neural activity over time, in particular during a time window of 300 to 500 ms after word onset. Second, distributional models perform better than a human-created taxonomic baseline model (WordNet), and this holds for several distinct vector-based models. Third, multiple latent semantic dimensions of word meaning can be decoded from brain activity. Combined, these results suggest that empiricist, prediction-based vectorial representations of meaning are a viable candidate for the representational architecture of human semantic knowledge.
Article
Full-text available
It is well-known that children rapidly learn words, following a range of heuristics. What is less well appreciated is that—because most words are polysemous and have multiple meanings (e.g., “glass” can label a material and drinking vessel)—children will often be learning a new meaning for a known word, rather than an entirely new word. Across 4 experiments we show that children flexibly adapt a well-known heuristic—the shape bias—when learning polysemous words. Consistent with previous studies, we find that children and adults preferentially extend a new object label to other objects of the same shape. But we also find that when a new word for an object (“a gup”) has previously been used to label the material composing that object (“some gup”), children and adults override the shape bias, and are more likely to extend the object label by material (Experiments 1 and 3). Further, we find that, just as an older meaning of a polysemous word constrains interpretations of a new word meaning, encountering a new word meaning leads learners to update their interpretations of an older meaning (Experiment 2). Finally, we find that these effects only arise when learners can perceive that a word’s meanings are related, not when they are arbitrarily paired (Experiment 4). Together, these findings show that children can exploit cues from polysemy to infer how new word meanings should be extended, suggesting that polysemy may facilitate word learning and invite children to construe categories in new ways.
Article
Full-text available
The words of a language reflect the structure of the human mind, allowing us to transmit thoughts between individuals. However, language can represent only a subset of our rich and detailed cognitive architecture. Here, we ask what kinds of common knowledge (semantic memory) are captured by word meanings (lexical semantics). We examine a prominent computational model that represents words as vectors in a multidimensional space, such that proximity between word-vectors approximates semantic relatedness. Because related words appear in similar contexts, such spaces - called "word embeddings" - can be learned from patterns of lexical co-occurrences in natural language. Despite their popularity, a fundamental concern about word embeddings is that they appear to be semantically "rigid": inter-word proximity captures only overall similarity, yet human judgments about object similarities are highly context-dependent and involve multiple, distinct semantic features. For example, dolphins and alligators appear similar in size, but differ in intelligence and aggressiveness. Could such context-dependent relationships be recovered from word embeddings? To address this issue, we introduce a powerful, domain-general solution: "semantic projection" of word-vectors onto lines that represent various object features, like size (the line extending from the word "small" to "big"), intelligence (from "dumb" to "smart"), or danger (from "safe" to "dangerous"). This method, which is intuitively analogous to placing objects "on a mental scale" between two extremes, recovers human judgments across a range of object categories and properties. We thus show that word embeddings inherit a wealth of common knowledge from word co-occurrence statistics and can be flexibly manipulated to express context-dependent meanings.
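The semantic-projection operation itself is a one-line vector computation: project a word vector onto the line running between two pole words. A minimal sketch with toy vectors follows (real use would substitute trained embeddings).

```python
# A minimal sketch of "semantic projection": place a word on the line
# running from one pole word to another (e.g., "small" -> "big").
# Vectors here are toy values; real use would substitute trained embeddings.
import numpy as np

rng = np.random.default_rng(2)
emb = {w: rng.normal(size=8) for w in ["small", "big", "dolphin", "alligator"]}

def semantic_projection(word, pole_low, pole_high, embeddings):
    """Scalar position of `word` on the pole_low -> pole_high feature line."""
    line = embeddings[pole_high] - embeddings[pole_low]
    offset = embeddings[word] - embeddings[pole_low]
    return float(offset @ line / (line @ line))

for animal in ["dolphin", "alligator"]:
    print(animal, "size score:", semantic_projection(animal, "small", "big", emb))
```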
Article
Full-text available
One reason that word learning presents a challenge for children is because pairings between word forms and meanings are arbitrary conventions that children must learn via observation – e.g., the fact that “shovel” labels shovels. The present studies explore cases in which children might bypass observational learning and spontaneously infer new word meanings: By exploiting the fact that many words are flexible and systematically encode multiple, related meanings. For example, words like shovel and hammer are nouns for instruments, and verbs for activities involving those instruments. The present studies explored whether 3- to 5-year-old children possess semantic generalizations about lexical flexibility, and can use these generalizations to infer new word meanings: Upon learning that dax labels an activity involving an instrument, do children spontaneously infer that dax can also label the instrument itself? Across four studies, we show that at least by age four, children spontaneously generalize instrument-activity flexibility to new words. Together, our findings point to a powerful way in which children may build their vocabulary, by leveraging the fact that words are linked to multiple meanings in systematic ways.
Article
Full-text available
Recent developments in distributional semantics (Mikolov, Chen, Corrado, & Dean, 2013; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013) include a new class of prediction-based models that are trained on a text corpus and that measure semantic similarity between words. We discuss the relevance of these models for psycholinguistic theories and compare them to more traditional distributional semantic models. We compare the models’ performances on a large dataset of semantic priming (Hutchison et al., 2013) and on a number of other tasks involving semantic processing and conclude that the prediction-based models usually offer a better fit to behavioral data. Theoretically, we argue that these models bridge the gap between traditional approaches to distributional semantics and psychologically plausible learning principles. As an aid to researchers, we release semantic vectors for English and Dutch for a range of models together with a convenient interface that can be used to extract a great number of semantic similarity measures.
Article
Full-text available
How do children resolve the problem of indeterminacy when learning a new word? By one account, children adopt a taxonomic assumption and expect the word to denote only members of a particular taxonomic category. According to one version of this constraint, young children should represent polysemous words that label multiple kinds—for example, chicken, which labels an animal and its meat—as separate and unrelated words that each encode a single kind. Our studies provide evidence against this account: we show that four- and five-year-old children spontaneously expect that a word that has labeled one meaning of a familiar polysemous word will also label its other, taxonomically different meaning. Further, we show that children's taxonomic flexibility is importantly constrained—children do not expect a word to label thematically-related meanings (e.g., chicken and egg), or the unrelated meanings of homophones (e.g., bat[animal] and bat[baseball]). We argue that although children are initially guided by the taxonomic constraint when pairing word forms with meanings, they nonetheless relate the taxonomically-different meanings of polysemous words within lexical structure. Thus, for even young children, a single word can label multiple kinds.
Article
Full-text available
Semantic ambiguity resolution is an essential and frequent part of speech comprehension because many words map onto multiple meanings (e.g., “bark,” “bank”). Neuroimaging research highlights the importance of the left inferior frontal gyrus (LIFG) and the left posterior temporal cortex in this process but the roles they serve in ambiguity resolution are uncertain. One possibility is that both regions are engaged in the processes of semantic reinterpretation that follows incorrect interpretation of an ambiguous word. Here we used fMRI to investigate this hypothesis. 20 native British English monolinguals were scanned whilst listening to sentences that contained an ambiguous word. To induce semantic reinterpretation, the disambiguating information was presented after the ambiguous word and delayed until the end of the sentence (e.g., “the teacher explained that the BARK was going to be very damp”). These sentences were compared to well-matched unambiguous sentences. Supporting the reinterpretation hypothesis, these ambiguous sentences produced more activation in both the LIFG and the left posterior inferior temporal cortex. Importantly, all but one subject showed ambiguity-related peaks within both regions, demonstrating that the group-level results were driven by high inter-subject consistency. Further support came from the finding that activation in both regions was modulated by meaning dominance. Specifically, sentences containing biased ambiguous words, which have one more dominant meaning, produced greater activation than those with balanced ambiguous words, which have two equally frequent meanings. Because the context always supported the less frequent meaning, the biased words require reinterpretation more often than balanced words. This is the first evidence of dominance effects in the spoken modality and provides strong support that frontal and temporal regions support the updating of semantic representations during speech comprehension.
Article
Full-text available
In one form or another, the phenomena associated with "meaning transfer" have become central issues in a lot of recent work on semantics. Speaking very roughly, we can partition approaches to the phenomenon along two dimensions, which yield four basic points of departure. In the first two, people have considered transfer in basically semantic or linguistic terms. Some have concentrated on what we might call the paradigmatic aspects of transfer, focusing on the productive lexical processes that map semantic features into features --- for example, the "grinding" rule that applies to turn the names of animals into mass terms denoting their meat or fur. This is the approach that's involved in most recent work on "regular polysemy," "systematic polysemy," and the like, for example by Apresjan, Ostler and Atkins, Briscoe and Copestake, Nunberg and Zaenen, Wilensky, Kilgarriff and a number of other people. Other people have emphasized the syncategorematic aspects of transfer; that is, the ways meaning shifts and specifications are coerced in the course of semantic composition. This is an approach that has been developed in particular by James Pustejovsky and his collaborators, building on earlier work on type shifting.
Article
Full-text available
Semantic ambiguity is typically measured by summing the number of senses or dictionary definitions that a word has. Such measures are somewhat subjective and may not adequately capture the full extent of variation in word meaning, particularly for polysemous words that can be used in many different ways, with subtle shifts in meaning. Here, we describe an alternative, computationally derived measure of ambiguity based on the proposal that the meanings of words vary continuously as a function of their contexts. On this view, words that appear in a wide range of contexts on diverse topics are more variable in meaning than those that appear in a restricted set of similar contexts. To quantify this variation, we performed latent semantic analysis on a large text corpus to estimate the semantic similarities of different linguistic contexts. From these estimates, we calculated the degree to which the different contexts associated with a given word vary in their meanings. We term this quantity a word's semantic diversity (SemD). We suggest that this approach provides an objective way of quantifying the subtle, context-dependent variations in word meaning that are often present in language. We demonstrate that SemD is correlated with other measures of ambiguity and contextual variability, as well as with frequency and imageability. We also show that SemD is a strong predictor of performance in semantic judgments in healthy individuals and in patients with semantic deficits, accounting for unique variance beyond that of other predictors. SemD values for over 30,000 English words are provided as supplementary materials.
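A semantic-diversity-style measure of this kind can be approximated by averaging the pairwise similarity of the (LSA-reduced) contexts in which a word occurs and applying a log transform so that more variable contexts yield higher scores. The context vectors below are toy stand-ins, and the exact transform used in the published SemD measure may differ from this sketch.

```python
# A hedged sketch of a semantic-diversity-style measure: average the pairwise
# similarity of the contexts a word appears in, then take a negative log so
# that more variable contexts yield a higher score. Toy context vectors; the
# published SemD transform may differ.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(3)
base = rng.normal(size=20)                                         # shared topic component
contexts_for_word = base + rng.normal(scale=0.8, size=(30, 20))    # 30 contexts containing the word

def semantic_diversity(context_vectors):
    sims = cosine_similarity(context_vectors)
    n = sims.shape[0]
    mean_sim = (sims.sum() - n) / (n * (n - 1))    # mean over distinct context pairs
    return -np.log(mean_sim)                       # higher = more diverse contexts

print(semantic_diversity(contexts_for_word))
```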
Article
Full-text available
Do people with different kinds of bodies think differently? According to the body-specificity hypothesis (Casasanto, 2009), they should. In this article, I review evidence that right- and left-handers, who perform actions in systematically different ways, use correspondingly different areas of the brain for imagining actions and representing the meanings of action verbs. Beyond concrete actions, the way people use their hands also influences the way they represent abstract ideas with positive and negative emotional valence like “goodness,” “honesty,” and “intelligence” and how they communicate about these ideas in spontaneous speech and gesture. Changing how people use their right and left hands can cause them to think differently, suggesting that motoric differences between right- and left-handers are not merely correlated with cognitive differences. Body-specific patterns of motor experience shape the way we think, feel, communicate, and make decisions.
Article
A defining property of human language is the creative use of words to express multiple meanings through word meaning extension. Such lexical creativity is manifested at different timescales, ranging from language development in children to the evolution of word meanings over history. We explored whether different manifestations of lexical creativity build on a common foundation. Using computational models, we show that a parsimonious set of semantic knowledge types characterize developmental data as well as evolutionary products of meaning extension spanning over 1400 languages. Models for evolutionary data account very well for developmental data, and vice versa. These findings suggest a unified foundation for human lexical creativity underlying both the fleeting products of individual ontogeny and the evolutionary products of phylogeny across languages.
Article
Most words in natural languages are polysemous; that is, they have related but different meanings in different contexts. This one‐to‐many mapping of form to meaning presents a challenge to understanding how word meanings are learned, represented, and processed. Previous work has focused on solutions in which multiple static semantic representations are linked to a single word form, which fails to capture important generalizations about how polysemous words are used; in particular, the graded nature of polysemous senses, and the flexibility and regularity of polysemy use. We provide a novel view of how polysemous words are represented and processed, focusing on how meaning is modulated by context. Our theory is implemented within a recurrent neural network that learns distributional information through exposure to a large and representative corpus of English. Clusters of meaning emerge from how the model processes individual word forms. In keeping with distributional theories of semantics, we suggest word meanings are generalized from contexts of different word tokens, with polysemy emerging as multiple clusters of contextually modulated meanings. We validate our results against a human‐annotated corpus of polysemy focusing on the gradedness, flexibility, and regularity of polysemous sense individuation, as well as behavioral findings of offline sense relatedness ratings and online sentence processing. The results provide novel insights into how polysemy emerges from contextual processing of word meaning from both a theoretical and computational point of view.
Article
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Article
Most words are ambiguous, with interpretation dependent on context. Advancing theories of ambiguity resolution is important for any general theory of language processing, and for resolving inconsistencies in observed ambiguity effects across experimental tasks. Focusing on homonyms (words such as bank with unrelated meanings EDGE OF A RIVER vs. FINANCIAL INSTITUTION), the present work advances theories and methods for estimating the relative frequency of their meanings, a factor that shapes observed ambiguity effects. We develop a new method for estimating meaning frequency based on the meaning of a homonym evoked in lines of movie and television subtitles according to human raters. We also replicate and extend a measure of meaning frequency derived from the classification of free associates. We evaluate the internal consistency of these measures, compare them to published estimates based on explicit ratings of each meaning’s frequency, and compare each set of norms in predicting performance in lexical and semantic decision mega-studies. All measures have high internal consistency and show agreement, but each is also associated with unique variance, which may be explained by integrating cognitive theories of memory with the demands of different experimental methodologies. To derive frequency estimates, we collected manual classifications of 533 homonyms over 50,000 lines of subtitles, and of 357 homonyms across over 5000 homonym–associate pairs. This database—publicly available at: www.blairarmstrong.net/homonymnorms/—constitutes a novel resource for computational cognitive modeling and computational linguistics, and we offer suggestions around good practices for its use in training and testing models on labeled data.
Book
The Generative Lexicon presents a novel and exciting theory of lexical semantics that addresses the problem of the "multiplicity of word meaning"; that is, how we are able to give an infinite number of senses to words with finite means. The first formally elaborated theory of a generative approach to word meaning, it lays the foundation for an implemented computational treatment of word meaning that connects explicitly to a compositional semantics. In contrast to the static view of word meaning (where each word is characterized by a predetermined number of word senses) that imposes a tremendous bottleneck on the performance capability of any natural language processing system, Pustejovsky proposes that the lexicon becomes an active—and central—component in the linguistic description. The essence of his theory is that the lexicon functions generatively: first by providing a rich and expressive vocabulary for characterizing lexical information; then by developing a framework for manipulating fine-grained distinctions in word descriptions; and finally by formalizing a set of mechanisms for specialized composition of aspects of such descriptions of words, so that, as they occur in context, extended and novel senses are generated. The subjects covered include the semantics of nominals (figure/ground nominals, relational nominals, and other event nominals); the semantics of causation (in particular, how causation is lexicalized in language, including causative/unaccusatives, aspectual predicates, experiencer predicates, and modal causatives); how semantic types constrain syntactic expression (such as the behavior of type shifting and type coercion operations); a formal treatment of event semantics (with subevents); and a general treatment of the problem of polysemy. Language, Speech, and Communication series; Bradford Books imprint.
Article
One way that languages are able to communicate a potentially infinite set of ideas through a finite lexicon is by compressing emerging meanings into words, such that over time, individual words come to express multiple, related senses of meaning. We propose that overarching communicative and cognitive pressures have created systematic directionality in how new metaphorical senses have developed from existing word senses over the history of English. Given a large set of pairs of semantic domains, we used computational models to test which domains have been more commonly the starting points (source domains) and which the ending points (target domains) of metaphorical mappings over the past millennium. We found that a compact set of variables, including externality, embodiment, and valence, explain directionality in the majority of about 5,000 metaphorical mappings recorded over the past 1100 years. These results provide the first large-scale historical evidence that metaphorical mapping is systematic, and driven by measurable communicative and cognitive principles.
Chapter
This publication is the opening number of a series which the Psychometric Society proposes to issue. It reports the first large experimental inquiry, carried out by the methods of factor analysis described by Thurstone in The Vectors of the Mind. The work was made possible by financial grants from the Social Science Research Committee of the University of Chicago, the American Council of Education, and the Carnegie Corporation of New York. The results are eminently worthy of the assistance so generously accorded. Thurstone’s previous theoretical account, lucid and comprehensive as it is, is intelligible only to those who have a knowledge of matrix algebra. Hence his methods have become known to British educationists chiefly from the monograph published by W. P. Alexander. This enquiry has provoked a good deal of criticism, particularly from Professor Spearman’s school; and differs, as a matter of fact, from Thurstone’s later expositions. Hence it is of the greatest value to have a full and simple illustration of his methods, based on a concrete inquiry, from Professor Thurstone himself.
Conference Paper
The offset method for solving word analogies has become a standard evaluation tool for vector-space semantic models: it is considered desirable for a space to represent semantic relations as consistent vector offsets. We show that the method's reliance on cosine similarity conflates offset consistency with largely irrelevant neighborhood structure, and propose simple baselines that should be used to improve the utility of the method in vector space evaluation.
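The vector-offset (3CosAdd) method and the kind of simple "ignore the offset" baseline the paper argues should accompany it can both be written in a few lines; the vocabulary and vectors below are toy values for illustration.

```python
# A sketch of the vector-offset (3CosAdd) analogy method alongside a simple
# baseline that ignores the offset entirely. Toy vocabulary and vectors.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(4)
vocab = {w: normalize(rng.normal(size=16))
         for w in ["king", "queen", "man", "woman", "boy", "girl"]}

def offset_answer(a, b, c, embeddings, exclude):
    """3CosAdd: argmax over d of cos(d, b - a + c), excluding the query words."""
    target = normalize(embeddings[b] - embeddings[a] + embeddings[c])
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: candidates[w] @ target)

def nearest_to_b(b, embeddings, exclude):
    """Baseline: return the nearest neighbour of b, ignoring a and c entirely."""
    candidates = {w: v for w, v in embeddings.items() if w not in exclude}
    return max(candidates, key=lambda w: candidates[w] @ embeddings[b])

query = ("man", "woman", "king")
print("offset answer:  ", offset_answer(*query, vocab, exclude=set(query)))
print("b-only baseline:", nearest_to_b("woman", vocab, exclude=set(query)))
```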
Article
A core challenge in the semantic ambiguity literature is understanding why the number and relatedness among a word's interpretations are associated with different effects in different tasks. An influential account (Hino, Pexman, & Lupker [2006. Ambiguity and relatedness effects in semantic tasks: Are they due to semantic coding? Journal of Memory and Language 55 (2), 247–273]) attributes these effects to qualitative differences in the response system. We propose instead that these effects reflect changes over time in settling dynamics within semantics. We evaluated the accounts using a single task, lexical decision, thus holding the overall configuration of the response system constant, and manipulated task difficulty – and the presumed amount of semantic processing – by varying nonword wordlikeness and stimulus contrast. We observed that as latencies increased, the effects generally (but not universally) shifted from those observed in standard lexical decision to those typically observed in different tasks with longer latencies. These results highlight the importance of settling dynamics in explaining many ambiguity effects, and of integrating theories of semantic dynamics and response systems.
Article
Words often have multiple distinct but related senses, a phenomenon called polysemy. For instance, in English, words like chicken and lamb can label animals and their meats while words like glass and tin can label materials and artifacts derived from those materials. In this paper, we ask why words have some senses but not others, and thus what constrains the structure of polysemy. Previous work has pointed to two different sources of constraints. First, polysemy could reflect conceptual structure: word senses could be derived based on how ideas are associated in the mind. Second, polysemy could reflect a set of arbitrary, language-specific conventions: word senses could be difficult to derive and might have to be memorized and stored. We used a large-scale cross-linguistic survey to elucidate the relative contributions of concepts and conventions to the structure of polysemy. We explored whether 27 distinct patterns of polysemy found in English are also present in 14 other languages. Consistent with the idea that polysemy is constrained by conceptual structure, we found that almost all surveyed patterns of polysemy (e.g., animal for meat, material for artifact) were present across languages. However, consistent with the idea that polysemy reflects language-specific conventions, we also found variation across languages in how patterns are instantiated in specific senses (e.g., the word for glass material is used to label different glass artifacts across languages). We argue that these results are best explained by a “conventions-constrained-by-concepts” model, in which the different senses of words are learned conventions, but conceptual structure makes some types of relations between senses easier to grasp than others, such that the same patterns of polysemy evolve across languages. This opens a new view of lexical structure, in which polysemy is a linguistic adaptation that makes it easier for children to learn word meanings and build a lexicon.
Conference Paper
We present the result of an annotation task on regular polysemy for a series of semantic classes or dot types in English, Danish and Spanish. This article describes the annotation process, the results in terms of inter-encoder agreement, and the sense distributions obtained with two methods: majority voting with a theory-compliant backoff strategy, and MACE, an unsupervised system to choose the most likely sense from all the annotations.
Article
Theoretical linguistic accounts of lexical ambiguity distinguish between homonymy, where words that share a lexical form have unrelated meanings, and polysemy, where the meanings are related. The present study explored the psychological reality of this theoretical assumption by asking whether there is evidence that homonyms and polysemes are represented and processed differently in the brain. We investigated the time-course of meaning activation of different types of ambiguous words using EEG. Homonyms and polysemes were each further subdivided into two: unbalanced homonyms (e.g., “coach”) and balanced homonyms (e.g., “match”); metaphorical polysemes (e.g., “mouth”) and metonymic polysemes (e.g., “rabbit”). These four types of ambiguous words were presented as primes in a visual single-word priming delayed lexical decision task employing a long ISI (750 ms). Targets were related to one of the meanings of the primes, or were unrelated. ERPs formed relative to the target onset indicated that the theoretical distinction between homonymy and polysemy was reflected in the N400 brain response. For targets following homonymous primes (both unbalanced and balanced), no effects survived at this long ISI indicating that both meanings of the prime had already decayed. On the other hand, for polysemous primes (both metaphorical and metonymic), activation was observed for both dominant and subordinate senses. The observed processing differences between homonymy and polysemy provide evidence in support of differential neuro-cognitive representations for the two types of ambiguity. We argue that the polysemous senses act collaboratively to strengthen the representation, facilitating maintenance, while the competitive nature of homonymous meanings leads to decay.
Article
There are competing views on the on-line processing of polysemous words such as book, which have distinct but semantically related senses (as in bound book vs. scary book). According to a Sense-Enumeration Lexicon (SEL) view, different senses are represented separately, just as the different meanings of a homonym (e.g. bank). According to an underspecification view, initial processing does not distinguish between the different senses. According to a Relevance Theory (RT)-inspired view, the context will immediately guide interpretation to a specific sense. In Experiment 1, participants indicated whether an adjective–noun construction made sense or not. Switching from one sense to another was costly, but there was no effect of sense frequency (contra SEL). In Experiment 2, eye movements were recorded when participants read sentences in which a polyseme was disambiguated to a specific sense following a neutral context, a sense was repeated, or a sense was switched. The results showed no effect of sense dominance in the neutral condition, no advantage when a sense was repeated, and a cost when switched, especially when switching from a concrete to an abstract interpretation. These data cannot be fitted in an SEL or RT-inspired account, questioning the validity of both as a processing account.
Article
Prior research suggests that the language processor initially activates an underspecified representation of a metonym consistent with all its senses, potentially selecting a specific sense if supported by contextual and lexical information. We explored whether a structural heuristic, the Subject as Agent Principle, which provisionally assigns an agent theta role to canonical subjects, would prompt immediate sense selection. In Experiment 1, we found initial evidence that this principle is active during offline and online processing of metonymic names like Kafka. Reading time results from Experiments 2 and 3 demonstrated that previous context biasing towards the metonymic sense of the name reduced, but did not remove, the agent preference, consistent with Frazier’s (1999) proposal that the processor may avoid selecting a specific sense, unless grammatically required.
Article
This book contains 15 papers by the influential American philosopher, David Lewis. All previously published (between 1966 and 80), these papers are divided into three groups: ontology, the philosophy of mind, and the philosophy of language. Lewis supplements eight of the fifteen papers with postscripts in which he amends claims, answers objections, and introduces later reflections. Topics discussed include possible worlds, counterpart theory, modality, personal identity, radical interpretation, language, propositional attitudes, the mind, and intensional semantics. Among the positions Lewis defends are modal realism, materialism, socially contextualized formal semantics, and functionalism of the mind. The volume begins with an introduction in which Lewis discusses his philosophical method.
Article
When does the human language processor take on semantic commitments? This question has two parts: (1) for what class of semantic decisions will the processor make a decision rather than leaving the developing interpretation vague or unspecified?; and (2) at what point during sentence analysis will a decision be made if insufficient information is available to guarantee an accurate decision? Neither question has been answered (or even posed in a fully explicit general form) in the psycholinguistic literature. We recorded readers' eye movements in a study designed to explore these questions. The data indicated that delaying the presentation of disambiguating information until after the occurrence of an ambiguous target lengthened fixation times for words with multiple meanings (the concrete vs. abstract meaning of ball, ring), but not for words with multiple senses (the concrete vs. abstract sense of library, poem). This finding is taken as initial support for the view that semantic commitments are minimized, occurring only when mutually incompatible choices are presented by the grammar or when forced by the need to maintain consistency between the interpretation of the current phrase and any already processed contextual material. However, decisions forced by either of these circumstances occur immediately with associated effects appearing in the eye movement record long before the end of the sentence.
Article
Past lexical decision studies investigating the number of meanings (NOM) effect have produced mixed results. A second variable, the relatedness among a word's meanings, has not been widely studied. In Experiment 1, Relatedness (High or Low), NOM (Many or Few), and nonword condition (legal nonwords or pseudohomophones) were manipulated in lexical decision. No significant effects of NOM or Relatedness were observed in the legal nonword condition. However, in the pseudohomophone condition, Relatedness and NOM both produced significant main effects, and an interaction. Words with few, unrelated meanings produced the slowest response times (RTs); all other words produced statistically equivalent RTs. Results of the pseudohomophone condition of Experiment 1 were replicated in Experiment 2, except the main effect of NOM was not significant. The overall unreliability of NOM effects in these (and previous) experiments leads us to question the contribution of NOM to the observed interaction. NOM metrics are often confounded with relatedness; words with many meanings tend to have highly related meanings. The results show that relatedness among meanings can influence lexical decision performance; the challenge is now to explore alternative measures, other than simple enumeration, to adequately describe word meanings.
Article
Although regular polysemy [e.g. producer for product (John read Dickens) or container for contents (John drank the bottle)] has been extensively studied, there has been little work on why certain polysemy patterns are more acceptable than others. We take an empirical approach to the question, in particular evaluating an account based on rules against a gradient account of polysemy that is based on various radical pragmatic theories (Fauconnier 1985; Nunberg 1995). Under the gradient approach, possible senses become more acceptable as they become more closely related to a word’s default meaning, and the apparent regularity of polysemy is an artefact of having many similarly structured concepts. Using methods for measuring conceptual structure drawn from cognitive psychology, Study 1 demonstrates that a variety of metrics along which possible senses can be related to a default meaning, including conceptual centrality, cue validity and similarity, are surprisingly poor predictors of whether shifts to those senses are acceptable. Instead, sense acceptability was better explained by rule-based approaches to polysemy (e.g. Copestake & Briscoe 1995). Study 2 replicated this finding using novel word meanings in which the relatedness of possible senses was varied. However, while individual word senses were better predicted by polysemy rules than conceptual metrics, our data suggested that rules (like producer for product) had themselves arisen to mark senses that, aggregated over many similar words, were particularly closely related.