Afra Alishahi's research while affiliated with Tilburg University and other places

Publications (59)

Preprint
Full-text available
Attempts to computationally simulate the acquisition of spoken language via grounding in perception have a long tradition but have gained momentum in the past few years. Current neural approaches exploit associations between the spoken and visual modality and learn to represent speech and visual data in a joint vector space. A major unresolved issue...
Preprint
Full-text available
The distributed and continuous representations used by neural networks are at odds with representations employed in linguistics, which are typically symbolic. Vector quantization has been proposed as a way to induce discrete neural representations that are closer in nature to their linguistic counterparts. However, it is not clear which metrics are...
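The vector-quantization idea is simple to state in code. A minimal sketch, assuming NumPy and a toy random codebook (all names and sizes here are illustrative, not the paper's setup):

```python
import numpy as np

def quantize(hidden, codebook):
    """Snap each continuous hidden state to its nearest codebook entry.

    hidden:   (n, d) array of continuous activations.
    codebook: (k, d) array of k code vectors.
    Returns discrete code indices (the symbol-like representation)
    and the quantized vectors passed on to the rest of the network.
    """
    # Squared Euclidean distance from every hidden state to every code.
    dists = ((hidden[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)      # one discrete symbol per time step
    return codes, codebook[codes]

rng = np.random.default_rng(0)
codes, quantized = quantize(rng.normal(size=(5, 8)), rng.normal(size=(16, 8)))
print(codes)  # a sequence of 5 symbols drawn from a 16-symbol inventory
```

In a trained model the codebook is learned jointly with the network; the open question the abstract raises is how to evaluate how well such discrete codes line up with linguistic units.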
Conference Paper
Full-text available
Cognitive models of child word learning in general, and cross-situational models in particular, characterize the learner as a passive observer. But children are curious and actively participate in verbal and non-verbal communication, and often introduce new topics which parents are likely to follow up (Bloom et al., 1996). We investigate the potential...
Conference Paper
Full-text available
Models of cross-situational word learning typically characterize the learner as a passive observer, but a language learning child can actively participate in verbal and non-verbal communication. We present a computational study of cross-situational word learning to investigate whether a curious word learner who actively influences linguistic input...
Preprint
Full-text available
Speech directed to children differs from adult-directed speech in linguistic aspects such as repetition, word choice, and sentence length, as well as in aspects of the speech signal itself, such as prosodic and phonemic variation. Human language acquisition research indicates that child-directed speech helps language learners. This study explores...
Preprint
Full-text available
Despite the fast development of analysis techniques for NLP and speech processing systems, few systematic studies have been conducted to compare the strengths and weaknesses of each method. As a step in this direction we study the case of representations of phonology in neural network models of spoken language. We use two commonly applied analytical...
Preprint
Recent work has highlighted the advantage of jointly learning grounded sentence representations from multiple languages. However, the data used in these studies has been limited to an aligned scenario: the same images annotated with sentences in multiple languages. We focus on the more realistic disjoint scenario in which there is no overlap between...
Article
The Empirical Methods in Natural Language Processing (EMNLP) 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating...
Preprint
Analysis methods which enable us to better understand the representations and functioning of neural models of language are increasingly needed as deep learning becomes the dominant approach in NLP. Here we present two methods based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which allow us to directly quantify how strongly the...
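As a rough illustration of the RSA half of this idea: two representation spaces are compared not directly but via the pairwise-similarity structure each induces over the same stimuli. A minimal sketch assuming NumPy/SciPy and toy data (the TK side, which builds the second similarity matrix from parse trees, is omitted):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(reps_a, reps_b):
    """Correlate the pairwise-distance structure of two representation spaces.

    reps_a, reps_b: (n, d_a) and (n, d_b) arrays for the same n stimuli.
    Returns a single score: high when items that are similar in one
    space are also similar in the other.
    """
    return spearmanr(pdist(reps_a, "cosine"), pdist(reps_b, "cosine")).correlation

rng = np.random.default_rng(0)
neural = rng.normal(size=(20, 64))                            # e.g. network activations
symbolic = neural[:, :10] + 0.1 * rng.normal(size=(20, 10))   # correlated toy "symbolic" features
print(rsa(neural, symbolic))
```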
Preprint
The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether...
Preprint
Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages...
Preprint
Full-text available
Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training procedure and implementations might hinder its applicability...
Article
The bulk of research in the area of speech processing concerns itself with supervised approaches to transcribing spoken language into text. In the domain of unsupervised learning, most work on speech has focused on discovering relatively low-level constructs such as phoneme inventories or word-like units. This is in contrast to research on written language...
Article
Full-text available
We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in...
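A standard way to test how phoneme information is encoded in such activations is a diagnostic (probing) classifier trained on frozen hidden states. A sketch with scikit-learn; the data here are synthetic stand-ins, whereas the real analysis would use phoneme labels aligned to the speech corpus:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: per-frame hidden activations from some layer of the
# speech model, each labeled with the phoneme being uttered.
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 128))
phonemes = rng.integers(0, 40, size=1000)    # 40 phoneme classes

X_tr, X_te, y_tr, y_te = train_test_split(activations, phonemes, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Accuracy well above chance would suggest the layer encodes phoneme identity.
print("probe accuracy:", probe.score(X_te, y_te))
```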
Article
Cross-linguistic influence (CLI) is one of the key phenomena in bilingual and second language learning. We propose a method for quantifying CLI in the use of linguistic constructions with the help of a computational model, which acquires constructions in two languages from bilingual input. We focus on the acquisition of case-marking cues in Russian...
Article
We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of the speech signal, and show that it learns to extract both form- and meaning-based linguistic knowledge from the input signal. We carry out an in-depth...
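Models of this family typically learn the joint semantic space with a contrastive objective over matched speech-image pairs. A minimal NumPy sketch of one common variant, a batch-wise triplet hinge loss (illustrative, not necessarily the paper's exact objective):

```python
import numpy as np

def contrastive_loss(speech, image, margin=0.2):
    """Pull matched speech/image embeddings together, push mismatches apart.

    speech, image: (n, d) L2-normalized embeddings; row i of each is a
    matched pair, and every other row in the batch serves as a negative.
    """
    sims = speech @ image.T             # cosine similarity matrix
    pos = np.diag(sims)                 # similarities of the true pairs
    # Hinge terms: a negative is penalized if it comes within `margin`
    # of the true pair, from the speech side and from the image side.
    cost_s = np.maximum(0.0, margin + sims - pos[:, None])
    cost_i = np.maximum(0.0, margin + sims - pos[None, :])
    mask = 1.0 - np.eye(sims.shape[0])  # ignore the diagonal (true pairs)
    return ((cost_s + cost_i) * mask).sum() / sims.shape[0]

rng = np.random.default_rng(0)
s = rng.normal(size=(4, 16)); s /= np.linalg.norm(s, axis=1, keepdims=True)
i = rng.normal(size=(4, 16)); i /= np.linalg.norm(i, axis=1, keepdims=True)
print(contrastive_loss(s, i))
```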
Article
Full-text available
This article looks into the nature of cognitive associations between verbs and argument structure constructions (ASCs). Existing research has shown that distributional and semantic factors affect speakers' choice of verbs in ASCs. A formal account of this theory has been proposed by Ellis, N. C., O'Donnell, M. B., & Römer, U. [(2014a). The processing...
Article
Full-text available
We present novel methods for analysing the activation patterns of RNNs and identifying the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network model consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input...
Article
We study how the learning of argument structure constructions in a second language (L2) is affected by two basic input properties often discussed in the literature – the amount of input and the time of L2 onset. To isolate the impact of the two factors on learning, we use a computational model that simulates bilingual construction learning. In the first...
Article
We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation...
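A schematic PyTorch sketch of that two-pathway design, with shared word embeddings feeding one GRU that predicts the image's feature vector and another that models the text itself (all layer names and sizes here are illustrative, not Imaginet's published configuration):

```python
import torch
import torch.nn as nn

class MultiTaskGrounding(nn.Module):
    """Two GRU pathways over shared word embeddings: one predicts the
    visual feature vector of the scene, the other predicts the next word."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, img_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # shared between tasks
        self.visual_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.textual_gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.to_image = nn.Linear(hid_dim, img_dim)      # predict image features
        self.to_vocab = nn.Linear(hid_dim, vocab_size)   # predict next word

    def forward(self, tokens):
        e = self.embed(tokens)                           # (batch, seq, emb_dim)
        vis_out, _ = self.visual_gru(e)
        txt_out, _ = self.textual_gru(e)
        image_pred = self.to_image(vis_out[:, -1])       # final state -> scene
        word_pred = self.to_vocab(txt_out)               # every step -> next word
        return image_pred, word_pred

model = MultiTaskGrounding(vocab_size=10000)
image_pred, word_pred = model(torch.randint(0, 10000, (2, 7)))
```

The shared embedding layer is what lets the visual prediction task shape the word representations used by the purely textual pathway.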
Conference Paper
Full-text available
Learning argument structure constructions is believed to depend on input properties. In particular, in a cued production task, verb production within each construction has been shown to depend on three input factors: frequency of a verb in a construction, contingency of verb–construction mapping, and verb semantic prototypicality. Earlier studies...
Conference Paper
Full-text available
The study of second language acquisition (SLA) is often hindered by substantial variability in the background of learners, their learning process and the input they receive. This diversity often makes it difficult to isolate specific learning factors and study their impact on L2 development. We present a computational study of SLA as an alternative...
Chapter
We present a cognitive model of inducing verb selectional preferences from individual verb usages. The selectional preferences for each verb argument are represented as a probability distribution over the set of semantic properties that the argument can possess—a semantic profile. The semantic profiles yield verb-specific conceptualizations of the...
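In its simplest form, a semantic profile of the kind described here is a normalized distribution over the semantic properties observed for an argument position. A toy sketch (the property names and usages are invented for illustration):

```python
from collections import Counter

def semantic_profile(usages):
    """Estimate a verb argument's semantic profile from observed fillers.

    usages: list of property sets, one per observed argument filler,
            e.g. [{"animate", "human"}, {"animate", "animal"}, ...].
    Returns a probability distribution over semantic properties.
    """
    counts = Counter(prop for props in usages for prop in props)
    total = sum(counts.values())
    return {prop: c / total for prop, c in counts.items()}

# Toy usages of the subject slot of "eat":
print(semantic_profile([{"animate", "human"},
                        {"animate", "animal"},
                        {"animate", "human"}]))
```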
Conference Paper
Full-text available
There are few computational models of second language acquisition (SLA). At the same time, many questions in the field of SLA remain unanswered. In particular, SLA patterns are difficult to study due to the large amount of variation between human learners. We present a computational model of second language construction learning that allows manipulation...
Conference Paper
Full-text available
The input to a cognitively plausible model of language acquisition must have the same information components and statistical properties as the child-directed speech. There are collections of child-directed utterances (e.g., CHILDES), but a realistic representation of their visual and semantic context is not available. We propose three quantitative...
Book
The nature and amount of information needed for learning a natural language, and the underlying mechanisms involved in this process, are the subject of much debate: how is the knowledge of language represented in the human brain? Is it possible to learn a language from usage data only, or is some sort of innate knowledge and/or bias needed to boost...
Book
Questions related to language acquisition have been of interest for many centuries, as children seem to acquire a sophisticated capacity for processing language with apparent ease, in the face of ambiguity, noise and uncertainty. However, with recent advances in technology and cognitive-related research it is now possible to conduct large-scale computational...
Article
Learning the meaning of words from ambiguous and noisy context is a challenging task for language learners. It has been suggested that children draw on syntactic cues such as lexical categories of words to constrain potential referents of words in a complex scene. Although the acquisition of lexical categories should be interleaved with learning word...
Article
Full-text available
When looking for the referents of novel nouns, adults and young children are sensitive to cross-situational statistics (Yu and Smith, 2007; Smith and Yu, 2008). In addition, the linguistic context that a word appears in has been shown to act as a powerful attention mechanism for guiding sentence processing and word learning (Landau and Gleitman, 19...
Article
Semantic roles are a critical aspect of linguistic knowledge because they indicate the relations of the participants in an event to the main predicate. Experimental studies on children and adults show that both groups use associations between general semantic roles such as Agent and Theme, and grammatical positions such as Subject and Object, even...
Conference Paper
Full-text available
The syntactic bootstrapping hypothesis suggests that children's verb learning is guided by the structural cues that the linguistic context provides. However, the onset of syntactic bootstrapping in word learning is not well studied. To investigate the impact of linguistic information on word learning during early stages of language acquisition, we...
Conference Paper
We present a novel method for FrameNet-based semantic role labeling (SRL), focusing on the problems posed by the limited coverage of available annotated data. Our SRL model is based on Bayesian clustering and has the advantage of being very robust in the face of unseen and incomplete data. Frame labeling and role labeling are modeled in like fashion...
Article
Words are the essence of communication: They are the building blocks of any language. Learning the meaning of words is thus one of the most important aspects of language acquisition: Children must first learn words before they can combine them into complex utterances. Many theories have been developed to explain the impressive efficiency of young children...
Conference Paper
Children learn a robust representation of lexical categories at a young age. We propose an incremental model of this process which efficiently groups words into lexical categories based on their local context using an information-theoretic criterion. We train our model on a corpus of child-directed speech from CHILDES and show that the model learns...
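A much-simplified sketch of the incremental grouping step: each incoming word, represented by a distribution over its local contexts, either joins the most similar existing category or starts a new one. The model's actual information-theoretic criterion is replaced here by a Jensen-Shannon divergence threshold, purely for illustration:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two context distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def assign(word_dist, clusters, threshold=0.4):
    """Incrementally place a word: join the nearest cluster by context
    similarity, or open a new cluster if none is close enough."""
    if clusters:
        divs = [js_divergence(word_dist, c) for c in clusters]
        best = int(np.argmin(divs))
        if divs[best] < threshold:
            return best
    clusters.append(np.asarray(word_dist, float))
    return len(clusters) - 1

clusters = []
assign([0.7, 0.2, 0.1], clusters)   # first word opens cluster 0
assign([0.6, 0.3, 0.1], clusters)   # similar context: joins cluster 0
assign([0.0, 0.1, 0.9], clusters)   # different context: opens cluster 1
```

A fuller implementation would also update the chosen cluster's context distribution after each assignment, so that categories drift toward their members.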
Article
Semantic roles are a critical aspect of linguistic knowledge because they indicate the relations of the participants in an event to the main predicate. Experimental studies on children and adults show that both groups use associations between general semantic roles such as Agent and Theme, and grammatical positions such as Subject and Object, even...
Book
The nature and amount of information needed for learning a natural language, and the underlying mechanisms involved in this process, are the subject of much debate: is it possible to learn a language from usage data only, or is some sort of innate knowledge and/or bias needed to boost the process? This is a topic of interest to (psycho)linguists who...
Article
Full-text available
Higher frequency has been shown to have a positive effect on the acquisition of words and other linguistic items in children. An important question that needs to be answered then is how children learn low-frequency items. In this study, we investigate the acquisition of meanings for low-frequency words through computational modeling. We suggest that...
Article
Full-text available
We present a probabilistic incremental model of early word learning. The model acquires the meaning of words from exposure to word usages in sentences, paired with appropriate semantic representations, in the presence of referential uncertainty. A distinct property of our model is that it continually revises its learned knowledge of a word's meaning...
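A bare-bones sketch of the incremental align-and-update loop behind such a cross-situational learner, loosely in the spirit of the model described; the smoothing constant and the assumed feature-vocabulary size are arbitrary here:

```python
from collections import defaultdict

# assoc[w][f]: accumulated evidence that scene feature f belongs to word w's meaning.
assoc = defaultdict(lambda: defaultdict(float))
SMOOTH = 1e-3  # small prior so novel word-feature pairs get non-zero alignment

def p_meaning(w, f):
    """Current estimate of p(f | w), smoothed."""
    total = sum(assoc[w].values())
    return (assoc[w][f] + SMOOTH) / (total + SMOOTH * 100)  # 100: assumed feature-space size

def process_utterance(words, features):
    """One incremental step over a (sentence, scene) pair: softly align each
    scene feature to the words in proportion to current knowledge, then fold
    the alignments back into the evidence table. Each new pair thus revises
    what was learned from earlier ones."""
    for f in features:
        denom = sum(p_meaning(w, f) for w in words)
        for w in words:
            assoc[w][f] += p_meaning(w, f) / denom  # alignment strength

# After many such pairs, p_meaning("dog", "DOG") should come to dominate:
process_utterance(["the", "dog", "barks"], {"DOG", "ANIMAL", "SOUND"})
process_utterance(["a", "dog", "runs"], {"DOG", "ANIMAL", "MOTION"})
print(p_meaning("dog", "DOG"))
```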
Article
How children go about learning the general regularities that govern language, as well as keeping track of the exceptions to them, remains one of the challenging open questions in the cognitive science of language. Computational modeling is an important methodology in research aimed at addressing this issue. We must determine appropriate learning mechanisms...
Article
Full-text available
Children can determine the meaning of a new word from hearing it used in a familiar context—an ability often referred to as fast mapping. In this paper, we study fast mapping in the context of a general probabilistic model of word learning. We use our model to simulate fast mapping experiments on children, such as referent selection and retention...
Conference Paper
Full-text available
Children can determine the meaning of a new word from hearing it used in a familiar context—an ability often referred to as fast mapping. In this paper, we study fast mapping in the context of a general probabilistic model of word learning. We use our model to simulate fast mapping experiments on children, such as referent selection and retention...
Article
We present a cognitive model of inducing verb selectional preferences from individual verb usages. The selectional preferences for each verb argument are represented as a probability distribution over the set of semantic properties that the argument can possess—a semantic profile. The semantic profiles yield verb-specific conceptualizations of...
Article
We present a Bayesian model of early verb learning that acquires a general conception of the semantic roles of predicates based only on exposure to individual verb usages. The model forms probabilistic associations between the semantic properties of arguments, their syntactic positions, and the semantic primitives of verbs. Because of the model's...
Article
Developing computational algorithms that capture the complex structure of natural language is an open problem. In particular, learning the abstract properties of language only from usage data remains a challenge. In this dissertation, we present a probabilistic usage-based model of verb argument structure acquisition that can successfully learn abstract...
Article
We present a Bayesian model for the representation, acquisition and use of argument structure constructions, which is founded on a novel view of constructions as a mapping of a syntactic form to a probability distribution over semantic features. Our computational experiments demonstrate the feasibility of learning general constructions from individual...
Article
Full-text available
It has been suggested that children learn the meanings of words by observing the regularities across different situations in which a word is used. However, experimental studies show that children are also sensitive to the syntactic properties of words and their context at a young age, and can use this information to find the correct referent for novel...

Citations

... Many studies have since investigated the properties of the learned representations of such VGS models (e.g., [13,39,40,41,42]). Perhaps the most prominent question is whether words are encoded in these utterance embeddings even though VGS models are not explicitly trained to encode words and are only exposed to complete sentences. ...
... Human speech perception has been an active area of research in the past five decades, which has produced a wealth of documented behavioral studies and experimental findings. Recently, there has been a growing scientific interest in the cognitive modeling community to leverage the recent advances in speech representation learning to formalize and test theories of speech perception using computational simulations on the one hand, and to investigate whether neural networks exhibit similar behavior to humans on the other hand (Räsänen et al., 2016; Alishahi et al., 2017; Dupoux, 2018; Scharenborg et al., 2019; Gelderloos et al., 2020; Matusevych et al., 2020b; Magnuson et al., 2020). ...
... Pre-trained models, such as BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019) and more recently T5 (Raffel et al., 2020), are the state of the art across several tasks in computational linguistics. In addition, transformer-based models are known to have access to information as varied as part-of-speech information (Chrupała and Alishahi, 2019; Tenney et al., 2019b), parse trees (Hewitt and Manning, 2019), the NLP pipeline (Tenney et al., 2019a), and constructional information (Tayyar Madabushi et al., 2020). These models tend to perform so well that, on certain tasks, they outperform human baselines (Zhang et al., 2020). ...
... The availability of language models that can process connected text has increased the scope of cognitive neuroscientists' toolkit for probing the relationship between computational language representations and neural signals. Mirroring the successes in computer vision 24 and the subsequent modeling of neural processing in visual perceptual hierarchies [25][26][27] , computational linguists are beginning to interpret how language models achieve their task performance [28][29][30] and what the correspondence is between such pretrained model representations and neural responses recorded when participants engage in similar language tasks [31][32][33][34][35][36][37] . On the one hand, task-optimized ANNs therefore serve as a tool and a framework that allow us to operationalize and identify which computational primitives serve as the candidate hypotheses for explaining neural data [38][39][40][41] . ...
... Multi-lingual dataset collection has always been a major hurdle when it comes to building models in a one-model-fits-all style that can provide good results for image retrieval across multiple languages. Most methods [22,28,31] rely on direct translations of English captions, while others [12,20] have used independent image and language text pairs. Building on this previous research, we explore the following ideas in this paper: ...
... In fact, factors like vocal tract differences across speakers, speaking styles, contextual differences, and environmental conditions all make the speech signal much more complicated than pure text. It is therefore surprising and exciting to see that our proposed Speech2Vec is able to, at least to some extent, factor out this inherent variability in speech production and preserve the semantic information of spoken words in a latent space [Chrupała et al., 2019], as shown by the results on word similarity benchmarks and visualization of the learned embeddings. ...
... Another set of models operate directly on real continuous speech (e.g., Kamper et al., 2016; Nixon, 2020; Park and Glass, 2008; Schatz et al., 2021; Shain and Elsner, 2020). Besides processing language input only, there are models that use concurrent visual input in addition to spoken language (e.g., Alishahi et al., 2017; Chrupała et al., 2017; Coen, 2006; Harwath et al., 2016; Khorrami and Räsänen, 2021; Nikolaus and Fourtassi, 2021; Roy, 2005). Besides passive perception approaches, there are also models that can interact with simulated or real human caregivers (e.g., ...
... Harwath and Glass collected spoken captions for the Flickr8k database and used them to train the first neural-network-based VGS model [26]. There have been many improvements to the model architecture ([27,28,29,30,31,32,33]) and new applications of VGS models such as semantic keyword spotting ([34,35,14]), image generation [36], recovering masked speech [37] and even models combining speech and video [38]. ...
... The use of text has been explored for US image captioning, either using medical reports [16], [17] or voice commentaries on a retrospective dataset [18]. In order to be less text-dependent, in the computer vision community there is an interest in machine learning methods that learn directly from voice signals, rather than transcripts [19]-[23]. For example, Harwath et al. [20] proposed a novel multi-modal method for image retrieval that uses spoken captions on real images to assign a similarity score to each image, demonstrating the possibility to learn semantic correspondences from audio and image pairings. ...
... Amongst L2 experimentalists, native-language corpora have become widespread tools in the development of experiments: They have been used to extract word frequencies in the native language (L1) to be used as predictors or control variables to assess learners' performances (Gries & Ellis, 2015), and they have allowed scholars to establish native-like baselines against which to contrast L2. This approach has proved popular in processing (e.g., Spinner et al., 2017), morphology (e.g., Matusevych et al., 2018), syntax (e.g., Hopp, 2017), collocational knowledge (e.g., Toomer & Elgort, 2019), linguistic contexts and their effects on phonolexical processing (e.g., Chrabaszcz & Gor, 2014), and constructional knowledge (e.g., Kim & Rah, 2019). In the case of Kim and Rah (2019), following Johnson and Goldberg (2013), the authors used the Corpus of Contemporary American English (COCA) to select verbs for an experiment designed to explore L2 learners' sensitivity to constructional information and learners' efficiency in integrating information from a verb and a construction in real-time processing. ...