Conference Paper

Word-order Biases in Deep-agent Emergent Communication

... Accordingly, the population setting offers more opportunities to actively shape the process [21,44,116]. An additional factor that shapes the communication setting is the type of cooperation inherent in the setup. Determining the level of cooperation or competition feasible within the setting is a fundamental decision and closely related to the choice of the language game. ...
... [Table: surveyed works grouped by game type (Referential, Reconstruction, Grid World, Continuous World, Other); citation indices omitted.] ... three dimensions, presents challenges and adds a greater sense of realism and intricacy. These environments have the potential to make it more feasible to deploy EL agents in real-world scenarios compared to discrete environments [193]. ...
... Table 6 summarizes the results of our survey regarding vocabulary types in EL research. One commonly used phonological type in EL is a binary encoding, while an even more prominent type is a token-based vocabulary. [Table: surveyed works grouped by vocabulary type (categories include Continuous and Both); citation indices omitted.] However, these two phonological classes are not always distinct, as a token-based vocabulary often builds upon a binary encoded representation [97]. ...
Preprint
Full-text available
The field of emergent language represents a novel area of research within the domain of artificial intelligence, particularly within the context of multi-agent reinforcement learning. Although the concept of studying language emergence is not new, early approaches were primarily concerned with explaining human language formation, with little consideration given to its potential utility for artificial agents. In contrast, studies based on reinforcement learning aim to develop communicative capabilities in agents that are comparable to or even superior to human language. Thus, they extend beyond the learned statistical representations that are common in natural language processing research. This gives rise to a number of fundamental questions, from the prerequisites for language emergence to the criteria for measuring its success. This paper addresses these questions by providing a comprehensive review of 181 scientific publications on emergent language in artificial intelligence. Its objective is to serve as a reference for researchers interested in or proficient in the field. Consequently, the main contributions are the definition and overview of the prevailing terminology, the analysis of existing evaluation methods and metrics, and the description of the identified research gaps.
... In conclusion, these works demonstrate how language structures are transmitted across generations of learning agents and refined with each subsequent iteration, resulting in more efficient, communicable, and learnable languages [47], [52], [104], [109]. ...
... For example, [148] employs curriculum learning, gradually increasing the difficulty of referential games, and reports an emergence of compositionality using topographic similarity and zero-shot generalization accuracy. Similarly, the introduction of iterative learning, see Section II-C, in the language development process, has been shown to lead to increasingly compositional languages with each generation [47], [52], [104], [109]. The complex and dynamic chaotic environments where languages can emerge offer a rich foundation for language development [149]. ...
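Topographic similarity, mentioned here and in several excerpts below, is commonly computed as the Spearman correlation between pairwise distances in meaning space and in message space. A minimal sketch under that common definition (the choice of distance functions varies across papers; Hamming and edit distance are typical):

```python
# Minimal sketch of topographic similarity: Spearman correlation between
# pairwise distances in meaning space and in message space.
# Assumptions: meanings are attribute tuples (Hamming distance),
# messages are symbol sequences (edit distance); conventions vary across papers.
from itertools import combinations
from scipy.stats import spearmanr

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    # standard dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def topographic_similarity(meanings, messages):
    pairs = list(combinations(range(len(meanings)), 2))
    d_meaning = [hamming(meanings[i], meanings[j]) for i, j in pairs]
    d_message = [edit_distance(messages[i], messages[j]) for i, j in pairs]
    return spearmanr(d_meaning, d_message).correlation

# Toy example: a perfectly compositional mapping yields a score close to 1.0.
meanings = [(0, 0), (0, 1), (1, 0), (1, 1)]
messages = [(5, 7), (5, 8), (6, 7), (6, 8)]
print(topographic_similarity(meanings, messages))
```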
... This measure shows "how consistently a speaker agent emits a particular symbol when it takes a particular action and vice versa". While this method is reported in [26], [46], [52] as a reliable metric, it fails to capture the listener's behavior. ...
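One common way to operationalize such a speaker-consistency measure is as the mutual information between emitted symbols and taken actions, estimated from co-occurrence counts. A rough sketch under that assumption (the cited papers may estimate or normalize it differently); note that, as the excerpt says, it only looks at the speaker side:

```python
# Sketch: speaker consistency as mutual information between symbols and actions,
# estimated from observed (symbol, action) pairs. This is one common way to
# operationalize "how consistently a speaker emits a symbol for an action";
# the exact normalization differs between papers.
import math
from collections import Counter

def symbol_action_mi(pairs):
    n = len(pairs)
    joint = Counter(pairs)
    sym_counts = Counter(s for s, _ in pairs)
    act_counts = Counter(a for _, a in pairs)
    mi = 0.0
    for (s, a), c in joint.items():
        p_sa = c / n
        mi += p_sa * math.log2(c * n / (sym_counts[s] * act_counts[a]))
    return mi

# Toy usage: a perfectly consistent speaker (one symbol per action) gives 1 bit.
print(symbol_action_mi([("m1", "left"), ("m2", "right")] * 50))
```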
Article
Full-text available
In the recent shift towards human-centric AI, the need for machines to accurately use natural language has become increasingly important. While a common approach to achieve this is to train large language models, this method presents a form of learning misalignment where the model may not capture the underlying structure and reasoning humans employ in using natural language, potentially leading to unexpected or unreliable behavior. Emergent communication (EmCom) is a field of research that has seen a growing number of publications in recent years, aiming to develop artificial agents capable of using natural language in a way that goes beyond simple discriminative tasks and can effectively communicate and learn new concepts. In this review, we present EmCom under two aspects. Firstly, we delineate all the common properties we find across the literature and how they relate to human interactions. Secondly, we identify two subcategories and highlight their characteristics and open challenges. We encourage researchers to work together by demonstrating that different methods can be viewed as diverse solutions to a common problem and emphasize the importance of including diverse perspectives and expertise in the field. We believe a deeper understanding of human communication and human-AI trust dynamics is crucial to developing machines that can accurately use natural language in human-machine interactions.
... Communication-assisted: Unlike previous Communication-focused games, this second category utilizes communication as a means to achieve a goal different from the communicative act itself. The agents' action space includes both communication signals and other game-dependent actions, which range from physics simulators [Grover et al., 2018, Mordatch and Abbeel, 2018], navigation tasks [Das et al., 2019, Lowe et al., 2017, Eccles et al., 2019, Chaabouni et al., 2019a], and negotiation settings [Bachrach et al., 2020, Cao et al., 2018] to social deduction games [Brandizzi et al., 2021, Nakamura et al., 2016, Jaques et al., 2018]. These games aim to recreate the environment in which language emerged, emphasizing the view that human language did not emerge as a goal itself but rather as a means of coordinating actions between humans. ...
... In conclusion, these works demonstrate how language structures are transmitted across generations of learning agents and refined with each subsequent iteration, resulting in more efficient, communicable, and learnable languages [Ren et al., 2020, Tieleman et al., 2019, Chaabouni et al., 2019a, Cogswell et al., 2019]. ...
... For example, Korbak et al. [2019] employs curriculum learning, gradually increasing the difficulty of referential games, and reports an emergence of compositionality using topographic similarity and zero-shot generalization accuracy. Similarly, the introduction of iterative learning, see Section 2.3, in the language development process, has been shown to lead to increasingly compositional languages with each generation [Cogswell et al., 2019, Ren et al., 2020, Tieleman et al., 2019, Chaabouni et al., 2019a]. The complex and dynamic chaotic environments where languages can emerge offer a rich foundation for language development [Larsen-Freeman, 1997]. ...
Preprint
Full-text available
In the recent shift towards human-centric AI, the need for machines to accurately use natural language has become increasingly important. While a common approach to achieve this is to train large language models, this method presents a form of learning misalignment where the model may not capture the underlying structure and reasoning humans employ in using natural language, potentially leading to unexpected or unreliable behavior. Emergent communication (Emecom) is a field of research that has seen a growing number of publications in recent years, aiming to develop artificial agents capable of using natural language in a way that goes beyond simple discriminative tasks and can effectively communicate and learn new concepts. In this review, we present Emecom under two aspects. Firstly, we delineate all the common properties we find across the literature and how they relate to human interactions. Secondly, we identify two subcategories and highlight their characteristics and open challenges. We encourage researchers to work together by demonstrating that different methods can be viewed as diverse solutions to a common problem and emphasize the importance of including diverse perspectives and expertise in the field. We believe a deeper understanding of human communication is crucial to developing machines that can accurately use natural language in human-machine interactions.
... One of the proposed explanations for these mismatches is the difference in cognitive biases between human and neural-network (NN) based learners. For instance, the neural-agent iterated learning simulations of Chaabouni et al. (2019b) and Lian et al. (2021) did not succeed in replicating the trade-off between word-order and case marking, which is widely attested in human languages (Sinnemäki, 2008;Futrell et al., 2015) and has also been observed in miniature language learning experiments with human subjects (Fedzechkina et al., 2017). Instead, those simulations resulted in the preservation of languages with redundant coding mechanisms, which the authors mainly attributed to the lack of a human-like least-effort bias in the neural agents. ...
... Focusing on the word-order/case-marking tradeoff, Chaabouni et al. (2019b) implemented an iterated learning experiment inspired by Kirby (2001); Kirby et al. (2014), where agents acquired a language through supervised learning, and then transmitted this to a new learner, iterating over several generations. The trade-off did not appear in their simulations. ...
... The trade-off did not appear in their simulations. Lian et al. (2021) extended the study of Chaabouni et al. (2019b) by introducing several crucial factors from the language evolution field (e.g. input variability or a learning bottleneck) into the agent setup; however, no clear trade-off was found. ...
Preprint
Full-text available
Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. The lack of appropriate cognitive biases in these learners is one of the prevailing explanations. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. In this work, we investigate the latter account focusing on the word-order/case-marking trade-off, a widely attested language universal which has proven particularly difficult to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a given miniature language through supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding any learning bias in the agents. We see this as an essential step towards the investigation of language universals with neural learners.
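A heavily simplified sketch of the two-phase recipe this abstract describes: supervised learning of a predefined miniature language, followed by reinforcement-learning optimization of the speaker-listener pair for communicative success. Everything here (single-symbol messages, module sizes, the random stand-in "language", REINFORCE with a mean-reward baseline) is a hypothetical simplification, not the actual NeLLCom implementation.

```python
# Minimal sketch of a two-phase setup in the spirit of NeLLCom:
# (1) speaker and listener learn a predefined miniature language by supervision,
# (2) the pair is then optimized for communicative success with REINFORCE.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_MEANINGS, N_SYMBOLS = 8, 8
language = torch.randperm(N_SYMBOLS)  # stand-in for a given meaning-to-utterance mapping

speaker = nn.Sequential(nn.Embedding(N_MEANINGS, 32), nn.ReLU(), nn.Linear(32, N_SYMBOLS))
listener = nn.Sequential(nn.Embedding(N_SYMBOLS, 32), nn.ReLU(), nn.Linear(32, N_MEANINGS))
opt = torch.optim.Adam(list(speaker.parameters()) + list(listener.parameters()), lr=1e-2)

# Phase 1: supervised learning of the given language.
for _ in range(200):
    meanings = torch.randint(0, N_MEANINGS, (32,))
    utterances = language[meanings]
    loss = F.cross_entropy(speaker(meanings), utterances) \
         + F.cross_entropy(listener(utterances), meanings)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: optimize the pair for communication (REINFORCE on listener success).
for _ in range(200):
    meanings = torch.randint(0, N_MEANINGS, (32,))
    log_probs = F.log_softmax(speaker(meanings), dim=-1)
    symbols = torch.distributions.Categorical(logits=log_probs).sample()
    guesses = listener(symbols)
    reward = (guesses.argmax(-1) == meanings).float()
    speaker_loss = -(log_probs.gather(1, symbols[:, None]).squeeze(1)
                     * (reward - reward.mean())).mean()
    listener_loss = F.cross_entropy(guesses, meanings)
    opt.zero_grad(); (speaker_loss + listener_loss).backward(); opt.step()
```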
... Natural languages commonly display a tradeoff among different strategies to convey constituent roles. A similar trade-off, however, has not been observed in recent simulations of iterated language learning with neural network based agents (Chaabouni et al., 2019b). In this work, we re-evaluate this result in the light of two important factors, namely: the lack of effort-based pressure in the agents and the lack of variability in the initial input language. ...
... Besides the horizontal transmission that is often modeled in the referential game setup, the process of iterated learning, where signals are transmitted vertically from generation to generation, has been identified to shape language Kirby (2001). Chaabouni et al. (2019b) incorporated this process in their model in which agents are initially exposed to a pre-defined language, which is then learned and reproduced repeatedly by several agents in a row. By controlling the initial language, they analyzed how specific language characteristics affect agents' learning and further investigated what biases the agents show over genera-tions. ...
... For instance, typological comparisons of human languages show that the use of word-order freedom correlates with the use of case marking (Sinnemäki, 2008; Futrell et al., 2015), balancing complexity between the two strategies to encode constituent roles. This trade-off did not clearly appear in the iterated learning experiments conducted by Chaabouni et al. (2019b), which they tentatively attributed to the lack of a least-effort pressure in their agents. ...
Preprint
Natural languages display a trade-off among different strategies to convey syntactic structure, such as word order or inflection. This trade-off, however, has not appeared in recent simulations of iterated language learning with neural network agents (Chaabouni et al., 2019b). We re-evaluate this result in light of three factors that play an important role in comparable experiments from the Language Evolution field: (i) speaker bias towards efficient messaging, (ii) non-systematic input languages, and (iii) a learning bottleneck. Our simulations show that neural agents mainly strive to maintain the utterance type distribution observed during learning, instead of developing a more efficient or systematic language.
... Computational linguists have been researching the emergence of these properties in artificial languages induced by language games [20,39,9,10,40,33,61,41] to better understand the evolution of natural languages. It is only relatively recently that it has also been investigated within the context of deep learning [46,29,45,25,47,7,19,43,21,15,16,49,60,27,50,2,17,3], as the ability to ground a natural-like language into other modalities is thought to be a prerequisite for general AI [69,54,4,18,3]. ...
... Variants of this game have been driving a lot of research on language emergence and communication-based co-operation in the field of linguistics [11,14,62,63], game theory [23,26,5,22] (as acknowledged in [46]), and more recently, deep learning [46,29,45,25,47,7,19,43,21,15,16,49,27,17,59]. We will focus specifically on those variants that fit in the denomination of referential games. ...
... More importantly, this feature influenced the "organisation"/structure of the emerging languages. Lazaridou et al. [47] open an avenue to explore the naturalness of emerging languages, which is a growing concern in the community [45,15,16,3], in view of better human-machine interfaces. ...
Preprint
Full-text available
Natural languages are powerful tools wielded by human beings to communicate information and co-operate towards common goals. Their values lie in some main properties like compositionality, hierarchy and recurrent syntax, which computational linguists have been researching the emergence of in artificial languages induced by language games. Only relatively recently, the AI community has started to investigate language emergence and grounding working towards better human-machine interfaces. For instance, interactive/conversational AI assistants that are able to relate their vision to the ongoing conversation. This paper provides two contributions to this research field. Firstly, a nomenclature is proposed to understand the main initiatives in studying language emergence and grounding, accounting for the variations in assumptions and constraints. Secondly, a PyTorch based deep learning framework is introduced, entitled ReferentialGym, which is dedicated to furthering the exploration of language emergence and grounding. By providing baseline implementations of major algorithms and metrics, in addition to many different features and approaches, ReferentialGym attempts to ease the entry barrier to the field and provide the community with common implementations.
... With the modelling flexibility of a neural network and the fact that it learns from data, neural emergent communication models provide interesting prospects for studying cognitive mechanisms behind language learning and change. Higher-level properties of the languages emerging in these simulations can be compared to human language, such as the frequency distribution of symbols and word order (Chaabouni, Kharitonov, Lazaric, Dupoux, & Baroni, 2019), emergence of human-like colour naming systems (Chaabouni, Kharitonov, Dupoux, & Baroni, 2021) and compositionality (combining expressions to a larger expression; Chaabouni, Kharitonov, Bouchacourt, Dupoux, & Baroni, 2020; Cogswell, Lu, Lee, Parikh, & Batra, 2019; Li & Bowling, 2019). Specifically of interest to the research questions in this thesis are emergent communication models where language contact plays a role, such as the formation of creole languages (Graesser, Cho, & Kiela, 2019). ...
Thesis
Full-text available
In this thesis, I study how languages change in situations where languages or groups of speakers are in contact with each other. As language change is inherently caused by interaction between individuals, I use a technique from multi-agent Artificial Intelligence (AI) that puts the interaction of individuals central: agent-based computer simulations. I apply these agent-based models to specific case studies of language change in the real world. The goal of the thesis is two-fold: getting a better view of the mechanisms behind language change and studying how computational methods work on real-world problems with small amounts of data. I present three different computer models, which each answer a particular linguistic question given a specific case study or dataset. In my first model, I study how language contact can make languages simplify, using a case study of Alorese, a language in Eastern Indonesia. By integrating data from the language into an agent-based model, I study if the phonotactics of the language -- the allowed structure of sounds following each other -- could play a role in simplification. In my second model, I investigate if mechanisms in conversations could be a factor in language change. Using an agent-based model, I show how speakers influencing each other's linguistic choices in conversations can, under certain circumstances, lead to the spread of an innovative form. In my third model, I investigate what could be a cognitively realistic computer model for the `brain' of the speakers, that could be used in an agent-based simulation. I develop a neural network model, based on a technique called Adaptive Resonance Theory, which has as its task to cluster verbs that conjugate in the same way into groups. The model is able to learn the systems of verbs of languages from different families while being interpretable: it is possible to visualise to which parts of the words the network attends. Together, the three models show how different mechanisms that interact with each other can lead to language change when languages are in contact. The models show how mechanisms working on short timescales, such as on the scale of a conversation, can cause effects in the longer term, leading to language change. At the same time, this thesis gives insights for the development of communication in multi-agent AI systems, especially when there are multiple types of agents, as is the case in language contact situations.
... On one hand, emergent artificial languages' compositionality has been shown to further the learnability of said languages [36,61,7,43] and, on the other hand, natural languages' compositionality promises to increase the generalisation ability of the artificial agent that would be able to rely on them as a grounding signal, as it has been found to produce learned representations that generalise, when measured in terms of the data-efficiency of subsequent transfer and/or curriculum learning [25,48,49,31]. Yet, emerging languages are far from being 'natural-like' protolanguages [38,9,10], and the questions of how to constrain them to a specific semantics or a specific syntax remain open problems. Nevertheless, some sufficient conditions can be found to further the emergence of compositional languages and generalising learned representations (e.g. ...
Preprint
Natural language instruction following is paramount to enable collaboration between artificial agents and human beings. Natural language-conditioned reinforcement learning (RL) agents have shown how natural languages' properties, such as compositionality, can provide a strong inductive bias to learn complex policies. Previous architectures like HIGhER combine the benefit of language-conditioning with Hindsight Experience Replay (HER) to deal with sparse rewards environments. Yet, like HER, HIGhER relies on an oracle predicate function to provide a feedback signal highlighting which linguistic description is valid for which state. This reliance on an oracle limits its application. Additionally, HIGhER only leverages the linguistic information contained in successful RL trajectories, thus hurting its final performance and data-efficiency. Without early successful trajectories, HIGhER is no better than DQN upon which it is built. In this paper, we propose the Emergent Textual Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses both of its limitations by means of (i) a discriminative visual referential game, commonly studied in the subfield of Emergent Communication (EC), used here as an unsupervised auxiliary task and (ii) a semantic grounding scheme to align the emergent language with the natural language of the instruction-following benchmark. We show that the referential game's agents make an artificial language emerge that is aligned with the natural-like language used to describe goals in the BabyAI benchmark and that it is expressive enough so as to also describe unsuccessful RL trajectories and thus provide feedback to the RL agent to leverage the linguistic, structured information contained in all trajectories. Our work shows that EC is a viable unsupervised auxiliary task for RL and provides missing pieces to make HER more widely applicable.
... While recent studies have primarily used simple RNNbased Speaker and Listener agents, other agent architectures have been employed for specific purposes. For example, Chaabouni et al. (2019) and Ryo et al. (2022) used the LSTM sequence-to-sequence (with attention) architecture to study transduction from a grammar-generated input to an emergent language. Evtimova et al. (2018) compared attentional and non-attentional agents in a multi-modal and multi-step referential game, observing improvements in the out-of-domain test but not the in-domain test. ...
Preprint
Full-text available
To develop computational agents that better communicate using their own emergent language, we endow the agents with an ability to focus their attention on particular concepts in the environment. Humans often understand an object or scene as a composite of concepts and those concepts are further mapped onto words. We implement this intuition as cross-modal attention mechanisms in Speaker and Listener agents in a referential game and show attention leads to more compositional and interpretable emergent language. We also demonstrate how attention aids in understanding the learned communication protocol by investigating the attention weights associated with each message symbol and the alignment of attention weights between Speaker and Listener agents. Overall, our results suggest that attention is a promising mechanism for developing more human-like emergent language.
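A minimal sketch of the kind of cross-modal attention this abstract describes: before emitting each symbol, the speaker's decoder state attends over per-concept features of the input, and the resulting attention weights are what one would inspect for interpretability and Speaker/Listener alignment. Shapes and names below are illustrative, not the paper's.

```python
# Sketch of cross-modal attention: a decoder state attends over per-concept
# features of the observed object before the next symbol is emitted.
import torch
import torch.nn.functional as F

def cross_modal_attention(decoder_state, concept_features):
    # decoder_state: (batch, d); concept_features: (batch, n_concepts, d)
    scores = torch.einsum("bd,bnd->bn", decoder_state, concept_features)
    weights = F.softmax(scores / decoder_state.size(-1) ** 0.5, dim=-1)
    context = torch.einsum("bn,bnd->bd", weights, concept_features)
    return context, weights  # weights are what one inspects for interpretability

state = torch.randn(4, 64)
features = torch.randn(4, 5, 64)   # e.g. 5 concept slots such as color/shape regions
context, attn = cross_modal_attention(state, features)
print(attn.shape)  # torch.Size([4, 5])
```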
... Emerging languages are far from being 'natural-like' protolanguages [38,10,11], but sufficient conditions can be found to further the emergence of compositional languages and generalising learned representations (e.g. Kottur et al. [38], Lazaridou et al. [42], Choi et al. [16], Bogin et al. [6], Guo et al. [21], Korbak et al. [36], Chaabouni et al. [12], Denamganaï and Walker [18]). ...
Preprint
Natural languages are powerful tools wielded by human beings to communicate information. Among their desirable properties, compositionality has been the main focus in the context of referential games and variants, as it promises to enable greater systematicity to the agents which would wield it. The concept of disentanglement has been shown to be of paramount importance to learned representations that generalise well in deep learning, and is thought to be a necessary condition to enable systematicity. Thus, this paper investigates how compositionality at the level of the emerging languages, disentanglement at the level of the learned representations, and systematicity relate to each other in the context of visual referential games. Firstly, we find that visual referential games that are based on the Obverter architecture outperform a state-of-the-art unsupervised learning approach in terms of many major disentanglement metrics. Secondly, we expand the previously proposed Positional Disentanglement (PosDis) metric for compositionality to (re-)incorporate some concerns pertaining to informativeness and completeness features found in the Mutual Information Gap (MIG) disentanglement metric it stems from. This extension allows for further discrimination between the different kinds of compositional languages that emerge in the context of Obverter-based referential games, in a way that neither the referential game accuracy nor previous metrics were able to capture. Finally, we investigate whether the resulting (emergent) systematicity, as measured by zero-shot compositional learning tests, correlates with any of the disentanglement and compositionality metrics proposed so far. Throughout the training process, statistically significant correlation coefficients can be found, both positive and negative, depending on the moment of the measure.
... Typical work in this area starts from a tabula rasa and studies under what conditions (e.g., environments, tasks/goals, social settings) the resulting communication protocols among agents resemble human language, along axes like word-length economy (Chaabouni et al., 2019a), word-order biases (Chaabouni et al., 2019b), and compositionality (Andreas, 2019; Chaabouni et al., 2020; Steinert-Threlkeld, 2020; Geffen Lan et al., 2020), among others (Mu and Goodman, 2021). ...
Preprint
Full-text available
We formulate and test a technique to use Emergent Communication (EC) with a pretrained multilingual model to improve on modern Unsupervised NMT systems, especially for low-resource languages. It has been argued that the currently dominant paradigm in NLP of pretraining on text-only corpora will not yield robust natural language understanding systems, and the need for grounded, goal-oriented, and interactive language learning has been highlighted. In our approach, we embed a modern multilingual model (mBART; Liu et al., 2020) into an EC image-reference game, in which the model is incentivized to use multilingual generations to accomplish a vision-grounded task, with the hypothesis that this will align multiple languages to a shared task space. We present two variants of EC Fine-Tuning (Steinert-Threlkeld et al., 2022), one of which outperforms a backtranslation-based baseline in 6/8 translation settings, and proves especially beneficial for the very low-resource languages of Nepali and Sinhala.
... Choi et al. (2018) use qualitative analysis, visualization and a zero-shot test, to show that a language with compositional properties can emerge from environmental pressures. Chaabouni et al. (2019) find that emerged languages, unlike human languages, do not naturally prefer non-redundant encodings. Chaabouni et al. (2020) further find that while generalization capabilities can be found in the languages, compositionality itself does not arise from simple generalization pressures. ...
... The previous work has combined iterated learning models with neural network learners. After initial work focused on the emergence of compositionality (Batali, 1998; Kirby & Hurford, 2002; Swarup & Gasser, 2009), there has been a recent surge of interest in the combination of these two models, again aimed at explaining compositionality (Chaabouni, Kharitonov, Lazaric, Dupoux, & Baroni, 2019; Cogswell, Lu, Lee, Parikh, & Batra, 2019; Guo et al., 2019; Ren, Guo, Labeau, Cohen, & Kirby, 2020), but the promising combination of iterated learning models and neural networks has not been applied extensively to other problems. In this paper, we apply this combination of methods for the first time to study the evolution of a universal of lexical semantics, more specifically the monotonicity universal for simple determiners. ...
Article
Full-text available
Natural languages exhibit many semantic universals, that is, properties of meaning shared across all languages. In this paper, we develop an explanation of one very prominent semantic universal, the monotonicity universal. While the existing work has shown that quantifiers satisfying the monotonicity universal are easier to learn, we provide a more complete explanation by considering the emergence of quantifiers from the perspective of cultural evolution. In particular, we show that quantifiers satisfying the monotonicity universal evolve reliably in an iterated learning paradigm with neural networks as agents.
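Several of the works discussed above combine iterated learning with neural network learners. A minimal sketch of that generic loop, assuming a toy classification "language" and an arbitrary transmission bottleneck size; none of these specific choices come from the paper above.

```python
# Sketch of a generic neural iterated-learning loop: each generation learns from
# a limited sample of the previous generation's productions, then produces the
# training data for the next generation. Model, data and bottleneck size are
# placeholder choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

def new_agent():
    return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

inputs = torch.randn(200, 4)
labels = torch.randint(0, 2, (200,))              # generation 0: an arbitrary initial "language"

for generation in range(5):
    agent = new_agent()                           # each generation starts from scratch
    opt = torch.optim.Adam(agent.parameters(), lr=1e-2)
    bottleneck = torch.randperm(len(inputs))[:50] # transmission bottleneck: limited exposure
    for _ in range(100):
        loss = F.cross_entropy(agent(inputs[bottleneck]), labels[bottleneck])
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        labels = agent(inputs).argmax(-1)         # productions become the next generation's data
```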
... The functional aspect of language (Clark, 1996) can be captured by artificial multi-agent games (Kirby, 2002;Mordatch and Abbeel, 2018), in which agents have to communicate about some shared input space (e.g., images). A common emergent communication protocol has been adopted in a large body of recent research: a speaker encodes a piece of information into a sequence of discrete symbols (emergent language) and a listener then aims to decipher the sequence and recover the original piece of information (Lazaridou et al., 2017;Havrylov and Titov, 2017;Bouchacourt and Baroni, 2018;Chaabouni et al., 2019;Li and Bowling, 2019;Chaabouni et al., 2020;Luna et al., 2020;Kharitonov and Baroni, 2020, inter alia). ...
... Straight-Through Gumbel-Softmax and Visual Stimuli. Although it has been shown that emerging languages are far from being 'natural'-like [39,12,13], there are some successful cases demonstrating the emergence of compositional languages and learned representations (e.g. Kottur et al. [39], Lazaridou et al. [44], Choi et al. [15], Bogin et al. [7], Guo et al. [21], Korbak et al. [37], Chaabouni et al. [14]), relative to a given standard of compositionality. ...
Preprint
Full-text available
The drivers of compositionality in artificial languages that emerge when two (or more) agents play a non-visual referential game have been previously investigated using approaches based on the REINFORCE algorithm and the (Neural) Iterated Learning Model. Following the more recent introduction of the Straight-Through Gumbel-Softmax (ST-GS) approach, this paper investigates to what extent the drivers of compositionality identified so far in the field apply in the ST-GS context and to what extent they translate into (emergent) systematic generalisation abilities, when playing a visual referential game. Compositionality and the generalisation abilities of the emergent languages are assessed using topographic similarity and zero-shot compositional tests. Firstly, we provide evidence that the test-train split strategy significantly impacts the zero-shot compositional tests when dealing with visual stimuli, whilst it does not when dealing with symbolic ones. Secondly, empirical evidence shows that using the ST-GS approach with small batch sizes and an overcomplete communication channel improves compositionality in the emerging languages. Nevertheless, while shown robust with symbolic stimuli, the effect of the batch size is not so clear-cut when dealing with visual stimuli. Our results also show that not all overcomplete communication channels are created equal. Indeed, while increasing the maximum sentence length is found to be beneficial to further both compositionality and generalisation abilities, increasing the vocabulary size is found detrimental. Finally, a lack of correlation between the language compositionality at training-time and the agents' generalisation abilities is observed in the context of discriminative referential games with visual stimuli. This is similar to previous observations in the field using the generative variant with symbolic stimuli.
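For reference, a minimal sketch of the Straight-Through Gumbel-Softmax estimator discussed in this abstract: the forward pass sends a hard one-hot symbol through the channel while gradients flow through the relaxed (softmax) sample. PyTorch also ships torch.nn.functional.gumbel_softmax(..., hard=True), which implements the same trick; the explicit version is shown for clarity, with an illustrative temperature and shapes.

```python
# Sketch of the Straight-Through Gumbel-Softmax trick: hard one-hot forward pass,
# gradients routed through the relaxed sample on the backward pass.
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits, tau=1.0):
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    soft = F.softmax((logits + gumbel) / tau, dim=-1)
    hard = F.one_hot(soft.argmax(-1), logits.size(-1)).to(soft.dtype)
    return hard + soft - soft.detach()   # hard forward, soft backward

logits = torch.randn(4, 10, requires_grad=True)   # batch of 4, vocabulary of 10
symbols = st_gumbel_softmax(logits, tau=0.5)
symbols.sum().backward()                           # gradients reach the speaker's logits
print(symbols.argmax(-1))
```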
... In recent agent models from NLP, deep neural networks are used as the comprehension and production model [5,9,16]. Interesting analyses can be made about the language neural agents learn, such as the frequency distribution of symbols and word order [3,4]. [11] model contact between communities of deep neural agents, and the formation of a creole language. ...
Poster
Full-text available
In this paper, we propose an outline for linguistic research on language change, as observed in the languages of the world, using neural agent-based models of emergent communication. We describe how such models could be used to study morphological simplification, using a case study of language contact in Eastern Indonesia. A neural architecture is used to represent hypothesized cognitive mechanisms of language change: a generalization mechanism, the procedural/declarative model, and a phonological mechanism, the hyper- and hypo-articulation model, which involves a theory of mind of the listener.
... The functional aspect of language (Clark, 1996) can be captured by artificial multi-agent games (Kirby, 2002;Mordatch and Abbeel, 2018), in which agents have to communicate about some shared input space (e.g., images). A common emergent communication protocol has been adopted in a large body of recent research: a speaker encodes a piece of information into a sequence of discrete symbols (emergent language) and a listener then aims to decipher the sequence and recover the original piece of information (Lazaridou et al., 2017;Havrylov and Titov, 2017;Bouchacourt and Baroni, 2018;Chaabouni et al., 2019;Li and Bowling, 2019;Chaabouni et al., 2020;Luna et al., 2020;Kharitonov and Baroni, 2020, inter alia). ...
Preprint
Full-text available
While state-of-the-art models that rely upon massively multilingual pretrained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. Nevertheless, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, for the first time we pretrain neural networks via emergent communication from referential games. Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages. On the one hand, we show that this substantially benefits machine translation in few-shot settings. On the other hand, this also provides an extrinsic evaluation protocol to probe the properties of emergent languages ex vitro. Intuitively, the closer they are to natural languages, the higher the gains from pretraining on them should be. For instance, in this work we measure the influence of communication success and maximum sequence length on downstream performances. Finally, we introduce a customised adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. These turn out to be crucial to facilitate knowledge transfer and prevent catastrophic forgetting. Compared to a recurrent baseline, our method yields gains of 59.0%–147.6% in BLEU score with only 500 NMT training instances and 65.1%–196.7% with 1,000 NMT training instances across four language pairs. These proof-of-concept results reveal the potential of emergent communication pretraining for both natural language processing tasks in resource-poor settings and extrinsic evaluation of artificial languages.
... However, in our case, as described in Eq. A, L_M(D) also captures whether a learner is capable of finding regularity in the held-out data quickly, with few data points; hence it also represents the speed-of-learning idea for measuring inductive biases [Chaabouni et al., 2019]. ...
Preprint
Sequence-to-sequence (seq2seq) learners are widely used, but we still have only limited knowledge about what inductive biases shape the way they generalize. We address that by investigating how popular seq2seq learners generalize in tasks that have high ambiguity in the training data. We use SCAN and three new tasks to study learners' preferences for memorization, arithmetic, hierarchical, and compositional reasoning. Further, we connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases. In our experimental study, we find that LSTM-based learners can learn to perform counting, addition, and multiplication by a constant from a single training example. Furthermore, Transformer and LSTM-based learners show a bias toward the hierarchical induction over the linear one, while CNN-based learners prefer the opposite. On the SCAN dataset, we find that CNN-based, and, to a lesser degree, Transformer- and LSTM-based learners have a preference for compositional generalization over memorization. Finally, across all our experiments, description length proved to be a sensitive measure of inductive biases.
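Description length as a measure of inductive bias is typically computed prequentially: data are coded block by block under a model trained only on the earlier blocks, and the code lengths (negative log-likelihoods, in bits) are accumulated. The sketch below follows that standard prequential reading; the model, data and block size are placeholders, not the paper's tasks or architectures.

```python
# Sketch of prequential (online) description length: code the next block with the
# current model, add its code length in bits, then train on everything seen so far.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def prequential_description_length(xs, ys, n_classes, block=20, epochs=50):
    model = nn.Sequential(nn.Linear(xs.size(1), 32), nn.ReLU(), nn.Linear(32, n_classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    total_bits = block * math.log2(n_classes)          # first block coded with a uniform prior
    for start in range(block, len(xs), block):
        with torch.no_grad():                           # code the next block...
            nll = F.cross_entropy(model(xs[start:start + block]),
                                  ys[start:start + block], reduction="sum")
        total_bits += nll.item() / math.log(2)
        for _ in range(epochs):                         # ...then train on all data seen so far
            loss = F.cross_entropy(model(xs[:start + block]), ys[:start + block])
            opt.zero_grad(); loss.backward(); opt.step()
    return total_bits

xs, ys = torch.randn(200, 8), torch.randint(0, 4, (200,))
print(prequential_description_length(xs, ys, n_classes=4))
```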
... By providing algorithms for forests or more general graphs we are accommodating all the possible exceptions and variants that have been discussed to define the syntactic structure of sentences, e.g., allowing for cycles (see for instance, [19,Section 4.9 Graph-theoretic properties]). In addition, the syntactic structures in recent experiments with deep agents are forests [20]. ...
Preprint
Full-text available
The interest in spatial networks where vertices are embedded in a one-dimensional space is growing. Remarkable examples of these networks are syntactic dependency trees and RNA structures. In this setup, the vertices of the network are arranged linearly and then edges may cross when drawn above the sequence of vertices. Recently, two aspects of the distribution of the number of crossings in uniformly random linear arrangements have been investigated: the expectation and the variance. While the computation of the expectation is straightforward, that of the variance is not. Here we present fast algorithms to calculate that variance in arbitrary graphs and forests. As for the latter, the algorithm calculates variance in linear time with respect to the number of vertices. This paves the way for many applications that rely on an exact but fast calculation of that variance. These algorithms are based on novel arithmetic expressions for the calculation of the variance that we develop from previous theoretical work.
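The quantity under study here is the number of edge crossings of a graph whose vertices are placed on a line: two edges with disjoint endpoints cross exactly when their positions interleave. The brute-force count below is for illustration only; the paper's contribution is a fast computation of the variance of this quantity over uniformly random arrangements, which this sketch does not attempt.

```python
# Sketch: counting edge crossings in a linear arrangement of a graph's vertices.
from itertools import combinations

def crossings(edges, position):
    count = 0
    for (u, v), (x, y) in combinations(edges, 2):
        if len({u, v, x, y}) < 4:
            continue                       # edges sharing a vertex cannot cross
        a, b = sorted((position[u], position[v]))
        c, d = sorted((position[x], position[y]))
        if a < c < b < d or c < a < d < b:
            count += 1
    return count

# Toy example: the path 0-1-2-3 arranged as 0, 2, 1, 3 has exactly one crossing.
print(crossings([(0, 1), (1, 2), (2, 3)], {0: 0, 2: 1, 1: 2, 3: 3}))  # 1
```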
... Recent work on language emergence in a multi-agent setup showed that the interaction between agents performing a cooperative task while exchanging discrete symbols can lead to the development of a successful communication system [4,2,15,3]. This language protocol is fundamental to solving the task, since a lack of communication results in a performance decrease [13]. ...
Preprint
Research in multi-agent cooperation has shown that artificial agents are able to learn to play a simple referential game while developing a shared lexicon. This lexicon is not easy to analyze, as it does not show many properties of a natural language. In a simple referential game with two neural network-based agents, we analyze the object-symbol mapping trying to understand what kind of strategy was used to develop the emergent language. We see that, when the environment is uniformly distributed, the agents rely on a random subset of features to describe the objects. When we modify the objects, making one feature non-uniformly distributed, the agents realize it is less informative and start to ignore it, and, surprisingly, they make better use of the remaining features. This interesting result suggests that more natural, less uniformly distributed environments might aid in spurring the emergence of better-behaved languages.
... Interest in this scenario is fueled by the hypothesis that the ability to interact through a human-like language is a prerequisite for genuine AI (Mikolov et al., 2016;Chevalier-Boisvert et al., 2019). Furthermore, such simulations might lead to a better understanding of both standard NLP models (Chaabouni et al., 2019b) and the evolution of human language itself (Kirby, 2002). ...
... Most other works studying compositionality in emergent languages [1,7,24,31] have focused on learning interpretable representations. See [15] for a broad survey of the different approaches. ...
Preprint
Full-text available
Many recent works have discussed the propensity, or lack thereof, for emergent languages to exhibit properties of natural languages. A favorite in the literature is learning compositionality. We note that most of those works have focused on communicative bandwidth as being of primary importance. While important, it is not the only contributing factor. In this paper, we investigate the learning biases that affect the efficacy and compositionality of emergent languages. Our foremost contribution is to explore how capacity of a neural network impacts its ability to learn a compositional language. We additionally introduce a set of evaluation metrics with which we analyze the learned languages. Our hypothesis is that there should be a specific range of model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization. While we empirically see evidence for the bottom of this range, we curiously do not find evidence for the top part of the range and believe that this is an open question for the community.
... Interest in this scenario is fueled by the hypothesis that the ability to interact through a human-like language is a prerequisite for genuine AI (Mikolov et al., 2016;Chevalier-Boisvert et al., 2019). Furthermore, such simulations might lead to a better understanding of both standard NLP models (Chaabouni et al., 2019b) and the evolution of human language itself (Kirby, 2002). ...
Preprint
There is renewed interest in simulating language emergence among deep neural agents that communicate to jointly solve a task, spurred by the practical aim to develop language-enabled interactive AIs, as well as by theoretical questions about the evolution of human language. However, optimizing deep architectures connected by a discrete communication channel (such as that in which language emerges) is technically challenging. We introduce EGG, a toolkit that greatly simplifies the implementation of emergent-language communication games. EGG's modular design provides a set of building blocks that the user can combine to create new games, easily navigating the optimization and architecture space. We hope that the tool will lower the technical barrier, and encourage researchers from various backgrounds to do original work in this exciting area.
Article
Full-text available
Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners.
Preprint
Full-text available
Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
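One standard way to maximize mutual information between messages contrastively is an InfoNCE objective in which two message encodings from the same trajectory form a positive pair and the other trajectories in the batch act as negatives. The sketch below follows that generic recipe; the encoders, shapes and temperature are placeholders, not the paper's exact setup.

```python
# Sketch of an InfoNCE-style contrastive loss over message embeddings: the
# diagonal of the similarity matrix holds positive pairs (same trajectory),
# off-diagonal entries act as negatives.
import torch
import torch.nn.functional as F

def message_infonce(msg_a, msg_b, temperature=0.1):
    # msg_a, msg_b: (batch, d) embeddings of two messages from the same trajectory
    a = F.normalize(msg_a, dim=-1)
    b = F.normalize(msg_b, dim=-1)
    logits = a @ b.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(len(a))            # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = message_infonce(torch.randn(16, 32), torch.randn(16, 32))
print(loss.item())
```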
Preprint
Human beings use compositionality to generalise from past experiences to actual or fictive, novel experiences. To do so, we separate our experiences into fundamental atomic components. These atomic components can then be recombined in novel ways to support our ability to imagine and engage with novel experiences. We frame this as the ability to learn to generalise compositionally, and we will refer to behaviours making use of this ability as compositional learning behaviours (CLBs). A central problem to learning CLBs is the resolution of a binding problem (BP) (by learning to, firstly, segregate the supportive stimulus components from the observation of multiple stimuli, and then, combine them in a single episodic experience). While it is another feat of intelligence that human beings perform with ease, it is not the case for state-of-the-art artificial agents. Thus, in order to build artificial agents able to collaborate with human beings, we propose to develop a novel benchmark to investigate agents' abilities to exhibit CLBs by solving a domain-agnostic version of the BP. We take inspiration from the language emergence and grounding framework of referential games and propose a meta-learning extension of referential games, entitled Meta-Referential Games, and use this framework to build our benchmark, which we name the Symbolic Behaviour Benchmark (S2B). While it has the potential to test for more symbolic behaviours than solely CLBs, in the present paper we focus on the single-agent language grounding task that tests for CLBs. We provide baseline results for it, using state-of-the-art RL agents, and show that our proposed benchmark is a compelling challenge that we hope will spur the research community towards developing more capable artificial agents.
Article
Full-text available
Identifying factors that make certain languages harder to model than others is essential to reach language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin, or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate whether this class of languages is also more difficult to translate by state-of-the-art Neural Machine Translation (NMT) models. Using a variety of synthetic languages and a newly introduced translation challenge set, we find that word order flexibility in the source language only leads to a very small loss of NMT quality, even though the core verb arguments become impossible to disambiguate in sentences without semantic cues. The latter issue is indeed solved by the addition of case marking. However, in medium- and low-resource settings, the overall NMT quality of fixed-order languages remains unmatched.
Conference Paper
Full-text available
One of the distinguishing aspects of human language is its compositionality, which allows us to describe complex environments with limited vocabulary. Previously, it has been shown that neural network agents can learn to communicate in a highly structured, possibly compositional language based on disentangled input (e.g. hand-engineered features). Humans, however, do not learn to communicate based on well-summarized features. In this work, we train neural agents to simultaneously develop visual perception from raw image pixels, and learn to communicate with a sequence of discrete symbols. The agents play an image description game where the image contains factors such as colors and shapes. We train the agents using the obverter technique where an agent introspects to generate messages that maximize its own understanding. Through qualitative analysis, visualization and a zero-shot test, we show that the agents can develop, out of raw image pixels, a language with compositional properties, given a proper pressure from the environment.
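The obverter technique mentioned in this abstract generates messages by introspection: the speaker queries its own listener model and greedily picks, symbol by symbol, whatever maximizes its own probability of recovering the intended meaning, stopping once it is sufficiently "convinced". A rough sketch under that reading; the scoring network, vocabulary size and stopping threshold are illustrative stand-ins (real implementations condition on images and use trained listeners), not the paper's architecture.

```python
# Sketch of obverter-style message generation: greedy symbol selection that
# maximizes the speaker's own listener model's belief in the target meaning.
import torch
import torch.nn as nn

VOCAB, MAX_LEN, N_MEANINGS = 5, 4, 10

class ListenerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 16)
        self.rnn = nn.GRU(16, 32, batch_first=True)
        self.out = nn.Linear(32, N_MEANINGS)

    def forward(self, message):                       # message: (1, length)
        _, h = self.rnn(self.embed(message))
        return self.out(h[-1]).log_softmax(-1)        # log p(meaning | message)

def obverter_generate(listener, target_meaning, threshold=-0.1):
    message = []
    for _ in range(MAX_LEN):
        scores = []
        for symbol in range(VOCAB):                   # try every possible next symbol
            candidate = torch.tensor([message + [symbol]])
            scores.append(listener(candidate)[0, target_meaning].item())
        best = max(range(VOCAB), key=lambda s: scores[s])
        message.append(best)
        if scores[best] > threshold:                  # stop once the speaker is "convinced"
            break
    return message

print(obverter_generate(ListenerModel(), target_meaning=3))
```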
Article
Full-text available
Complex sentences involving adverbial clauses appear in children's speech at about three years of age yet children have difficulty comprehending these sentences well into the school years. To date, the reasons for these difficulties are unclear, largely because previous studies have tended to focus on only sub-types of adverbial clauses, or have tested only limited theoretical models. In this paper, we provide the most comprehensive experimental study to date. We tested four-year-olds, five-year-olds and adults on four different adverbial clauses (before, after, because, if) to evaluate four different theoretical models (semantic, syntactic, frequency-based and capacity-constrained). 71 children and 10 adults (as controls) completed a forced-choice, picture-selection comprehension test, providing accuracy and response time data. Children also completed a battery of tests to assess their linguistic and general cognitive abilities. We found that children's comprehension was strongly influenced by semantic factors - the iconicity of the event-to-language mappings - and that their response times were influenced by the type of relation expressed by the connective (temporal vs. causal). Neither input frequency (frequency-based account), nor clause order (syntax account) or working memory (capacity-constrained account) provided a good fit to the data. Our findings thus contribute to the development of more sophisticated models of sentence processing. We conclude that such models must also take into account how children's emerging linguistic understanding interacts with developments in other cognitive domains such as their ability to construct mental models and reason flexibly about them.
Chapter
Full-text available
As observed by linguist Joseph Greenberg (Greenberg 1963), languages across the world seem to share properties at all levels of linguistic organization. Some of these patterns are regularities in the crosslinguistic distribution of elements that hold across languages (non-implicational universals). For example, sentential subjects almost always precede objects in declarative sentences (Greenberg 1963). Others, the so-called implicational universals, describe correlations between elements that vary together across languages: If a language has property A, then it most likely has property B. An example of such an implicational universal is the well-documented correlation between constituent order freedom and the presence of case-marking (Blake 2001; Sapir 1921): Languages with flexible constituent order often use morphological means, such as case, to mark grammatical function assignment (e.g., German, Japanese, and Russian), whereas languages with fixed constituent order typically lack case morphology (e.g., English and Mandarin).
Article
Full-text available
Across languages of the world, some grammatical patterns have been argued to be more common than expected by chance. These are sometimes referred to as (statistical) language universals. One such universal is the correlation between constituent order freedom and the presence of a case system in a language. Here, we explore whether this correlation can be explained by a bias to balance production effort and informativity of cues to grammatical function. Two groups of learners were presented with miniature artificial languages containing optional case marking and either flexible or fixed constituent order. Learners of the flexible order language used case marking significantly more often. This result parallels the typological correlation between constituent order flexibility and the presence of case marking in a language and provides a possible explanation for the historical development of Old English to Modern English, from flexible constituent order with case marking to relatively fixed order without case marking. In addition, learners of the flexible order language conditioned case marking on constituent order, using more case marking with the cross-linguistically less frequent order, again mirroring typological data. These results suggest that some cross-linguistic generalizations originate in functionally motivated biases operating during language learning.
Chapter
Full-text available
Article
Full-text available
Significance: We provide the first large-scale, quantitative, cross-linguistic evidence for a universal syntactic property of languages: that dependency lengths are shorter than chance. Our work supports long-standing ideas that speakers prefer word orders with short dependency lengths and that languages do not enforce word orders with long dependency lengths. Dependency length minimization is well motivated because it allows for more efficient parsing and generation of natural language. Over the last 20 years, the hypothesis of a pressure to minimize dependency length has been invoked to explain many of the most striking recurring properties of languages. Our broad-coverage findings support those explanations.
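The quantity behind dependency length minimization is simply the summed linear distance between each word and its syntactic head. A minimal sketch with a hypothetical toy parse (CoNLL-style head indices, 0 for the root); corpus-scale studies compute the same quantity over treebank parses.

```python
# Sketch of total dependency length: sum of linear distances between each word
# and its syntactic head. Head indices follow the CoNLL convention (0 = root);
# the toy parse below is illustrative only.
def total_dependency_length(heads):
    # heads[i] is the 1-based index of word i+1's head, 0 for the root
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

# "John threw out the trash": John<-threw, out<-threw, the<-trash, trash<-threw
heads = [2, 0, 2, 5, 2]
print(total_dependency_length(heads))  # 1 + 1 + 1 + 3 = 6
```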
Article
Full-text available
Our aim is to question the basic principles of iconicity, with respect to both the common and the scientific use of natural languages. We argue for the need to propose some extensions of them. We test the validity and the relevance of these principles on some Romance languages (especially Romanian) and we examine their relevance, but also their limits. The mathematical language and the cosmic language (Freudenthal) as well as the generative approach to the syntax of both natural and formal languages are especially in our attention. There is a story about an Englishman, a Frenchman and a German who are debating the merits of their respective languages. The German starts by claiming: 'German is off course ze best language. It is ze language off logik and philosophy, and can communicate viz great clarity and precision even ze most complex ideas.' 'Boeff,' shrugs the Frenchman, 'but French, French, it ees ze language of lurve! In French, we can convey all ze subtletees of romance weez elegance and flair.' The Englishman ponders the matter for a while, and then says: 'Yes, chaps, that's all very well. But just think about it this way. Take the word "spoon", for instance.
Article
Full-text available
Neural machine translation is a recently proposed approach to machine translation. Unlike traditional statistical machine translation, neural machine translation aims at building a single neural network that can be jointly tuned to maximize translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consist of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
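The soft-search described above is typically realized as an attention layer. The sketch below is a hedged, minimal PyTorch rendering of additive attention; the class name, dimensions, and layer layout are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of additive (content-based) attention: each decoder state is
# scored against every encoder state, and the softmax-normalized scores weight
# a context vector used to predict the next target word.
class AdditiveAttention(nn.Module):
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W_enc(enc_states)
                                   + self.W_dec(dec_state).unsqueeze(1)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)    # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states)  # (batch, 1, enc_dim)
        return context.squeeze(1), weights
```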
Article
Full-text available
In this paper, we propose a novel neural network model called RNN Encoder--Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder--Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
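As a rough illustration of the architecture this abstract describes, the following sketch compresses a source sequence into a fixed-length vector with one RNN and unrolls the target from it with another; the use of GRUs, the layer sizes, and the class name are assumptions made for the example only.

```python
import torch.nn as nn

# Hedged sketch of an RNN encoder-decoder: the encoder's final hidden state is
# the fixed-length summary that conditions the decoder.
class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, h = self.encoder(self.src_emb(src_ids))          # h: fixed-length summary
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), h)
        return self.out(dec_out)                            # (batch, tgt_len, tgt_vocab)
```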
Conference Paper
Full-text available
A small number of the logically possible word order configurations account for a large proportion of actual human languages. To explain this distribution, typologists often invoke principles of human cognition which might make certain orders easier or harder to learn or use. We present a novel method for carrying out very large scale artificial language learning tasks over the internet, which allows us to test large batteries of systematically designed languages for differential learnability. An exploratory study of the learnability of all possible configurations of subject, verb, and object finds that the two most frequent orders in human languages are the most easily learned, and yields suggestive evidence compatible with other typological and psycholinguistic observations.
Article
Full-text available
Recent work in functional and cognitive linguistics has argued and presented evidence that the positioning of adverbial clauses is motivated by competing pressures from syntactic parsing, discourse pragmatics, and semantics. Continuing this line of research, the current paper investigates the effect of the iconicity principle on the positioning of temporal adverbial clauses. The iconicity principle predicts that the linear ordering of main and subordinate clauses mirrors the sequential ordering of the events they describe. Drawing on corpus data from spoken and written English, the paper shows that, although temporal clauses exhibit a general tendency to follow the main clause, there is a clear correlation between clause order and iconicity: temporal clauses denoting a prior event precede the main clause more often than temporal clauses of posteriority. In addition to the iconicity principle, there are other factors such as length, complexity, and pragmatic import that may affect the positioning of temporal adverbial clauses. Using logistic regression analysis, the paper investigates the effects of the various factors on the linear structuring of complex sentences.
Article
Full-text available
Recent years have seen a surge in accounts motivated by information theory that consider language production to be partially driven by a preference for communicative efficiency. Evidence from discourse production (i.e., production beyond the sentence level) has been argued to suggest that speakers distribute information across discourse so as to hold the conditional per-word entropy associated with each word constant, which would facilitate efficient information transfer (Genzel & Charniak, 2002). This hypothesis implies that the conditional (contextualized) probabilities of linguistic units affect speakers' preferences during production. Here, we extend this work in two ways. First, we explore how preceding cues are integrated into contextualized probabilities, a question which so far has received little to no attention. Specifically, we investigate how a cue's maximal informativity about upcoming words (the cue's effectiveness) decays as a function of the cue's recency. Based on properties of linguistic discourses as well as properties of human memory, we analytically derive a model of cue effectiveness decay and evaluate it against cross-linguistic data from 12 languages. Second, we relate the information theoretic accounts of discourse production to well-established mechanistic (activation-based) accounts: We relate contextualized probability distributions over words to their relative activation in a lexical network given preceding discourse.
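The central quantity here is the contextualized probability of each word, usually reported as surprisal (negative log probability). The snippet below is a toy sketch, assuming a simple bigram estimate rather than the paper's models, of the per-word values whose running average the constant-entropy idea predicts should stay roughly flat across a discourse.

```python
import math
from collections import Counter

# Toy sketch: per-word surprisal (in bits) under a bigram estimate.
def bigram_surprisals(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return [-math.log2(bigrams[(prev, word)] / unigrams[prev])
            for prev, word in zip(tokens, tokens[1:])]

toks = "the dog saw the cat and the cat saw the dog".split()
print([round(s, 2) for s in bigram_surprisals(toks)])
```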
Article
Full-text available
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
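For concreteness, the gating mechanism this abstract refers to can be written out for a single time step as below; the unfused weight layout and tensor shapes are simplifications for illustration, not the original formulation or a production implementation.

```python
import torch

# One LSTM step: input (i), forget (f), and output (o) gates control read/write
# access to the cell state c, whose additive update is what preserves error
# flow over long time lags (the "constant error carousel").
def lstm_step(x, h, c, W, U, b):
    # x: (in_dim,); h, c: (hid,); W: (4*hid, in_dim); U: (4*hid, hid); b: (4*hid,)
    gates = W @ x + U @ h + b
    i, f, g, o = gates.chunk(4)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    c_new = f * c + i * g              # additive cell update
    h_new = o * torch.tanh(c_new)
    return h_new, c_new
```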
Article
Goal-oriented dialogue has attracted attention for its numerous applications in artificial intelligence. To solve this task, deep learning and reinforcement learning have recently been applied. However, these approaches struggle to find a competent recurrent neural questioner, owing to the complexity of learning a series of sentences. Motivated by theory of mind, we propose "Answerer in Questioner's Mind" (AQM), a novel algorithm for goal-oriented dialogue. With AQM, a questioner asks and infers based on an approximated probabilistic model of the answerer. The questioner figures out the answerer's intent by selecting a plausible question, explicitly calculating the information gain of the candidate intentions and possible answers to each question. We test our framework on two goal-oriented visual dialogue tasks: "MNIST Counting Dialog" and "GuessWhat?!". In our experiments, AQM outperforms comparative algorithms and produces human-like dialogue. We further use AQM as a tool for analyzing the mechanism of the deep reinforcement learning approach and discuss future directions for practical goal-oriented neural dialogue systems.
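The information-gain computation sketched below is a generic rendering of the idea in the abstract, not the authors' exact estimator: the value of a question is the expected drop in entropy over candidate intents once the (modeled) answer is observed. The variable names and the toy answer model are assumptions.

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

# prior[i]: questioner's belief that intent i is the target.
# p_answer[i][a]: assumed probability of answer a given intent i.
def information_gain(prior, p_answer):
    n_answers = len(p_answer[0])
    p_a = [sum(prior[i] * p_answer[i][a] for i in range(len(prior)))
           for a in range(n_answers)]
    expected_posterior_entropy = 0.0
    for a in range(n_answers):
        if p_a[a] == 0:
            continue
        posterior = [prior[i] * p_answer[i][a] / p_a[a] for i in range(len(prior))]
        expected_posterior_entropy += p_a[a] * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy

# A perfectly discriminating yes/no question over two equally likely intents
# is worth exactly one bit.
print(information_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```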
Article
While most machine translation systems to date are trained on large parallel corpora, humans learn language in a different way: by being grounded in an environment and interacting with other humans. In this work, we propose a communication game where two agents, native speakers of their own respective languages, jointly learn to solve a visual referential task. We find that the ability to understand and translate a foreign language emerges as a means to achieve shared goals. The emergent translation is interactive and multimodal, and crucially does not require parallel corpora, but only monolingual, independent text and corresponding images. Our proposed translation model achieves this by grounding the source and target languages into a shared visual modality, and outperforms several baselines on both word-level and sentence-level translation tasks. Furthermore, we show that agents in a multilingual community learn to translate better and faster than in a bilingual communication setting.
Conference Paper
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.7 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a strong phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which beats the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
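The source-reversal trick mentioned at the end is a pure data transformation and easy to reproduce; the sketch below (with a made-up sentence pair) only illustrates that preprocessing step, not the full training setup.

```python
# Reverse only the source-side tokens; target order is left untouched. This
# shortens the distance between early source words and early target words.
def reverse_source(pairs):
    return [(list(reversed(src)), tgt) for src, tgt in pairs]

pairs = [(["je", "suis", "étudiant"], ["i", "am", "a", "student"])]
print(reverse_source(pairs))
# [(['étudiant', 'suis', 'je'], ['i', 'am', 'a', 'student'])]
```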
Article
Neural networks are a family of powerful machine learning models. This book focuses on the application of neural network models to natural language data. The first half of the book (Parts I and II) covers the basics of supervised machine learning and feed-forward neural networks, the basics of working with machine learning over language data, and the use of vector-based rather than symbolic representations for words. It also covers the computation-graph abstraction, which allows one to easily define and train arbitrary neural networks, and is the basis behind the design of contemporary neural network software libraries. The second part of the book (Parts III and IV) introduces more specialized neural network architectures, including 1D convolutional neural networks, recurrent neural networks, conditioned-generation models, and attention-based models. These architectures and techniques are the driving force behind state-of-the-art algorithms for machine translation, syntactic parsing, and many other applications.
Article
We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
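Weight tying itself is a one-line change in most frameworks; the sketch below shows the idea in PyTorch with illustrative sizes and a made-up model name, rather than the exact models evaluated in the paper.

```python
import torch.nn as nn

# The output projection reuses the input embedding matrix, so the model learns
# a single word representation and roughly halves its embedding parameters.
class TiedLM(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size, bias=False)
        self.out.weight = self.embed.weight   # tie input and output embeddings

    def forward(self, token_ids):
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)                    # (batch, seq_len, vocab_size)
```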
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, requires little memory, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
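For reference, the per-parameter update this abstract summarizes can be written compactly as below; this is a scalar, dependency-free sketch with the commonly cited default hyperparameters, not a drop-in optimizer.

```python
import math

# One Adam step for a single scalar parameter: exponential moving averages of
# the gradient (m) and squared gradient (v), bias-corrected by the step count
# t (starting at 1), then a per-parameter scaled update.
def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```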
Article
Iterated learning describes the process whereby an individual learns their behaviour by exposure to another individual's behaviour, who themselves learnt it in the same way. It can be seen as a key mechanism of cultural evolution. We review various methods for understanding how behaviour is shaped by the iterated learning process: computational agent-based simulations; mathematical modelling; and laboratory experiments in humans and non-human animals. We show how this framework has been used to explain the origins of structure in language, and argue that cultural evolution must be considered alongside biological evolution in explanations of language origins.
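The transmission-chain skeleton behind such simulations can be stated in a few lines; the toy below is an assumption-laden illustration of the loop (random signals, a fixed bottleneck, no learning biases), not one of the models reviewed, and by itself it will not produce structured languages.

```python
import random

# Toy iterated-learning chain: each generation observes a limited sample of the
# previous generation's meaning->signal pairs and must cover the full meaning
# space, inventing random signals for unseen meanings.
MEANINGS = [(shape, color) for shape in "AB" for color in "12"]
SYLLABLES = ["ka", "po", "mi", "tu"]

def learn(observed):
    return {m: observed.get(m, random.choice(SYLLABLES) + random.choice(SYLLABLES))
            for m in MEANINGS}

lexicon = learn({})                                            # generation 0: all random
for generation in range(10):
    shown = dict(random.sample(sorted(lexicon.items()), k=3))  # transmission bottleneck
    lexicon = learn(shown)
print(lexicon)
```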
Article
A theme running through much of the functionalist literature in linguistics is that grammatical structure, to a considerable degree, has an 'iconic' motivation. This theme can be distilled into three rather distinct claims: (1) iconic principles govern speakers' choices of structurally available options in discourse; (2) structural options that reflect discourse-iconic principles become grammaticalized; (3) grammatical structure is an iconic reflection of conceptual structure. After presenting numerous examples from the functionalist literature in support of the idea that iconicity is widespread in language, I argue that claim (1) is irrelevant to generative grammar; claim (2), if correct, poses no challenge to generative grammar, despite a widespread belief to the contrary; and claim (3) has literally been built into standard versions of generative grammar. I go on to discuss the implications of iconic relations in language for the autonomy hypothesis and, at a more speculative level, for the evolution of language.
Article
Although linguistic signs in isolation are symbolic, the system or grammar which relates them may be diagrammatically iconic in two ways: (a) by isomorphism, a bi-unique correspondence tends to be established between signans and signatum; (b) by motivation, the structure of language directly reflects some aspect of the structure of reality. Isomorphism is so nearly universal that deviations from it require explanation. Motivation, although widespread, establishes a typology of languages, as indicated in Saussure's Cours. The evidence of artificial taboo languages suggests that degree of motivation co-varies inversely with the number of 'prima onomata' in the lexicon.
Article
The ability to distinguish between an inflectional derivation of a target word, which is a variant of the target, and a completely new word is an important task of language acquisition. In an attempt to explain the ability to solve this problem, it has been proposed that the beginning of the word is its most psychologically salient portion. However, it is not clear whether this phenomenon is specific to language. The three reported experiments address this issue. Experiments 1 and 2 established that suffixation-type preferences occur in language and in domains outside of language and that it is plausible that this same mechanism could account for alternative types of inflectional morphology. Experiment 3 indicated that the suffixation preference is both flexible and transferable across domains. In combination, these experiments suggest that the suffixation preference is driven by a cognitive mechanism that is both domain-general and flexible in nature.
Article
This paper proposes a new theory of the relationship between the sentence processing mechanism and the available computational resources. This theory--the Syntactic Prediction Locality Theory (SPLT)--has two components: an integration cost component and a component for the memory cost associated with keeping track of obligatory syntactic requirements. Memory cost is hypothesized to be quantified in terms of the number of syntactic categories that are necessary to complete the current input string as a grammatical sentence. Furthermore, in accordance with results from the working memory literature, both memory cost and integration cost are hypothesized to be heavily influenced by locality: (1) the longer a predicted category must be kept in memory before the prediction is satisfied, the greater is the cost for maintaining that prediction; and (2) the greater the distance between an incoming word and the most local head or dependent to which it attaches, the greater the integration cost. The SPLT is shown to explain a wide range of processing complexity phenomena not previously accounted for under a single theory, including (1) the lower complexity of subject-extracted relative clauses compared to object-extracted relative clauses, (2) numerous processing overload effects across languages, including the unacceptability of multiply center-embedded structures, (3) the lower complexity of cross-serial dependencies relative to center-embedded dependencies, (4) heaviness effects, such that sentences are easier to understand when larger phrases are placed later and (5) numerous ambiguity effects, such as those which have been argued to be evidence for the Active Filler Hypothesis.
Article
In this paper, we study a special kind of learning problem in which each training instance is given a set of (or distribution over) candidate class labels and only one of the candidate labels is the correct one. Such a problem can occur, e.g., in an information retrieval setting where a set of words is associated with an image, or if class labels are organized hierarchically. We propose a novel discriminative approach for handling the ambiguity of class labels in the training examples. Experiments with the proposed approach over five different UCI datasets show that our approach is able to find the correct label among the set of candidate labels and achieves performance close to the case when each training instance is given a single correct label. In contrast, naive methods degrade rapidly as more ambiguity is introduced into the labels.
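One standard way to operationalize training with candidate label sets, given here only as a hedged illustration rather than the paper's specific discriminative formulation, is to maximize the total probability the model assigns to the candidate set:

```python
import torch
import torch.nn.functional as F

# Minimize the negative log of the summed probability over candidate labels.
# logits: (batch, n_classes); candidate_mask: (batch, n_classes), where 1
# marks a candidate label for that instance.
def candidate_set_loss(logits, candidate_mask):
    log_probs = F.log_softmax(logits, dim=-1)
    masked = log_probs.masked_fill(candidate_mask == 0, float("-inf"))
    return -torch.logsumexp(masked, dim=-1).mean()
```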
Emergent communication in a multi-modal, multi-step referential game
Katrina Evtimova, Andrew Drozdov, Douwe Kiela, and Kyunghyun Cho. 2018. Emergent communication in a multi-modal, multi-step referential game. In Proceedings of ICLR Conference Track, Vancouver, Canada.
Emergence of language with multi-agent games: Learning to communicate with sequences of symbols
Serhii Havrylov and Ivan Titov. 2017. Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. In Proceedings of NIPS, pages 2149-2159, Long Beach, CA, USA.
Learning to play Guess Who? and inventing a grounded language as a consequence
Emilio Jorge, Mikael Kågebäck, and Emil Gustavsson. 2016. Learning to play Guess Who? and inventing a grounded language as a consequence. In Proceedings of the NIPS Deep Reinforcement Learning Workshop, Barcelona, Spain.
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Brenden Lake and Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proceedings of ICML, pages 2879-2888, Stockholm, Sweden.
Emergence of linguistic communication from referential games with symbolic and pixel input
Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, and Stephen Clark. 2018. Emergence of linguistic communication from referential games with symbolic and pixel input. In Proceedings of ICLR Conference Track, Vancouver, Canada.
Multi-agent cooperation and the emergence of (natural) language
Angeliki Lazaridou, Alexander Peysakhovich, and Marco Baroni. 2017. Multi-agent cooperation and the emergence of (natural) language. In Proceedings of ICLR Conference Track, Toulon, France.
Emergence of grounded compositional language in multi-agent populations
Igor Mordatch and Pieter Abbeel. 2018. Emergence of grounded compositional language in multi-agent populations. In Thirty-Second AAAI Conference on Artificial Intelligence.
Automatic differentiation in PyTorch
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.
Cognitive English Grammar
Günter Radden and René Dirven. 2007. Cognitive English Grammar. John Benjamins, Amsterdam, the Netherlands.