Figure 2
A. Brain areas related to language. OFC: orbitofrontal cortex, PFC: prefrontal cortex, PMC: premotor cortex, MC: motor cortex, A1: primary auditory cortex, V1: primary visual cortex. B. Cognitive model of language acquisition, based on caregiver-infant interaction through the environment. The color of the boxes corresponds to the areas on the brain schematic. The part highlighted in blue constitutes the focus of this work; i.e., the learning of associations in Broca's area between internal states (PFC) and motor commands (PMC) under the supervision of the reward system.
Source publication
Language acquisition theories classically distinguish passive language understanding from active language production. However, recent findings show that brain areas such as Broca's region are shared in language understanding and production. Furthermore, these areas are also implicated in understanding and producing goal-oriented actions. These obse...
Contexts in source publication
Context 1
... evidence to support this claim lies in the similar neural pathways between language and goal-directed action processes. 44 and BA 45, extending anteriorly to BA 47 (see Fig. 2.A). Wernicke's region is located in the left superior temporal cortex (BA 22 and 42). ...
Context 2
... focus specifically on the modeling of functions attributed to Broca's area, namely the creation of symbols in regards to desiring internal states (highlighted in blue on Fig. 2). The secondary association between perceptual features and symbols (i.e., labels) is not studied here, as many existing models are dedicated to this specific aspect of language development. ...
Context 3
... was previously exploited to simulate various aspects of the development of the child such as intention recognition [52]. Figure 5 shows the PerAc formulation of our cognitive model (see Fig. 2). The perceptual input is the internal state of the virtual infant, represented by the desired object. ...
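The PerAc formulation described above can be illustrated with a minimal sketch of reward-driven association learning between internal states and communicative actions. The table sizes, the caregiver's reward convention, and the delta-rule update below are illustrative assumptions written in Python, not the authors' implementation:

# Minimal sketch of a PerAc-style associative layer (illustrative only).
# Assumption: internal states and motor commands are discrete; a caregiver
# reward drives a reward-modulated delta-rule update of the associations.
import numpy as np

rng = np.random.default_rng(0)

N_STATES = 3    # e.g. desired objects / internal needs (hypothetical size)
N_ACTIONS = 6   # e.g. candidate words or gestures (hypothetical size)

W = np.zeros((N_STATES, N_ACTIONS))   # state-to-action association weights
lr, epsilon = 0.1, 0.2                # learning rate and exploration rate

def select_action(state):
    """Epsilon-greedy choice over the associative layer's output."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(W[state]))

def caregiver_reward(state, action):
    """Toy caregiver: rewards an arbitrary fixed state-action convention."""
    return 1.0 if action == state * 2 else 0.0

for trial in range(2000):
    state = int(rng.integers(N_STATES))          # current desired object
    action = select_action(state)
    r = caregiver_reward(state, action)
    # Strengthen or weaken the used association toward the obtained reward.
    W[state, action] += lr * (r - W[state, action])

print(np.argmax(W, axis=1))   # learned signal for each internal state

In this toy setup the agent converges on one arbitrary signal per internal state, mirroring the idea that symbols emerge from rewarded state-action associations rather than from perceptual labeling.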
Citations
... Symbolic communication enables people to express intention or meaning through sound or gesture (Gomez, 2007; Mundy & Newell, 2007). Emerging at the end of infants' first year of life (Cohen & Billard, 2018; Orr & Geva, 2015), symbolic communication is thought to rely on cognitive development that allows infants to understand symbolism, action and consequences, and violation of expectancies (Rodriguez, 2022). The neural bases of these emerging abilities depend on the experience and maturation of cortical areas (Ahmad et al., 2023), especially the prefrontal cortex, which receives signals from multiple regions (Grossmann et al., 2010; Kolb et al., 2012). ...
... Among examples: newborns moving from making small shifts of attention by slight movements of their eyes (Farroni et al., 2010) to an improved ability to follow using head and body movements at 3 to 6 months (Gredeback & Daum, 2015); neonates communicating by crying, then moving to affective expressions by cooing, smiling, and laughing, starting with the social smile at 2 months (Lavelli & Fogel, 2005); and infants being mostly stationary before moving to turning at 4 to 5 months, crawling at 6 months, and standing and walking toward desired targets at about 1 year (Geva & Orr, 2016;Yamamoto et al., 2019). These motor improvements are part of infants' developing communication toolbox, enabling them to relay to their caregivers more effectively their interests (Cohen & Billard, 2018) and emotional states (Prochazkova & Kret, 2017). ...
... The ability to communicate at multiple levels continues to evolve as infants mature, and as they expand their communicative repertoire and their ability to relay more complex notions or finely tuned signals about their emotional state (Leppanen & Nelson, 2009) and needs (Cohen & Billard, 2018). New abilities affect infants' communicative skills at the same and other levels. ...
Communication is commonly viewed as connecting people through conscious symbolic processes. Infants have an immature communication toolbox, raising the question of how they form a sense of connectedness. In this article, we propose a framework for infants' communication, emphasizing the subtle unconscious behaviors and autonomic contingent signals that convey drives, emotions, and a sense of connection, facilitating the formation of primal social bonds. Our developmental model emphasizes the importance of diverse modes of communication and their interplay in social interactions during infancy. The framework leverages three levels of communication—autonomic, behavioral, and symbolic—and their different maturational pathways. Initially, infants' social communication relies on autonomic responses and a dynamic behavioral repertoire, which evolve during the first year of life, supporting the emergence of symbolic communication. This extended communication framework highlights infants' role as proactive communicating agents and allows for tracing communicative developmental cascades back to their origins.
... On the contrary, there is wide variation in the time at which each new skill is achieved, which is impacted by the genetic and social environment (Navarro et al., 2017; Bedford et al., 2016). Language milestones (e.g., both verbal and nonverbal communication) can be among the most important early developments for parents, and consist of first words (usually spoken at the end of the first year of life; Cohen & Billard, 2018; Lahrouchi & Kern, 2018) and a linguistic explosion around 2 years of age, when different words are combined and sentences are used (Bates & Carnevale, 1993; Hirsh-Pasek et al., 2015). There are also important fine and gross motor developmental milestones, such as the pincer grasp and starting to walk, respectively. ...
The objective of the following research was to describe the use of digital media (i.e., TV, background TV, cell phone, PC, and tablet), the presence of adults during this activity, and its association with language and motor developmental milestones and SES in the first years of life. Participants were 114 primary caregivers of toddlers between 12 and 36 months (M = 27.48 months, SD = 7.31, female = 58, low SES = 56). Parental reports of infant media use, motor and language development milestones, the Inventory of Skills Development (CDI), and the INDEC Scale (for SES) were used. The results showed that, on average, toddlers engaged for 1 h per day with TV and were passive recipients of background TV for 2 h a day, which was the most used screen. In addition, parents tend to share TV with toddlers. Language was positively related to child tablet use, book use, and TV shared with an adult, and negatively associated with children's cell phone and PC use, both alone and with an adult. For SES, having at least one basic need unsatisfied, or lower parental education and occupational level, was related to more background TV use, less time sharing this type of media with toddlers, and less use and a lower quantity of books at home. In general, there were no relations between digital media use and developmental milestones. This indicates that the excessive use of screens could relate to some early language skills, although it is necessary to investigate the context in which they are used.
... To endow a robot with the ability to learn language in a functional way, we propose to learn this capacity initially by trial and error, in the same way as goal-oriented actions. In human infants, this learning takes place in the context of typical infant-caregiver social interactions, such as the infant requesting objects out of her reach, which we refer to as "social babbling" [12]. Previous results [12] demonstrate that the robot can learn both symbolic words and gestures to request objects by interacting with a caregiver. ...
... In human infants, this learning takes place in the context of typical infant-caregiver social interactions, such as the infant requesting objects out of her reach, which we refer to as "social babbling" [12]. Previous results [12] demonstrate that the robot can learn both symbolic words and gestures to request objects by interacting with a caregiver. This corresponds to the instrumental function of language [3]. ...
... The goal is to endow the robot with the ability to express its internal states by requesting an object from a human caregiver. We propose to extend the RL approach proposed by [12], dedicated to learning associations between internal needs and words in a robot. In the earlier model, the internal states of the robot were limited, as they were modeled by binary variables and did not evolve with time. ...
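One way to read the proposed extension is to replace the binary variables with homeostatic drives that build up over time and are reduced when the matching object is obtained. The following Python sketch is a hedged illustration of such dynamics; the class name, growth and decay constants, and update schedule are assumptions, not the cited model:

# Hedged sketch of time-evolving internal needs (e.g. hunger, thirst)
# replacing the binary variables mentioned above; the dynamics and
# constants are illustrative assumptions.
import numpy as np

class Needs:
    def __init__(self, n_needs, growth=0.01, decay=0.5):
        self.levels = np.zeros(n_needs)   # 0 = satisfied, 1 = urgent
        self.growth, self.decay = growth, decay

    def step(self):
        """Needs build up slowly over time."""
        self.levels = np.clip(self.levels + self.growth, 0.0, 1.0)

    def satisfy(self, idx):
        """Receiving the matching object reduces the corresponding need."""
        self.levels[idx] *= self.decay

    def most_urgent(self):
        return int(np.argmax(self.levels))

needs = Needs(n_needs=3)
for t in range(200):
    needs.step()
    goal = needs.most_urgent()    # drives which object the agent requests
    # ... request the object (word/gesture); on a successful exchange:
    if t % 50 == 0:
        needs.satisfy(goal)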
... Regarding language, verbal communication can be one of the most important developmental milestones for parents, with children's first word usually spoken at the end of the first year of life. Babies usually start by saying single words, then two-word sentences, and later three-word sentences, until finally they can form more complex sentences (Cohen & Billard, 2018;Lahrouchi & Kern, 2018). Children acquire around five words daily. ...
... The subject of this internship was proposed by my tutor in the host organisation, Mrs. Laura Cohen; my contributions are therefore extensions of her work on the subject [10] and are closely linked to it. More precisely, in [10], L. Cohen and A. Billard propose a model of joint learning of language and proto-imperative pointing using reinforcement learning. ...
... The subject of this internship was proposed by my tutor in the host organisation, Mrs. Laura Cohen; my contributions are therefore extensions of her work on the subject [10] and are closely linked to it. More precisely, in [10], L. Cohen and A. Billard propose a model of joint learning of language and proto-imperative pointing using reinforcement learning. Their learning model is based on a continuous interaction between the agent, i.e., the child who learns to speak and to point, and the "Caregiver", usually the parent who looks after the child. ...
... In [10], a formula almost equivalent to (1) is proposed: ...
Pointing differs from other conventional gestures in being prior to language development. In this internship, we focused on the learning mechanisms of proto-imperative pointing, a particular type of pointing, in children, by testing different learning strategies such as classical social babbling or assisted social babbling (a scaffolding strategy). We carry out both a theoretical and an experimental analysis of such strategies, making it possible to confirm several pre-established results, in addition to highlighting that the learning difficulty increases linearly with the number of objects to be pointed at. Our main contributions are, on the one hand, a theoretical model allowing the agent to learn to point towards moving objects, which we implement drawing on the principles of inverse kinematics, and, on the other hand, the introduction of a belief graph structure into reinforcement learning processes, allowing the transfer of learning between apparently independent tasks.
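The pointing-at-moving-objects model is described only at a high level in this summary; a standard way to realize the inverse-kinematics inspiration is Jacobian-based tracking. The Python sketch below uses a planar two-link arm and a hypothetical circular target trajectory; link lengths, gains, and damping are illustrative assumptions rather than the internship's actual model:

# Hedged sketch: Jacobian-based inverse kinematics for a planar two-link
# arm tracking a moving target, in the spirit of the pointing model above.
import numpy as np

L1, L2 = 1.0, 1.0                 # link lengths (assumed)
q = np.array([0.3, 0.3])          # joint angles (rad)

def forward(q):
    """End-effector position of the two-link arm."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def target(t):
    """Moving object: a slow circular trajectory (hypothetical)."""
    return np.array([1.2 + 0.3 * np.cos(0.05 * t),
                     0.5 + 0.3 * np.sin(0.05 * t)])

for t in range(500):
    err = target(t) - forward(q)
    J = jacobian(q)
    # Damped least-squares step keeps the update stable near singularities.
    dq = J.T @ np.linalg.solve(J @ J.T + 0.01 * np.eye(2), err)
    q += 0.2 * dq

print("final tip position:", forward(q), "target:", target(499))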
... It enables them to learn the complex rules of a language passively within a few years and apply these rules to words they hear for the first time. Roboticists advocate grounded language learning, where an artificial agent actively interacts with its environment, sampling multimodal stimuli while learning language [5]-[7]. ...
Neural networks can be powerful function approximators, which are able to model high-dimensional feature distributions from a subset of examples drawn from the target distribution. Naturally, they perform well at generalizing within the limits of their target function, but they often fail to generalize outside of the explicitly learned feature space. It is therefore an open research topic whether and how neural network-based architectures can be deployed for systematic reasoning. Many studies have shown evidence for poor generalization, but they often work with abstract data or are limited to single-channel input. Humans, however, learn and interact through a combination of multiple sensory modalities, and rarely rely on just one. To investigate compositional generalization in a multimodal setting, we generate an extensible dataset with multimodal input sequences from simulation. We investigate the influence of the underlying training data distribution on compositional generalization in a minimal LSTM-based network trained in a supervised, time-continuous setting. We find compositional generalization to fail in simple setups while improving with the number of objects, actions, and particularly with substantial color overlap between objects. Furthermore, multimodality strongly improves compositional generalization in settings where a pure vision model struggles to generalize.
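As a rough illustration of the minimal LSTM-based setup mentioned in this abstract, the following PyTorch sketch fuses a vision feature sequence and a language feature sequence by concatenation and predicts a per-time-step action label. All dimensions, the fusion scheme, and the random data are assumptions for illustration, not the paper's architecture:

# Hedged sketch of a minimal multimodal LSTM trained per time step.
import torch
import torch.nn as nn

class MultimodalLSTM(nn.Module):
    def __init__(self, vision_dim=32, lang_dim=16, hidden=64, n_actions=8):
        super().__init__()
        self.lstm = nn.LSTM(vision_dim + lang_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, vision_seq, lang_seq):
        x = torch.cat([vision_seq, lang_seq], dim=-1)   # early fusion
        out, _ = self.lstm(x)
        return self.head(out)            # per-time-step action logits

model = MultimodalLSTM()
vision = torch.randn(4, 20, 32)          # (batch, time, vision features)
language = torch.randn(4, 20, 16)        # (batch, time, language features)
targets = torch.randint(0, 8, (4, 20))   # per-time-step supervision
logits = model(vision, language)
loss = nn.CrossEntropyLoss()(logits.flatten(0, 1), targets.flatten())
loss.backward()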
... In the context of speech production and perception, some authors developed models inspired by functions of brain areas (Kröger et al., 2009). Cohen and Billard (2018) tested the hypothesis that human brain areas are shared in language understanding and production, and their implication in goal-directed actions using active language learning and social babbling. Barnaud et al. (2019) tested the hypothesis of idiosyncrasies (individual specificity) in production and perception; moreover, they tested the inter-individual variability in auditory and motor prototypes within a given language. ...
... The most simplistic models do not define the motor control device, leading to a coincidence between motor and sensory space. For instance, this approach has been used by Cohen and Billard (2018) and Pagliarini et al. (2018a) (i.e. Identical to Motor Space (IMS) in Table 2.4). ...
... Alternatively, Fiete et al. (2007) use the neural activity of a spiking neural network. Finally, in the work of Cohen and Billard (2018), goals are specific needs of the agent (e.g., thirst, hunger). ...
Humans learn to speak in a similar way to how songbirds learn to sing. Both learn to speak or sing by imitation from an early age, going through the same stages of development. First they listen to their parents' vocalizations, then they try to reproduce them: initially babbling, until their vocal output mimics that of their parents. Songbirds have dedicated brain circuits for vocal learning, making them an ideal model for exploring the representation of imitative vocal learning. My research project aims to build a bio-inspired model to describe imitative vocal learning. This model consists of a perceptual-motor loop where a sensory evaluation mechanism drives learning. The sound production is obtained from real recordings, using recent developments in artificial intelligence. This project, at the intersection of computer science and neuroscience, may help to better understand imitative vocal learning, and more generally sensorimotor learning.
LINK TO FULL TEXT
https://tel.archives-ouvertes.fr/tel-03217834/document
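The perceptual-motor loop with a sensory evaluation mechanism described in the project summary above can be caricatured as follows: motor parameters produce a sound, the sound is compared to the tutor's, and the error drives exploration. The toy synthesizer, feature space, and hill-climbing exploration in this Python sketch are stand-in assumptions, not the project's model:

# Hedged sketch of an imitative vocal-learning loop: motor exploration
# (babbling) is kept or discarded based on a sensory evaluation of how
# close the produced sound is to the tutor's.
import numpy as np

rng = np.random.default_rng(3)

def produce(motor_params):
    """Toy articulatory synthesizer: maps motor parameters to sound features."""
    return np.tanh(motor_params)          # placeholder mapping

tutor_sound = produce(np.array([0.8, -0.4, 0.2]))   # target (tutor) features
motor = rng.normal(size=3)                           # initial babbling parameters
best_err = np.inf

for trial in range(5000):
    candidate = motor + 0.1 * rng.normal(size=3)     # motor exploration
    err = np.linalg.norm(produce(candidate) - tutor_sound)   # sensory evaluation
    if err < best_err:                               # keep improvements only
        motor, best_err = candidate, err

print("final imitation error:", best_err)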
... In the context of speech production and perception, some authors developed models inspired by functions of brain areas [46]. Cohen et al. [47] tested the hypothesis that human brain areas are shared in language understanding and production, and their implication in goal-directed actions using active language learning and social babbling. Barnaud et al. [48] tested the hypothesis of idiosyncrasies (individual specificity) in production and perception; moreover, they tested the interindividual variability in auditory and motor prototypes within a given language. ...
... The most simplistic models do not define the motor control device, leading to a coincidence between motor and sensory space. For instance, this approach has been used by Cohen et al. [47] and Pagliarini et al. [53] (i.e. Identical to Motor Space (IMS) in Table III). ...
... These are models with a nonperceptual internal representation of the goals, such as the general model shown in Figure 2. As for the perceptual space, the choices made by the author can be found in the last column of Table III. This is the case for the reinforcement learning models proposed by Doya and Sejnowski [100], Troyer and Doupe [43], Fiete et al. [45], Warlaumont et al. [50], Cohen et al. [47], and Howard and Birkholz [59]. In the context of songbirds, a typical choice is to use an arbitrary syllable space given by a localist encoding, as in the works of Doya and Sejnowski [44] and Troyer et al. [43]. ...
Sensorimotor learning represents a challenging problem for natural and artificial systems. Several computational models have been proposed to explain the neural and cognitive mechanisms at play in the brain. In general, these models can be decomposed into three common components: a sensory system, a motor control device and a learning framework. The latter includes the architecture, the learning rule or optimisation method, and the exploration strategy used to guide learning. In this review, we focus on imitative vocal learning, which is exemplified by song learning in birds and speech acquisition in humans. We aim to synthesise, analyse and compare the various models of vocal learning that have been proposed, highlighting their common points and differences. We first introduce the biological context, including the behavioural and physiological hallmarks of vocal learning, and sketch the neural circuits involved. Then, we detail the different components of a vocal learning model and how they are implemented in the reviewed models.
... An insightful use of tools was a better predictor of the use of success/failure words than of other abilities such as object-permanence skills. Language and communicative gestures have also been seen as social tools (Borghi et al., 2013; Cohen and Billard, 2018). Experimental evidence shows that the reaching space of a subject can be extended after the use of words, in a setup where an object could be reached with several strategies, including the use of a word that triggers an action from another person (Borghi et al., 2013). ...
Babies and children are curious, active explorers of their world. One of their challenges is to learn the relations between their actions, such as the use of tools or speech, and the changes in their environment. Intrinsic motivations have been little studied in psychology, so that their mechanisms are mostly unknown. On the other hand, most artificial agents and robots have been learning in ways very different from humans. The objective of this thesis is twofold: understanding the role of intrinsic motivations in the human development of speech and tool use through robotic modeling, and improving the abilities of artificial agents inspired by the mechanisms of human exploration and learning. A first part of this work concerns the understanding and modeling of intrinsic motivations. We reanalyze a typical tool-use experiment, showing that intrinsically motivated exploration seems to play an important role in the observed behaviors and to interfere with the measured success rates. With a robotic model, we show that an intrinsic motivation based on the learning progress to reach goals with a modular representation can self-organize phases of behaviors in the development of tool-use precursors that share properties with child tool-use development. We present the first robotic model learning both speech and tool use from scratch, which predicts that the grounded exploration of objects in a social interaction scenario should accelerate infant vocal learning of accurate sounds for these objects' names, as a result of a goal-directed exploration of the objects. In the second part of this thesis, we extend, formalize and evaluate the algorithms designed to model child development, with the aim of obtaining an efficient learning robot. We formalize an approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) that enables the discovery and acquisition of large repertoires of skills. We show, in several experimental setups including a real humanoid robot, that learning diverse spaces of goals with intrinsic motivations is more efficient for learning complex skills than only trying to directly learn these complex skills.
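For readers unfamiliar with IMGEP, its core loop can be sketched as sampling a goal space (module) in proportion to recent learning progress, then sampling a goal within it. The environment, competence measure, and progress window in the Python sketch below are hypothetical stand-ins, not the thesis implementation:

# Hedged sketch of learning-progress-based goal sampling in the spirit
# of intrinsically motivated goal exploration.
import numpy as np

rng = np.random.default_rng(1)
N_MODULES = 3                              # e.g. hand, tool, sound goal spaces
history = [[] for _ in range(N_MODULES)]   # competence history per module

def competence(module, goal):
    """Toy environment: competence slowly improves with practice."""
    return min(1.0, 0.01 * len(history[module])) + 0.05 * rng.random()

def learning_progress(module, window=10):
    h = history[module]
    if len(h) < 2 * window:
        return 1.0                         # optimistic bonus for unexplored modules
    return abs(np.mean(h[-window:]) - np.mean(h[-2 * window:-window]))

for step in range(500):
    lp = np.array([learning_progress(m) for m in range(N_MODULES)])
    probs = (lp + 1e-6) / (lp + 1e-6).sum()
    module = int(rng.choice(N_MODULES, p=probs))   # pick a goal space by progress
    goal = rng.random()                            # sample a goal in that space
    history[module].append(competence(module, goal))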
... As the learning task is similar to ours, we also adopt this interactive strategy. Cohen and Billard [50] proposed another view of human-robot interaction, in which the human guesses the robot's intention and chooses an object to reflect the state after an action is executed. However, the states and actions are pre-defined and not suitable for online learning. ...