
Mehdi Khamassi
- PhD
- Research Director at Centre National de la Recherche Scientifique / Sorbonne Université
Research director (CNRS) working at the Institute of Intelligent Systems and Robotics, Sorbonne University, Paris
About
Publications: 181
Reads: 34,554
Citations: 4,249
Introduction
I am a research director at CNRS, working at the Institute of Intelligent Systems & Robotics, Paris. I obtained my PhD in Cognitive Neuroscience from Université Pierre et Marie Curie (UPMC) in 2007 and my Habilitation to Direct Research from UPMC in 2014. I am interested in computational models of brain regions involved in decision-making and learning, as well as in their robotic implementations. I also participate in the design of neuroscience experiments to test specific model predictions.
Current institution
Centre National de la Recherche Scientifique / Sorbonne Université
Current position
- Research Director
Additional affiliations
October 2008 - April 2010
February 2003 - June 2008
October 2010 - September 2020
Centre National de la Recherche Scientifique / Sorbonne Université
Position
- Researcher
Education
September 2013 - May 2014
October 2003 - September 2007
September 2002 - June 2003
UPMC / École Normale Supérieure Ulm / École Polytechnique / EHESS
Field of study
- Cognitive Sciences (CogMaster)
Publications (181)
We present a computational model of spatial navigation comprising different learning mechanisms in mammals, i.e., associative, cognitive mapping and parallel systems. This model is able to reproduce a large number of experimental results in different variants of the Morris water maze task, including standard associative phenomena (spatial generaliz...
Dynamic uncontrolled human-robot interactions (HRI) require robots to be able to adapt to changes in the human’s behavior and intentions. Among relevant signals, non-verbal cues such as the human’s gaze can provide the robot with important information about the human’s current engagement in the task, and whether the robot should continue its curren...
Multiple in vivo measures have shown that place cells from the hippocampus replay previously experienced trajectories. These replays have been thought to support memory consolidation for a long time. Some data, however, have highlighted a functional link between replays and reinforcement learning (RL). This theory, extensively used in machine learn...
This paper presents a contribution aiming at testing novel child–robot teaching schemes that could be used in future studies to support the development of social and collaborative skills of children with autism spectrum disorders (ASD). We present a novel experiment where the classical roles are reversed: in this scenario the children are the teach...
Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to...
Flexible adaptation to uncertain and changing environments requires dynamic adjustments in behavioral strategies. While classical learning theories emphasize incremental strengthening of local stimulus-action associations in adaptation, emerging evidence suggests that global-level strategy representations may enable rapid inference of adaptive beha...
Minimizing negative impacts of Artificial Intelligence (AI) systems on human societies without human supervision requires them to be able to align with human values. However, most current work only addresses this issue from a technical point of view, e.g., improving current methods relying on reinforcement learning from human feedback, neglecting wh...
To better understand neural processing during adaptive learning of stimulus‐response‐reward contingencies, we recorded synchrony of neuronal activity in anterior cingulate cortex (ACC) and hippocampal rhythms in male rats acquiring and switching between spatial and visual discrimination tasks in a Y‐maze. ACC population activity as well as single u...
In uncertain environments in which resources fluctuate continuously, animals must permanently decide whether to stabilise learning and exploit what they currently believe to be their best option, or instead explore potential alternatives and learn fast from new observations. While such a trade-off has been extensively studied in pretrained animals...
Learning to anticipate other agents' future movements has gained increased interest in robotics, especially in situations requiring interaction. Such a prediction allows the robotic systems to plan their actions as a task evolves and before its completion. Specifically, in case of active robot collaboration, early decision making might ensure smoot...
Autonomous open-ended learning (OEL) robots are able to cumulatively acquire new skills and knowledge through direct interaction with the environment, for example relying on the guidance of intrinsic motivations and self-generated goals. OEL robots have a high relevance for applications as they can use the autonomously acquired knowledge to accompl...
To better understand neural processing during adaptive learning of stimulus-response-reward contingencies, we recorded synchrony of neuronal activity in anterior cingulate cortex (ACC) with hippocampal rhythms in male rats acquiring and switching between spatial and visual discrimination tasks in a Y-maze. ACC population and single unit activity re...
In recent years, soft robots have gained increasing attention as a result of their compliance when operating in unstructured environments, and their flexibility that ensures safety when interacting with humans. However, challenges lie in the difficulty of developing control algorithms due to various limitations induced by their soft structure. In this paper,...
In uncertain environments in which resources fluctuate continuously, animals must permanently decide whether to exploit what they currently believe to be their best option, or instead explore potential alternatives in case better opportunities are in fact available. While such a trade-off has been extensively studied in pretrained animals facing no...
Our brain is continuously challenged by daily experiences. Thus, how to avoid systematic erasing of previously encoded memories? While it has been proposed that a dual-learning system with 'slow' learning in the cortex and 'fast' learning in the hippocampus could protect previous knowledge from interference, this has never been observed in the livi...
Healthcare professionals’ statistical illiteracy can impair medical decision quality and compromise patient safety. Previous studies have documented clinicians’ insufficient proficiency in statistics and a tendency toward overconfidence. However, an underexplored aspect is clinicians’ awareness of their lack of statistical knowledge that precludes any...
Our brain is continuously challenged by daily experiences. Thus, how to avoid systematic erasing of previously encoded memories? While it has been proposed that a dual-learning system with "slow" learning in the cortex and "fast" learning in the hippocampus could protect previous knowledge from interference, this has never been observed in the livi...
We present a new neuro-inspired reinforcement learning architecture for robot online learning and decision-making during both social and non-social scenarios. The goal is to take inspiration from the way humans dynamically and autonomously adapt their behavior according to variations in their own performance while minimizing cognitive effort. Follo...
Model-free and model-based computations are argued to distinctly update action values that guide decision-making processes. It is not known, however, if these model-free and model-based reinforcement learning mechanisms recruited in operationally based, instrumental tasks parallel those engaged by Pavlovian based behavioral procedures. Recently, co...
Experience replay is widely used in AI to bootstrap reinforcement learning (RL) by enabling an agent to remember and reuse past experiences. Classical techniques include shuffled-, reversed-ordered- and prioritized-memory buffers, which have different properties and advantages depending on the nature of the data and problem. Interestingly, recent c...
In this paper we introduce a novel technique that aims to control a two-module bio-inspired soft-robotic arm in order to qualitatively reproduce human demonstrations. The main idea behind the proposed methodology is based on the assumption that a complex trajectory can be derived from the composition and asynchronous activation of learned parameter...
Introduction
Numerous articles highlight physicians' insufficient command of elementary medical statistics: these gaps can affect the quality of clinical decisions and compromise the safety of care.
Objectives
Our experiment studies a fundamental but still under-explored aspect: the level of confidence of (...
Despite the benefits of expert interaction techniques, many users do not learn them and continue to use novice ones. This article aims at better understanding if, when and how users decide to learn and ultimately adopt expert interaction techniques. This dynamic learning process is a complex skill-acquisition and decision-making problem. We first p...
Embodied Cognitive Dynamics and their Impact on Humans’ Freedom within the Society. The purpose of this article is to examine how knowledge from psychology and cognitive neuroscience about the behavioral automatisms that our bodies acquire through interaction with our environment can help us become freer. For this, we draw a parallel with philosoph...
In this paper we introduce a novel technique that aims to dynamically control a two-module bio-inspired soft-robotic arm in order to qualitatively reproduce a path defined by sparse way-points. The main idea behind this work is based on the assumption that a complex trajectory may be derived as a combination of a discrete set of parameterizable sim...
The ability to attribute thoughts to others, also called theory of mind (TOM), has been extensively studied in humans; however, its evolutionary origins have been challenged. Computationally, the basis of TOM has been interpreted within the predictive coding framework and associated with activity in the temporoparietal junction (TPJ). Here, we reve...
Our understanding of orbitofrontal cortex (OFC) function has progressed remarkably over the past decades in part due to theoretical advances in associative and reinforcement learning theories. These theoretical accounts of OFC function have implicated the region in progressively more psychologically refined processes from the value and sensory-spec...
This special issue, commissioned after the 4th Quadrennial Meeting on Orbitofrontal Cortex Function held in Paris in November of 2019 (https://ofc2019.sciencesconf.org/), is intended to provide a snapshot of this ongoing transformation; we hope that the ideas presented herein will provide a foundation for the next stage in the evolution of our unde...
The ability to attribute thoughts to others, also called theory of mind (TOM), has been extensively studied. Computationally, the basis of TOM in humans has been interpreted within the predictive coding framework and associated with activity in the temporo-parietal junction (TPJ). However, the evolutionary origins of these human mindreading abiliti...
Freedom & Cognition. Another Way is Possible. The question of the relationship between freedom and determinism has been one of the most debated topics in philosophy since antiquity. In recent decades, investigations in the sciences of cognition and neurosciences have greatly revived the debate on free will by studying, for example, decision-making,...
Taking inspiration from how the brain coordinates multiple learning systems is an appealing strategy to endow robots with more flexibility. One of the expected advantages would be for robots to autonomously switch to the least costly system when its performance is satisfying. However, to our knowledge no study on a real robot has yet shown that the...
Engineering approaches to machine learning (including robot learning) typically seek the best learning algorithm for a particular problem, or a set of problems. In contrast, the mammalian brain appears as a toolbox of different learning strategies, so that any newly encountered situation can be autonomously learned by an animal with a combination...
An important current challenge in Human-Robot Interaction (HRI) is to enable robots to learn on-the-fly from human feedback. However, humans show a great variability in the way they reward robots. We propose to address this issue by enabling the robot to combine different learning strategies, namely model-based (MB) and model-free (MF) reinforcement...
People and other animals learn the values of choices by observing the contingencies between them and their outcomes. However, decisions are not guided by choice-linked reward associations alone; macaques also maintain a memory of the general, average reward rate – the global reward state – in an environment. Remarkably, global reward state affects...
Kleefstra syndrome is a disorder caused by a mutation in the EHMT1 gene characterized in humans by general developmental delay, mild to severe intellectual disability and autism. Here, we characterized cumulative memory in the Ehmt1+/- mouse model using the Object Space Task. We combined conventional behavioral analysis with automated analysis by d...
Robots are still limited to controlled conditions that the robot designer knows in enough detail to endow the robot with the appropriate models or behaviors. Learning algorithms add some flexibility with the ability to discover the appropriate behavior given either some demonstrations or a reward to guide its exploration with a reinforcement le...
Computational Intelligence and Artificial Intelligence both aim at building machines and software capable of intelligent behavior. They are consequently prone to interactions, even if the latter is not necessarily interested in understanding how cognition emerges from the brain substrate. In this chapter, we enumerate, describe and discuss...
An important current challenge in Human-Robot Interaction (HRI) is to enable robots to learn on-the-fly from human feedback. However, humans show a great variability in the way they reward robots. We propose to address this issue by enabling the robot to combine different learning strategies, namely model-based (MB) and model-free (MF) reinforcemen...
In this paper we introduce a novel technique that aims to dynamically control a modular bio-inspired soft-robotic arm in order to perform cyclic rhythmic patterns. Oscillatory signals are produced at the actuator’s level by a central pattern generator (CPG), resulting in the generation of a periodic motion by the robot’s end-effector. The proposed...
Braitenberg vehicles are bio-inspired controllers for sensor-based local navigation of wheeled robots that have been used in multiple real world robotic implementations. The common approach to implement such non-linear control mechanisms is through neural networks connecting sensing to motor action, yet tuning the weights to obtain appropriate clos...
Kleefstra syndrome is a disorder caused by a mutation in the EHMT1 gene characterized in humans by general developmental delay, mild to severe intellectual disability and autism. Here, we characterized semantic- and episodic-like memory in the Ehmt1 +/- mouse model using the Object Space Task. We combined conventional behavioral analysis with autom...
In the context of Pavlovian conditioning, two types of behaviour may emerge within the population (Flagel et al. Nature, 469(7328): 53–57, 2011). Animals may choose to engage either with the conditioned stimulus (CS), a behaviour known as sign-tracking (ST) which is sensitive to dopamine inhibition for its acquisition, or with the food cup in which...
Ethics and Cognitive Sciences. This special issue presents a set of contributions aimed at discussing ethical questions related to the cognitive science field. The goal is dual: raising ethical questions that may appear specific to cognitive science, as well as more general ethical issues on which cognitive science knowledge could shed a new light...
Declarative memory encompasses representations of specific events as well as knowledge extracted by accumulation over multiple episodes. To investigate how these different sorts of memories are created, we developed a new behavioral task in rodents. The task consists of 3 distinct conditions (stable, overlapping, and random). Rodents are exposed to...
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of to...
There are ongoing debates on whether learning involves the same mechanisms when it is mediated by social skills as when it is not [1]. Gaze cues serve as a strong communicative modality that is profoundly human. They have been shown to trigger automatic attentional orienting [2]. However, arrow cues have been shown to elicit similar effects [3]....
In this work we tackle the problem of child engagement estimation while children freely interact with a robot in their room. We propose a deep-based multi-view solution that takes advantage of recent developments in human pose detection. We extract the child's pose from different RGB-D cameras placed elegantly in the room, fuse the results and feed...
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine controls this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the mor...
In economics and perceptual decision-making contextual effects are well documented, where decision weights are adjusted as a function of the distribution of stimuli. Yet, in reinforcement learning literature whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparably little atten...
Recent computational models of sign tracking (ST) and goal tracking (GT) have accounted for observations that dopamine (DA) is not necessary for all forms of learning and have provided a set of predictions to further their validity. Among these, a central prediction is that manipulating the intertrial interval (ITI) during autoshaping should change...
Development of sign tracking and DA signals over training averaged across rats.
(A-B) Average beam break (solid) and lever press (dashed) rate for 120-s (A) and 60-s (B) ITI groups. (C-D) Average lever press rate for 120-s (C) and 60-s (D) ITI groups. Data are the same as in A and B but with a smaller scale so that differences and timing can be bet...
Underlying data for S3 Fig.
(XLSX)
Underlying data for S4 Fig.
(XLSX)
Task and electrode placement.
(A) DA release was recorded during a standard Pavlovian conditioned approach behavior task for 10 d. Each behavioral session consisted of 25 trials presented at a random time interval of either 60 s (± 30; n = 7 rats) or 120 s (± 30; n = 12 rats). (B-C) Placement of chronic recording electrodes within the NAc core [39]...
Underlying data for Fig 2.
(XLSX)
Underlying data for Fig 3.
(XLSX)
Underlying data for S2 Fig.
(XLSX)
Sign tracking is more prominent in rats that performed sessions with 120-s ITIs.
(A) Average beam break (solid) and lever press (dashed) rate for 120-s (red) and 60-s (blue) ITI groups. (B) Average lever press rate for 120-s (red) and 60-s (blue) ITI groups. Data are the same as in “A” but with a smaller scale so that differences and timing can be...
Food cup entries and lever pressing over time for each training session.
Lever pressing (B,D) and food cup entries (A,C) over the trial time for each of the 10 sessions for 120-s ITI (A,B) and 60-s ITI (C,D) sessions. For the 120-s ITI group, lever pressing was high during the first session and already near maximum levels for that group by the seco...
Underlying data for Fig 1.
(XLSX)
Using assistive robots for educational applications requires robots to be able to adapt their behavior specifically for each child with whom they interact. Among relevant signals, non-verbal cues such as the child’s gaze can provide the robot with important information about the child’s current engagement in the task, and whether the robot should co...
Despite major progress in Robotics and AI, robots are still basically “zombies” repeatedly achieving actions and tasks without understanding what they are doing. Deep-Learning AI programs classify tremendous amounts of data without grasping the meaning of their inputs or outputs. We still lack a genuine theory of the underlying principles and metho...
Using robots as therapeutic or educational tools for children with autism requires robots to be able to adapt their behavior specifically for each child with whom they interact. In particular, some children may like to be looked into the eyes by the robot while some may not. Some may like a robot with an extroverted behavior while others may prefer...
This paper presents a contribution to the active field of robotics research to support the development of social skills and capabilities in children with Autism Spectrum Conditions (ASC) as well as typically developing children. We present preliminary results of a novel experiment where classical roles are reversed: children are here the teachers g...
During sleep and wakeful rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line rep...
Social interactions rely on our ability to learn and adjust our behavior to the behavior of others. Strategic games provide a useful framework to study the cognitive processes involved in the formation of beliefs about the others’ intentions and behavior, what we may call strategic theory of mind. Through the years, the growing field of behavioral...
Content: Supplementary methods.
(PDF)
Detailed schema of the model-based spatial planning module of the model.
a) The entorhinal cortex (EC) module encodes idiothetic information (grid cells on the left) and visual information, represented as the sum of the encoded landmarks. b) The dentate gyrus (DG) module receives EC’s output and realizes a Hebbian learning process on the weights Wi...
Environment dependent model parameter table.
(PDF)
Experiment III.
(a-d) Simulation results with the full model (Direction (D) Planning (P) and Exploration (E) strategies together). a) Session-by-session selection rate of strategies. b) Selection rate of strategies by types of trials. c) Selection rates of strategies by types of simulated animals: Cue Responders (CR) and Place Responders (PR). d)...
Experiment V.
(a-b) Selection rate of strategies during Stage 1 and Stage 2 in groups P (left) and D (right). (c-d) Comparison of the occupancy rate during test trials between Octant B, Before reaching Octant B, and After reaching Octant B for groups DP (left) and D (right). Within each octant is also shown the selection rate of strategies that con...
Examples of assignments of environmental cues to different modules within the model.
a) Cue assignment used for Experiment V. b) Cue assignment used for Experiment VI.
(PDF)
Experiment II.
Simulation results with a) the Direction strategy only, b) the Direction and Locale strategies together, c) the full model illustrating the contribution of individual strategies to the behavior of each strategy in terms of % of time where they are selected.
(PDF)
Experiment IV.
(a-c) Simulation results with the full model (Direction (D) Planning (P) and Exploration (E) strategies together). a) Experimental predictions raised when the hippocampus in the model is lesioned (Group D) versus when the striatum in the model is lesioned (Group P). b) Occupancy rate in the quadrants containing either the previo...
Experiment VI.
Results with a model only employing the Planning (P) strategy combined with an Exploration strategy. a) The time spent in the quadrant containing the previous platform location is significantly above chance (dashed line) in all conditions, unlike experimental results. b) Escape latencies during Stage 2 do not show an improvement and...