Fig 2 - uploaded by Tekin Mericli

Source publication
Conference Paper
Full-text available
This paper elaborates on mechanisms for establishing visual joint attention for the design of robotic agents that learn through natural interfaces, following a developmental trajectory not unlike that of infants. We first describe the evolution of cognitive skills in infants and then the adaptation of cognitive development patterns in robotic design. A com...

Contexts in source publication

Context 1
... a laser range finder with a 5-meter range is installed on the body of the Robotino robot to obtain more accurate range data from the robot's environment. Figure 2 shows an example video frame recorded by the robot, in which the caregiver focuses his attention on one of the seven objects. ...
Context 2
... pose vectors, computed in the manner described above, have a distribution as illustrated in Figure 4. Each pose value is shown in the same color as in Figure 2, according to the manual annotations obtained from users. When these distributions are modeled with Gaussians, the indicated 3D regions emerge. ...
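As an illustration of this modeling step, the following is a minimal sketch of fitting one 3D Gaussian per annotated object and scoring a new pose vector against them. The data layout (an (N, 3) array of pose vectors with per-frame object labels) and the use of scipy are assumptions for illustration, not the paper's implementation.

import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical data layout: pose_vectors is an (N, 3) array of head-pose
# vectors and labels[i] is the manually annotated object index for frame i.
def fit_pose_gaussians(pose_vectors, labels):
    """Fit one 3D Gaussian per annotated object, yielding regions as in Figure 4."""
    models = {}
    for obj in np.unique(labels):
        samples = pose_vectors[labels == obj]
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        models[obj] = multivariate_normal(mean, cov, allow_singular=True)
    return models

def most_likely_object(models, pose):
    """Return the object whose Gaussian assigns the new pose the highest density."""
    return max(models, key=lambda obj: models[obj].pdf(pose))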
Context 3
... in the experiments we used 1200 frames of recorded video, in which two different caregivers look at each of the seven objects on the table shown in Figure 2. We apply our algorithms to these frames; M1 indicates the rate at which an estimated object center falls into the bounding box of the true object of interest, whereas M2 shows the rate at which the estimated point is at the shortest distance to the true center. ...
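A sketch of how the two metrics could be computed over the recorded frames, assuming each frame provides an estimated point, the bounding box and (0-based) index of the true object of interest, and the centers of all objects; the array layout and function name are illustrative.

import numpy as np

def compute_metrics(estimates, true_boxes, true_ids, centers):
    """M1: fraction of frames where the estimated point falls inside the true
    object's bounding box. M2: fraction of frames where the estimated point is
    closest to the true object's center.
    estimates: (N, 2) points; true_boxes: (N, 4) as (x1, y1, x2, y2);
    true_ids: (N,) 0-based object indices; centers: (K, 2) object centers."""
    est = np.asarray(estimates, dtype=float)
    boxes = np.asarray(true_boxes, dtype=float)
    inside = ((est[:, 0] >= boxes[:, 0]) & (est[:, 0] <= boxes[:, 2]) &
              (est[:, 1] >= boxes[:, 1]) & (est[:, 1] <= boxes[:, 3]))
    m1 = inside.mean()

    dists = np.linalg.norm(est[:, None, :] - np.asarray(centers)[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    m2 = (nearest == np.asarray(true_ids)).mean()
    return m1, m2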
Context 4
... the effect of this factor is prominent in the case of Object 7. Resolving Object 7 is challenging, not only because it lies on the periphery, but also because it is quite close to Object 6. Indeed, some objects are clearly very close to each other (Figure 2), even partially occluding one another in some cases (Objects 6 and 7). Hence, in addition to calculating M1 and M2 for each object, we form clusters of objects, namely left peripheral (L), right peripheral (R), and central (C), according to their location on the table, where L includes Objects 1 and 2, R includes Objects 6 and 7, and C covers Objects 3, 4, and 5. ...
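One simple way to aggregate per-object scores into these clusters is to average them within each group. The grouping follows the excerpt, while the averaging itself is only an assumed aggregation scheme; the paper may instead recompute M1 and M2 directly at the cluster level.

# Cluster membership as stated in the excerpt: L = {1, 2}, C = {3, 4, 5}, R = {6, 7}.
CLUSTERS = {"L": (1, 2), "C": (3, 4, 5), "R": (6, 7)}

def cluster_scores(per_object_scores):
    """Average a per-object metric (e.g. M1 or M2, keyed by object index 1..7)
    over the left-peripheral, central, and right-peripheral groups."""
    return {name: sum(per_object_scores[o] for o in objs) / len(objs)
            for name, objs in CLUSTERS.items()}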

Similar publications

Conference Paper
Full-text available
In this paper, a first step is taken towards using vision in human-humanoid haptic joint actions. Haptic joint actions are characterized by physical interaction throughout the execution of a common goal. Because of this, most of the focus is on the use of force/torque-based control. However, force/torque information is not rich enough for some task...
Conference Paper
Full-text available
Person identification is a fundamental robotic capability for long-term interactions with people. It is important to know with whom the robot is interacting for social reasons, as well as to remember user preferences and interaction histories. There exist, however, a number of different features by which people can be identified. This work describe...
Article
Full-text available
We report on an extensive study of the current benefits and limitations of deep learning approaches to robot vision and introduce a novel dataset used for our investigation. To avoid the biases in currently available datasets, we consider a human-robot interaction setting to design a data-acquisition protocol for visual object recognition on the iC...
Conference Paper
Full-text available
Intelligent robots will give us a chance to use computers in our daily lives. We first implemented a humanoid robot for computerized university guidance, and then added capabilities for natural interaction. This paper describes the hardware and software system of this humanoid with interaction ability. HRP-2 sitting opposite...
Conference Paper
Full-text available
An architecture for integrating various vision devices in an autonomous agent is presented and demonstrated in the case of vision-based navigation in autonomous mobile robotics. The architecture is based on the behavioural approach. Applying this approach to vision results in a collection of vision-based behaviours that perform just simple tasks wh...

Citations

... In recent years, sophisticated algorithms have been developed in which gaze direction guides the identification of objects of interest [235,236,237,238,239]. For instance, in the context of robot learning, Yucel et al. [237] use a joint attention algorithm that works as follows: the instructor's face is first detected using Haar-like features. ...
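The face-detection step mentioned in this context is a standard one; a minimal OpenCV sketch using a pretrained Haar cascade is shown below. The cascade file and parameters are the usual defaults shipped with the opencv-python package, not details taken from [237].

import cv2

# Load OpenCV's pretrained frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr):
    """Return face bounding boxes (x, y, w, h) found with Haar-like features."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                    minSize=(60, 60))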
Article
Full-text available
Today, one of the major challenges that autonomous vehicles are facing is the ability to drive in urban environments. Such a task requires communication between autonomous vehicles and other road users in order to resolve various traffic ambiguities. The interaction between road users is a form of negotiation in which the parties involved have to share their attention regarding a common objective or a goal (e.g. crossing an intersection), and coordinate their actions in order to accomplish it. In this literature review we aim to address the interaction problem between pedestrians and drivers (or vehicles) from a joint attention point of view. More specifically, we will discuss the theoretical background behind joint attention, its application to traffic interaction and practical approaches to implementing joint attention for autonomous vehicles.
... Yücel, Salah, Meriçli, Meriçli, and colleagues [83,84] presented a cognitively-inspired, virtually task-independent gaze-following/fixation and object segmentation mechanism for robotic joint attention. In their first approach [83], the proposed system determined the head pose of the caregiver and estimated the gaze direction from it. At the same time, the depth of the object along the direction of gaze would be inferred from the head orientation. ...
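The geometric idea of recovering an attended point from head orientation can be sketched as casting a ray along the estimated gaze direction and intersecting it with the table plane. The frame convention, the horizontal-plane assumption, and the function name below are illustrative, not the paper's actual depth-inference procedure.

import numpy as np

def gaze_target_on_table(head_pos, head_dir, table_z=0.0):
    """Intersect a gaze ray (origin at the head, direction from head pose)
    with a horizontal table plane z = table_z. Returns the 3D hit point,
    or None if the ray is parallel to or points away from the plane."""
    head_pos = np.asarray(head_pos, dtype=float)
    head_dir = np.asarray(head_dir, dtype=float)
    head_dir = head_dir / np.linalg.norm(head_dir)
    if abs(head_dir[2]) < 1e-9:
        return None
    t = (table_z - head_pos[2]) / head_dir[2]
    if t <= 0:
        return None
    return head_pos + t * head_dir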
Article
Full-text available
This review intends to provide an overview of the state of the art in the modeling and implementation of automatic attentional mechanisms for socially interactive robots. Humans assess and exhibit intentionality by resorting to multisensory processes that are deeply rooted within low-level automatic attention-related mechanisms of the brain. For robots to engage with humans properly, they should also be equipped with similar capabilities. Joint attention, the precursor of many fundamental types of social interactions, has been an important focus of research in the past decade and a half, therefore providing the perfect backdrop for assessing the current status of state-of-the-art automatic attentional-based solutions. Consequently, we propose to review the influence of these mechanisms in the context of social interaction in cutting-edge research work on joint attention. This will be achieved by summarizing the contributions already made in these matters in robotic cognitive systems research, by identifying the main scientific issues to be addressed by these contributions and analyzing how successful they have been in this respect, and by consequently drawing conclusions that may suggest a roadmap for future successful research efforts.
... More recently, other proposals have been made. The model developed by [157] is an effective one, integrating image-processing algorithms for robust head-pose estimation and gaze-direction estimation. Other authors, such as [93,94,134], have focused on the shared-attention capacities of ''mental rotation'' and ''perspective taking.'' ...
Article
Full-text available
Recently, there have been considerable advances in the research on innovative information communication technology (ICT) for the education of people with autism. This review focuses on two aims: (1) to provide an overview of the recent ICT applications used in the treatment of autism and (2) to focus on the early development of imitation and joint attention in the context of children with autism as well as robotics. There have been a variety of recent ICT applications in autism, which include the use of interactive environments implemented in computers and special input devices, virtual environments, avatars and serious games as well as telerehabilitation. Despite exciting preliminary results, the use of ICT remains limited. Many of the existing ICTs have limited capabilities and performance in actual interactive conditions. Clinically, most ICT proposals have not been validated beyond proof of concept studies. Robotics systems, developed as interactive devices for children with autism, have been used to assess the child’s response to robot behaviors; to elicit behaviors that are promoted in the child; to model, teach and practice a skill; and to provide feedback on performance in specific environments (e.g., therapeutic sessions). Based on their importance for both early development and for building autonomous robots that have humanlike abilities, imitation, joint attention and interactive engagement are key issues in the development of assistive robotics for autism and must be the focus of further research.
... Gaze-awareness is also necessary for humans to feel that they have made eye contact with others [17]. Some robotic agents shift human attention in several ways, including eye gaze [18], head orientation [19], [20], and reference terms and pointing gestures [21]. Most of these also assume that the human faces the robot when their interaction begins. ...
Conference Paper
Full-text available
Attention control can be defined as shifting someone's attention from his/her existing attentional focus to another. However, it is not an easy task for the robot to control a human's attention toward its intended direction, especially when the robot and the human are not facing each other, or the human is intensely attending to his/her task. The robot should convey some communicative intention through appropriate actions according to the human's situation. In this paper, we propose a robotic framework to control human attention in terms of three phases: attracting attention, making eye contact, and shifting attention. Results show that the robot can attract a person's attention by three actions: head turning, head shaking, and uttering reference terms, corresponding to three viewing situations in which the human perceives the robot (near peripheral field of view, far peripheral field of view, and out of field of view). After gaining attention, the robot makes eye contact by showing gaze awareness through blinking its eyes, and directs the human's attention by a combination of eye and head turning behavior to share an object. Experiments using sixteen participants confirm the effectiveness of the proposed framework in controlling human attention.
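The three-phase flow described in this abstract could be written down as a small situation-to-action lookup followed by a fixed phase sequence. The pairing of each action with a viewing situation follows the order of the two lists above and is an assumption, as is every identifier in this sketch.

# Action chosen to attract attention, keyed by where the robot falls in the
# person's field of view (pairing assumed from the order given in the abstract).
ATTRACT_ACTION = {
    "near_peripheral": "head_turning",
    "far_peripheral": "head_shaking",
    "out_of_view": "utter_reference_term",
}

def attention_control_sequence(viewing_situation):
    """Return the three-phase action sequence: attract, eye contact, shift."""
    return [
        ("attract_attention", ATTRACT_ACTION[viewing_situation]),
        ("make_eye_contact", "blink_eyes_to_show_gaze_awareness"),
        ("shift_attention", "turn_eyes_and_head_toward_object"),
    ]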
... Some robots were equipped with the capability to encourage people to initiate interaction through cues such as approaching direction [10] and path [11], standing position [12], and following behaviors [13]. Some robotic agents shift human attention in several ways, including gaze turns [14], head orientation [15], reference terms [16], and pointing gestures [17]. These studies assumed that the target person faces the robot and intends to talk with it; however, in reality this assumption may not hold. ...
Conference Paper
Full-text available
A major challenge in HRI is to design a social robot that can attract a target human's attention and control it toward a particular direction in various social situations. If a robot would like to initiate an interaction with a person, it may turn its gaze to him/her for eye contact. However, it is not an easy task for the robot to make eye contact, because such a turning action alone may not be enough to initiate an interaction in all situations, especially when the robot and the human are not facing each other or the human intensely attends to his/her task. In this paper, we propose a conceptual model of attention control with four phases: attention attraction, eye contact, attention avoidance, and attention shift. In order to initiate an attention control process, the robot first tries to gain the target participant's attention through a head turning or head shaking action, depending on the three viewing situations in which the robot is captured in his/her field of view (central field of view, near peripheral field of view, and far peripheral field of view). After gaining her/his attention, the robot makes eye contact only with the target person by showing gaze awareness through blinking its eyes, and directs her/his attention toward an object by turning its eyes and head. Moreover, the robot can show attention-aversion behaviors if non-target persons look at it. We design a robot based on the proposed approach, and experimental evaluation confirms that it is effective in controlling the target participant's attention.
... Among these applications, studies of fixation on a person, a scene, or an object have been of particular interest in surveillance [12] and meeting-like scenarios [13], where it is also possible to infer the joint attention of a group of individuals. Interestingly, joint attention is also used in creating natural human-robot interaction [18]. Along the lines of head gesture recognition, detection of head nods and shakes has been found to be useful for individuals to both produce and recognize American Sign Language [6]. ...
Conference Paper
Full-text available
Head gesture detection and analysis is a vital part of looking inside a vehicle when designing intelligent driver assistance systems. In this paper, we present a simpler and constrained version of the Optical flow based Head Movement and Gesture Analyzer (OHMeGA) and evaluate it on a dataset relevant to the automotive environment. OHMeGA is user-independent, robust to occlusions from eyewear, large spatial head turns, and lighting conditions, simple to implement and set up, real-time, and accurate. The intuition behind OHMeGA is that it segments head gestures into head-motion states and no-head-motion states. This segmentation allows higher-level semantic information, such as fixation time and rate of head motion, to be readily obtained. Performance evaluation of this approach is conducted in two settings: a controlled in-laboratory experiment and an uncontrolled on-road experiment. Results show an average accuracy of 97.4% on motion states in the laboratory experiment and an average overall accuracy of 86% in the on-road experiment.
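The core segmentation idea, labeling frame pairs as motion or no-motion by thresholding optical-flow magnitude over the head region, could look roughly as follows. The Farnebäck dense flow, the threshold value, and the function name are illustrative choices, not necessarily what OHMeGA uses.

import cv2
import numpy as np

def head_motion_states(gray_frames, motion_threshold=1.0):
    """Label each consecutive frame pair as 'motion' or 'no_motion' based on
    the mean dense optical-flow magnitude over the (pre-cropped) head region."""
    states = []
    for prev, curr in zip(gray_frames, gray_frames[1:]):
        # Farneback parameters: pyr_scale, levels, winsize, iterations,
        # poly_n, poly_sigma, flags (common default-like values).
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitude = np.linalg.norm(flow, axis=2).mean()
        states.append("motion" if magnitude > motion_threshold else "no_motion")
    return states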
... Some robots were equipped with the capability to encourage people to initiate interaction by offering cues such as approach direction [11], approach path [12], and standing position [13]. Robotic systems have also been developed that are able to establish eye contact by gaze crossing [14-16] and to shift a human's attention by gaze turning [17, 18], reference terms, and pointing gestures [19, 20]. These studies assumed that the target person faces the robot and intends to talk to it; however, in actual practice this assumption may not always hold. ...
Article
Full-text available
It is a major challenge in HRI to design a social robot that is able to selectively direct a target human's attention towards an intended direction. For this purpose, the robot may first turn its gaze toward him/her in order to establish eye contact. However, such a turning action of the robot may not in itself be sufficient to make eye contact with the target person in all situations, especially when the robot and the person are not facing each other or the human is intensely engaged in a task. In this paper, we propose a conceptual model of attention control with five phases: attention attraction, eye contact, attention avoidance, gaze back, and attention shift. An evaluation experiment using a robotic head reveals the effectiveness of the proposed model in different viewing situations.
... The system can now control the robot to pick up the cup on the table, and not the one on the mantelpiece. Yücel and Salah (2009) proposed a method for establishing joint attention between a human and a robot. A more advanced capability that requires the robot to establish and maintain a conversation is turn-taking with its human partner (Kendon, 1967). ...
Article
This chapter presents an overview of a typical scenario of Ambient Assisted Living (AAL) in which a robot navigates to a person to convey information. Indoor robot navigation is a challenging task due to the complexity of real-home environments and the need for online learning abilities to adjust to dynamic conditions. A comparison between systems with different sensor typologies shows that vision-based systems promise to provide good performance and a wide scope of usage at reasonable cost. Moreover, vision-based systems can perform different tasks simultaneously by applying different algorithms to the input data stream, thus enhancing the flexibility of the system. The authors introduce the state of the art of several computer vision methods for realizing indoor robotic navigation to a person and human-robot interaction. A case study has been conducted in which a robot, which is part of an AAL system, navigates to a person and interacts with her. The authors evaluate this test case and give an outlook on the potential of learning robot vision in ambient homes.
... Understanding the social and cognitive abilities at stake during interactions represents a puzzling but necessary question for social robotics. Multiple attempts have been made to endow robots with human abilities, like showing expressivity [5], decoding emotion [6], giving appropriate feedback [19], or sharing attention [20, 4] with a human peer. The development of such robots, able to converse with a person in a friendly manner, surely represents a long-term objective at the crossover of multiple disciplines, ranging from psychology to engineering, through signal processing and machine learning. ...
Conference Paper
Full-text available
Understanding the ability to coordinate with a partner constitutes a great challenge in social signal processing and social robotics. In this paper, we designed a child-adult imitation task to investigate how automatically computable cues on turn-taking and movements can give insight into high-level perception of coordination. First, we collected human questionnaire ratings to evaluate the perceived coordination of the dyads. Then, we extracted automatically computable cues and information on dialog acts from the video clips. The automatic cues characterized speech and gestural turn-taking and the coordinated movements of the dyad. We finally compared the human scores with the automatic cues to determine which cues could be informative about the perception of coordination during the task. We found that the adult adjusted his behavior according to the child's needs, and that a disruption of the gestural turn-taking rhythm was perceived negatively by the judges. We also found that judges rated negatively the dyads that talked more, as speech intervened when the child had difficulty imitating. Finally, coherence measures between the partners' movement features seemed more adequate than correlation to characterize their coordination.
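The comparison between correlation and spectral coherence of the two partners' movement signals can be illustrated with scipy; the sampling rate, window length, and signal names below are placeholders, not the paper's actual settings.

import numpy as np
from scipy.signal import coherence

def compare_coupling(child_movement, adult_movement, fs=25.0):
    """Pearson correlation vs. mean magnitude-squared coherence between two
    1D movement time series (e.g. frame-wise motion energy), sampled at fs Hz."""
    r = np.corrcoef(child_movement, adult_movement)[0, 1]
    nperseg = min(256, len(child_movement))
    freqs, cxy = coherence(child_movement, adult_movement, fs=fs, nperseg=nperseg)
    return r, cxy.mean()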
... From our point of view, this module should be learned through interaction. The model developed by (Yucel et al., 2009) implements a relatively effective model, consisting of the integration of robust image-processing algorithms such as head-pose estimation and gaze-direction estimation. Other authors, such as (Marin-Urias et al., 2009; Marin-Urias et al., 2008; Sisbot et al., 2007), have focused on important shared-attention capacities called "mental rotation" and "perspective taking". ...
Article
Full-text available
My thesis focuses on emotional interaction in autonomous robotics. The robot must be able to act and react in a natural environment and cope with unpredictable perturbations. It is necessary that the robot can acquire behavioral autonomy, that is to say, the ability to learn and adapt online. In particular, we propose to study which mechanisms to introduce so that the robot has the ability to perceive objects in the environment and, in addition, so that these percepts can be shared with an experimenter. The problem is to teach the robot to prefer certain objects and avoid others. The solution can be found in psychology, in social referencing. This ability allows a value to be associated with an object through emotional interaction with a human partner. In this context, our problem is how a robot can autonomously learn to recognize the facial expressions of a human partner and then use them to give an emotional valence to objects and allow their discrimination. We focus on understanding how emotional interaction with a partner can bootstrap behavior of increasing complexity, such as social referencing. Our idea is that social referencing, as well as the recognition of facial expressions, can emerge from a sensorimotor architecture. We support the idea that social referencing may be initiated by a simple cascade of sensorimotor architectures that are not dedicated to social interactions. My thesis addresses several topics that have a common denominator: social interaction. We first propose an architecture which is able to learn to recognize facial expressions through an imitation game between an expressive head and an experimenter. The robotic head begins by learning five prototypical facial expressions. Then, we propose an architecture which can reproduce facial expressions and their different levels of intensity. The robotic head can reproduce more advanced expressions, for instance joy mixed with anger. We also show that face detection can emerge from this emotional interaction thanks to an implicit rhythm that is created between the human partner and the robot. Finally, we propose a sensorimotor model that has the ability to achieve social referencing. Three situations have been tested: 1) a robotic arm is able to catch and avoid objects as a function of the emotional interaction with the human partner; 2) a mobile robot is able to reach or avoid certain areas of its environment; 3) an expressive head can orient its gaze in the same direction as the human and, in addition, associate emotional values with objects according to the facial expressions of the experimenter. We show that a developmental sequence can emerge from emotional interaction and that social referencing can be explained at a sensorimotor level without needing to use a theory-of-mind model.