Conference Paper

Human-Robot Collaborative Learning System for Inspection


Abstract and Figures

This paper presents a collaborative reinforcement learning algorithm, CQ(λ), designed to accelerate learning by integrating a human operator into the learning process. The CQ(λ)-learning algorithm enables collaboration of knowledge between the robot and a human: the human, responsible for remotely monitoring the robot, suggests solutions when intervention is required. Based on its learning performance, the robot switches between fully autonomous operation and the integration of human commands. The CQ(λ)-learning algorithm was tested on a Motoman UP-6 fixed-arm robot required to empty the contents of a suspicious bag. Experimental results comparing CQ(λ) with the standard Q(λ) indicated the superiority of CQ(λ), which achieved a 21.25% improvement in the average reward.
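As an illustration of the switching behaviour described in the abstract, here is a minimal Python sketch; the moving-average reward test, the threshold, the epsilon-greedy policy, and the ask_human callback are all illustrative placeholders rather than details taken from the paper.

    import random

    def choose_action(q_table, state, epsilon=0.1):
        # Epsilon-greedy selection over the robot's own Q-values (autonomous mode).
        actions = list(q_table[state].keys())
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q_table[state][a])

    def cq_select(q_table, state, recent_rewards, ask_human, threshold=0.0):
        # Self-test of learning performance: if the recent average reward is poor,
        # integrate the human operator's command; otherwise act autonomously.
        performance = sum(recent_rewards) / len(recent_rewards) if recent_rewards else float("-inf")
        if performance < threshold:
            return ask_human(state)            # semi-autonomous: use the operator's suggestion
        return choose_action(q_table, state)   # autonomous: rely on the learned policy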
... The final search string was described as follows: TITLE-ABS-KEY (( "expertise acceleration" OR "accelerated expertise" OR "accelerating expertise" OR "accelerate expertise" OR "accelerates expertise" OR "knowledge acceleration" OR "accelerated knowledge" OR "accelerating knowledge" OR "accelerate knowledge" OR "accelerates knowledge" OR "learning acceleration" OR "accelerated learning" OR "accelerating learning" OR "accelerate learning" OR "accelerates learning" ) AND ( collaborative OR collaboration OR cooperation OR collaboratively OR collaborate OR collaborates OR collaborating OR cooperative OR cooperating ) ). Regarding the results, 36.84% of the selected papers were applied in the business domain [1,[12][13][14][15][16]. The set of papers also presented solutions applied in software agents (26.32%) [17][18][19], health (10.53%) [20,21], robotics (10.53%) [22,23], sport (5.26%) [24], and education (5.26%) [25]. This result can be explained by the fact that, in the business domain, organizations have increasingly invested in knowledge management [2]. ...
... This different aspect of collaboration has been increasingly investigated along with the growth of the internet of things and artificial intelligence [26]. Of the 19 papers, 15 (78.95%) adopted collaboration as part of the strategy to accelerate expertise transfer [25,17,27,13,14,18,28,20,16,21,22,19,29,23,30]. In the other 4 papers (21.05%) [24,17,15,1], collaboration was part of the context. ...
... This is an interesting finding, as it provides evidence that collaboration has been used as a common strategy to accelerate expertise transfer. Regarding computational support, 52.63% of the papers presented a solution to support the acceleration of expertise transfer [24,17,27,13,18,28,22,19,23,30], while 47.37% did not mention or explore a computational solution [25,12,14,20,15,16,21,1,29]. It is important to state that we did not limit our search to the computer science area, which explains the papers that did not explore computational solutions. ...
... The human's intelligence, and the experience gathered while conducting the previous experiments described in [7,8], taught him that, to be more effective, the robot should be guided to choose actions that shake the bag mostly over the Y axis, with a small number of actions over the X axis. Therefore, when asked to intervene, the human picked the highest possible value for the number of swings over the Y axis, an intermediate value for the number of swings over the X axis, and reduced the likelihood of swings over the Z axis. ...
... To summarize, the results indicated that learning was faster when the HA was asked to intervene in the robot's activity. These results are consistent with previous experiments described in [7,8,61]. Although other alternatives are available for solving the proposed problem, such as cutting open the bag and sliding the objects out of the bag, the shaking application was selected to serve as a test-bed for the CQ(λ) algorithm. ...
... Moderate magnitudes of speeds and adjacent-state distances were chosen so as not to harm the robot. (Footnote 7: excluding grasping and lifting. Footnote 8: the evaluation is for the last ten learning episodes, in which exploration was complete.) ...
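A small illustrative encoding of the intervention described in the excerpts above, with the action expressed as a number of swings per axis; the value ranges are hypothetical and not taken from the cited experiments.

    # Hypothetical discrete ranges for the number of swings per axis.
    SWINGS = {"x": range(0, 6), "y": range(0, 6), "z": range(0, 6)}

    def human_suggested_action():
        # Mirror the described intervention: maximum swings over Y,
        # an intermediate number over X, and as few as possible over Z.
        return {
            "y": max(SWINGS["y"]),         # highest possible value
            "x": max(SWINGS["x"]) // 2,    # intermediate value
            "z": min(SWINGS["z"]),         # reduced use of the Z axis
        }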
Article
Full-text available
Human + Machine Learning >> Machine Learning alone. This paper presents a new reinforcement learning algorithm that enables collaborative learning between a robot and a human. The algorithm, which is based on the Q(λ) approach, expedites the learning process by taking advantage of human intelligence and expertise. The algorithm, denoted CQ(λ), provides the robot with self-awareness to adaptively switch its collaboration level from autonomous (self-performing: the robot decides which actions to take according to its learning function) to semi-autonomous (a human advisor guides the robot, and the robot combines this knowledge into its learning function). This awareness is represented by a self-test of its learning performance. The approach of variable autonomy is demonstrated and evaluated using a fixed-arm robot for finding the optimal shaking policy to empty the contents of a plastic bag. A comparison between CQ(λ) and the traditional Q(λ) reinforcement learning algorithm resulted in faster convergence for the CQ(λ) collaborative reinforcement learning algorithm.
... Thus, the basic RL algorithm used does not need to be modified to handle the advisor's actions. The adjustable autonomy method includes two learning modes, supervised and autonomous, following the model introduced in [6]. CCRL is evaluated using a simulated 3D environment for a mobile robot motion planning task. ...
... The greedy action is given the highest selection probability, but all the others are ranked and weighted according to their value estimates. The softmax method uses a Boltzmann distribution, choosing action a on the t-th step with the probability shown in (6). ...
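For reference, the Boltzmann (softmax) rule mentioned in the excerpt has the standard form P(a | s_t) = exp(Q(s_t, a)/tau) / sum_b exp(Q(s_t, b)/tau); a minimal sketch, where the temperature tau and the dictionary-based Q-table are illustrative choices:

    import math
    import random

    def softmax_action(q_table, state, tau=1.0):
        # Boltzmann/softmax selection: the greedy action gets the highest probability,
        # but every action is weighted according to exp(Q(s, a) / tau).
        actions = list(q_table[state].keys())
        max_q = max(q_table[state].values())   # subtract the maximum for numerical stability
        prefs = [math.exp((q_table[state][a] - max_q) / tau) for a in actions]
        total = sum(prefs)
        return random.choices(actions, weights=[p / total for p in prefs], k=1)[0]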
Conference Paper
Full-text available
A cognitive collaborative reinforcement learning algorithm (CCRL) that incorporates an advisor into the learning process is developed to improve supervised learning. An autonomous learner is enabled with a self-awareness cognitive skill to decide when to solicit instructions from the advisor. The learner can also assess the value of advice, and accept or reject it. The method is evaluated for robotic motion planning using simulation. Tests are conducted for advisors with skill levels ranging from expert to novice. The CCRL algorithm, and a combined method integrating its logic with Clouse's Introspection Approach, outperformed a baseline fully autonomous learner and demonstrated robust performance when dealing with various advisor skill levels, learning to accept advice received from an expert while rejecting that of less skilled collaborators. Although the CCRL algorithm is based on RL, it fits other machine learning methods, since the advisor's actions are only added to the outer layer. Keywords: robot learning, human-robot collaboration, motion planning, reinforcement learning.
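One way the accept-or-reject step could look in code; comparing the advised action's estimated value against the learner's own greedy value is an illustrative heuristic, not the exact rule used by CCRL.

    def consider_advice(q_table, state, advised_action, tolerance=0.0):
        # Accept the advisor's action if its estimated value is not clearly worse
        # than the learner's own greedy choice (illustrative acceptance rule).
        greedy_action = max(q_table[state], key=q_table[state].get)
        if q_table[state][advised_action] >= q_table[state][greedy_action] - tolerance:
            return advised_action   # accept the advice
        return greedy_action        # reject it and keep the autonomous choice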
... game theory to compute the motor commands. In this manner, the robot can carry out a predefined regular task while the human intervenes when needed [15], [16]. ...
Preprint
Full-text available
This paper introduces human-robot sensory augmentation and illustrates it on a tracking task, where performance can be improved by the exchange of sensory information between the robot and its human user. It was recently found that during interaction between humans, the partners use each other's sensory information to improve their own sensing, and thus also their performance and learning. In this paper, we develop a computational model of this unique human ability and use it to build a novel control framework for human-robot interaction. The human partner's control is formulated as a feedback control with unknown control gains and desired trajectory. A Kalman filter is used to estimate first the control gains and then the desired trajectory. The estimated desired trajectory of the human partner is used as augmented sensory information about the system and is combined with the robot's measurement to estimate an uncertain target trajectory. Simulations and an implementation of the presented framework on a robotic interface validate the proposed observer-predictor pair for a tracking task. The results obtained using this robot demonstrate how the human user's control can be identified, and exhibit benefits of this sensory augmentation similar to those observed between interacting humans.
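The estimation steps described above rely on Kalman filtering; for reference, a generic predict/update cycle of a linear Kalman filter is sketched below. The matrices are placeholders, not the specific human-control model of the paper.

    import numpy as np

    def kalman_step(x, P, z, A, C, Q, R):
        # x, P : prior state estimate and covariance
        # z    : new measurement
        # A, C : state-transition and observation matrices
        # Q, R : process and measurement noise covariances
        x_pred = A @ x                       # predict the state
        P_pred = A @ P @ A.T + Q             # predict its covariance
        S = C @ P_pred @ C.T + R             # innovation covariance
        K = P_pred @ C.T @ np.linalg.inv(S)  # Kalman gain
        x_new = x_pred + K @ (z - C @ x_pred)
        P_new = (np.eye(len(x_pred)) - K @ C) @ P_pred
        return x_new, P_new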
... To interpret the results and to show that there were no subjective influences, a physical model of the opening of a plastic bag knot by a robot was developed. The model explains the results described in this paper as well as previous experiments performed [7], [23]. It showed that it was worthwhile to open the bag using a continuous shaking motion from locations as far as possible from the center of the horizontal axis. Ideally, the robot arm should be accelerated to match or closely match the gravitational acceleration downwards and should be oscillated over the axis to overcome most of the friction forces. ...
... Holly et al. in [14] present a detailed study of HRI taxonomy that helps to understand different modes and levels of HRI and to extend the concepts to HSI. Kartoun et al. in [15] present an intelligent approach in which a robot collaborates with a human in learning a task. Though the study is not in the context of HSI, it gives some idea of a possible approach for HSI. ...
Conference Paper
Full-text available
This study shows that appropriate human interaction can benefit a swarm of robots in achieving goals more efficiently. A set of desirable features for human swarm interaction is identified based on the principles of swarm robotics. A human swarm interaction architecture that has all of the desirable features is then proposed. A swarm simulation environment is created that allows simulating swarm behavior in an indoor environment. The swarm behavior and the results of user interaction are studied by considering a radiation source search and localization application of the swarm. The particle swarm optimization algorithm is slightly modified to enable the swarm to autonomously explore the indoor environment for radiation source search and localization. The emergence of intelligence is observed, which enables the swarm to locate the radiation source completely on its own. The proposed human swarm interaction is then integrated into the simulation environment and user evaluation experiments are conducted. Participants are introduced to the interaction tool and asked to deploy the swarm to complete the missions. The performance comparison of the user-guided swarm to that of the autonomous swarm shows that the interaction interface is fairly easy to learn and that the user-guided swarm is more efficient in achieving the goals. The results clearly indicate that the proposed interaction helped the swarm achieve emergence.
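For reference, the canonical particle swarm optimization update that the study adapts is sketched below; the inertia and acceleration coefficients are typical textbook values, and the specific modification for indoor radiation search is not reproduced here.

    import numpy as np

    def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
        # Canonical PSO velocity/position update: pbest holds each particle's
        # best-known position, gbest the best position found by the swarm.
        r1 = np.random.rand(*positions.shape)
        r2 = np.random.rand(*positions.shape)
        velocities = (w * velocities
                      + c1 * r1 * (pbest - positions)
                      + c2 * r2 * (gbest - positions))
        return positions + velocities, velocities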
Article
Full-text available
This article presents a novel method of determining smoking status using clinical narrative notes. TN substantially relies on the fact that physicians tend to use similar expressions to describe medical conditions and, further, tend to use these expressions consistently. Converting all expressions and notes to alphabetical-only representations eliminates the heterogeneity in the descriptions of the medical descriptors and allows a perfect match between an expression and a note that may contain the expression (e.g., "hastroublegoingtosleep" and "hasahistoryofalcohol," presented as examples in Figures 5 and 6, respectively). In traditional machine-learning approaches for text classification, a human expert is required to label phrases or entire notes, and then a supervised-learning algorithm attempts to generalize the associations and apply them to new data. In contrast, using non-negated distinct expressions eliminates the need for an additional computational method to achieve generalizability, as the expressions have consistently been found highly prevalent across multiple clinical conditions by considering more than 10 million clinical narrative notes. TN thus provides distinct classifications and is thereby expected to provide robust results.
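A minimal sketch of the alphabetical-only matching idea described above; the example note text is invented for illustration.

    import re

    def normalize(text):
        # Reduce text to a lowercase, alphabetical-only representation so that
        # an expression and a note containing it can match exactly.
        return re.sub(r"[^a-z]", "", text.lower())

    note = "Patient has trouble going to sleep; has a history of alcohol use."
    expression = "has trouble going to sleep"
    print(normalize(expression) in normalize(note))  # True ("hastroublegoingtosleep")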
Article
Full-text available
We present a fast algorithm to approximate the swept volume (SV) boundary of arbitrary polygon soup models. Despite the extensive research on calculating the volume swept by an object along a trajectory, the efficient algorithms described have imposed constraints on both the trajectories and the geometric models. By proposing a general algorithm that handles flat surfaces as well as volumes and disconnected objects, we allow SV calculation without resorting to preprocessing mesh repair or deforming offsets. This is of particular interest in the domain of product lifecycle management (PLM), which deals with industrial computer aided design (CAD) models that are malformed more often than not. We incorporate the bounded distance operator used in path planning to efficiently sample the trajectory while controlling the total error. We develop a triangulation scheme that draws on the unique data set created by an advancing front level-set method to tessellate the SV boundary in linear time. We analyze its performance, and demonstrate its effectiveness both theoretically and on real cases taken from PLM.
Article
Full-text available
This research focuses on the development of a telerobotic system that employs several state-action policies to carry out a task using on-line learning with human operator (HO) intervention through a virtual reality (VR) interface. The case-study task is to empty the contents of an unknown bag for subsequent scrutiny. A system state is defined as a condition that exists in the system for a significant period of time and consists of the following sub-states: 1) the bag, which includes a feature set such as its type (e.g., plastic bag, briefcase, backpack, or suitcase) and its condition (e.g., open, closed, orientation, distortions in bag contour, partial hiding of a bag, changing of handle lengths); 2) the robot (e.g., gripper spatial coordinates, home position, idle, performing a task); 3) other objects (e.g., contents that fell out of the bag, obstructions); and 4) environmental conditions such as illumination (e.g., day or night). A system action takes the system to a new state. Action examples include the initial grasping point, a lift and shake trajectory, re-arranging the position of a bag to prepare it for better grasping, and enabling the system to verify whether all the bag contents have been extracted.

Given the system state and a set of actions, a policy is a set of state-action pairs to perform a robotic task. The system starts with knowledge of the individual operators of the robot arm, such as opening and closing the gripper, but it has no policy for deciding when these operators are appropriate, nor does it have knowledge about the special properties of the bags. A policy is defined as the best action for a given state. The system learns this policy from experience and human guidance. A policy is found to be beneficial if a bag was grabbed successfully and all its contents have been extracted. Learning the optimal policy for classifying system states will be conducted using two soft computing methods: 1) on-line adaptive resonance theory (ART) and 2) off-line support vector machines (SVMs). The inference of these methods will be a recommendation for a set of possible grasping points. Their recognition accuracy will be compared for a set of test cases. Reinforcement learning (e.g., Q-learning) will be used to find the best action (e.g., determining the optimal grasping point followed by a lift and shake trajectory) for a given state.

When unknown system states are identified, the HO suggests solutions (policies) through a VR interface and the robot decides whether to accept or reject them. The HO monitors the interactions of the telerobot on-line and controls the system through the VR interface. Policy examples are to let the HO classify the type of a bag (e.g., a briefcase) when it was recognized mistakenly as a different type (e.g., a suitcase), and to have the HO provide a set of possible grasping points when the system finds it difficult to recognize points that are beneficial for completing the task. When HO intervention is found to be beneficial, the system learns, and its dependence on the HO decreases.

For testing the above, an advanced virtual reality (VR) telerobotic bag shaking system is proposed. It is assumed that several kinds of bags are placed on a platform. All locks have been removed and latches and zippers opened. The task of the system is to empty the contents of an unknown bag onto the platform for subsequent scrutiny. It is assumed that the bag has already passed X-ray inspection to ensure the bag is not empty and does not contain obvious explosives (e.g., mines, gun bullets).
HO collaboration is conducted via a VR interface, which has an important role in the system. The HO either manipulates the 3D robot off-line, suggests solutions (e.g., the robot learns an optimal grasping location and avoids others), or changes and adds lifting and shaking policies on-line. When the robot encounters a situation it cannot handle, it relies on HO intervention. HO intervention will be exploited by the system to support the evolution of autonomy in two ways: first, by providing input to machine learning to support system adaptation, and second, by characterizing those situations when operator intervention is necessary because autonomous capabilities fail. Finally, measuring the amount of required operator intervention provides a metric for judging the system's level of autonomy: the less intervention, the higher the level of autonomy.
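The abstract above names Q-learning as the mechanism for learning which action is best in each state; as a reference point, the standard tabular one-step update is sketched below. The state and action encodings for the bag-shaking task are placeholders.

    def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        # One-step Q-learning update:
        # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
        best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
        Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
        return Q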
Article
Tele-operation is used when a task has to be performed in a hostile, unsafe, inaccessible or remote environment. Examples of tele-operation include the dismantling of bombs by the police and the manipulation of robotic arms in nuclear reactors, in deep seas or in space. Two commonly used methods in tele-operation are direct manipulation and indirect manipulation. In direct manipulation the operator has a direct view of the manipulator and performs mostly mechanical tasks. In indirect manipulation the operator does not have a direct view of the manipulator, and sensors such as closed-loop TV systems are used to provide the operator with on-line information. In this case, manipulation requires mental effort, since the presentation of the manipulator environment is frequently distorted due to the limited ability of sensors to provide a complete and accurate view of reality. In this study a simple pick-and-place tele-operation task was performed in the direct and indirect modes by the participating subjects. The analysis of the results shows that while the very same learning model can be used to analyze the learning process in both modes, the parameters of the models are significantly different. Thus the duration of the learning process, as well as the most appropriate teaching methodology, may differ substantially between the two modes of operation.
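One common way to realize "the very same learning model with different parameters" is to fit a power-law learning curve to each mode's task-completion times; the sketch below uses invented data purely for illustration, and the model choice is an assumption rather than the study's stated model.

    import numpy as np
    from scipy.optimize import curve_fit

    def learning_curve(n, t1, b):
        # Power-law learning curve: predicted completion time for the n-th trial.
        return t1 * n ** (-b)

    trials = np.arange(1, 11)
    times_direct = np.array([30, 24, 21, 19, 18, 17, 16.5, 16, 15.7, 15.5])      # invented data
    times_indirect = np.array([60, 50, 44, 40, 37, 35, 33.5, 32.5, 31.8, 31.2])  # invented data

    p_direct, _ = curve_fit(learning_curve, trials, times_direct)
    p_indirect, _ = curve_fit(learning_curve, trials, times_indirect)
    print(p_direct, p_indirect)  # same model, different (t1, b) parameters per mode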
Article
We present a novel approach to behaviour recognition in visual surveillance under which scene events corresponding to object behaviours are modelled as groups of affiliated autonomous pixel-level events automatically detected using Pixel Change Histories (PCHs). The Expectation-Maximisation (EM) algorithm is employed to cluster these pixel-level events into semantically more meaningful blob-level scene events, with automatic model order selection using a modified Minimum Description Length (MDL) criterion. The method is computationally efficient, allowing for real-time performance. Experiments are presented to demonstrate the effectiveness of recognising these scene events without object trajectory matching.
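An accessible sketch of EM-based clustering with automatic model-order selection; scikit-learn's GaussianMixture and the BIC score are used here as stand-ins for the paper's pixel-level event features and modified MDL criterion.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def cluster_events(features, max_components=10):
        # Fit Gaussian mixtures of increasing order with EM and keep the one
        # with the lowest BIC (a stand-in for the modified MDL criterion).
        best_model, best_score = None, np.inf
        for k in range(1, max_components + 1):
            gmm = GaussianMixture(n_components=k, covariance_type="full").fit(features)
            score = gmm.bic(features)
            if score < best_score:
                best_model, best_score = gmm, score
        return best_model

    features = np.random.randn(200, 2)   # synthetic 2-D event features
    labels = cluster_events(features).predict(features)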
Article
Programming mobile robots can be a long and difficult task. The idea of having a robot learn how to accomplish a task, rather than being told explicitly, is appealing. A TD(λ) implementation of Reinforcement Learning (RL) using a Fuzzy Neural Network (FNN) is suggested as a plausible approach for this task, while Q-Learning is shown to be inadequate. Although there is no formal proof of its superiority, it did better in a simple simulation.
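As a reference for the TD(λ) mechanism mentioned above, a single update with accumulating eligibility traces over a linear value function is sketched below; the linear features stand in for the fuzzy neural network used in the article.

    import numpy as np

    def td_lambda_step(w, e, phi, phi_next, reward, alpha=0.1, gamma=0.95, lam=0.8):
        # TD(lambda) with accumulating traces for a linear value function V(s) = w . phi(s).
        delta = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)  # TD error
        e = gamma * lam * e + phi                                      # decay and accumulate the trace
        w = w + alpha * delta * e                                      # update weights along the trace
        return w, e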