This research focuses on the development of a telerobotic system that employs several state-action policies to carry out a task using on-line learning with human operator (HO) intervention through a virtual reality (VR) interface. The case-study task is to empty the contents of an unknown bag for subsequent scrutiny.

A system state is defined as a condition that persists in the system for a significant period of time and consists of the following sub-states: 1) the bag, which includes a feature set such as its type (e.g., plastic bag, briefcase, backpack, or suitcase) and its condition (e.g., open or closed, orientation, distortions in the bag contour, partial occlusion of the bag, varying handle lengths); 2) the robot (e.g., gripper spatial coordinates, home position, idle, performing a task); 3) other objects (e.g., contents that fell out of the bag, obstructions); and 4) environmental conditions such as illumination (e.g., day or night).

A system action takes the system to a new state. Action examples include selecting an initial grasping point, executing a lift-and-shake trajectory, and re-arranging the position of a bag to prepare it for better grasping and to enable the system to verify that all the bag contents have been extracted. Given the system states and a set of actions, a policy is a set of state-action pairs for performing a robotic task; that is, a policy assigns the best action to each state. The system starts with knowledge of the individual operators of the robot arm, such as opening and closing the gripper, but it has no policy for deciding when these operators are appropriate, nor does it have knowledge of the special properties of the bags. The system learns this policy from experience and human guidance.
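The state and policy definitions above can be sketched as a data structure. This is a minimal illustration, not the paper's implementation: the sub-state fields and action names are hypothetical placeholders chosen to mirror the examples in the text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical sub-state encodings; field values are illustrative only.
@dataclass(frozen=True)
class BagState:
    bag_type: str    # e.g., "plastic", "briefcase", "backpack", "suitcase"
    condition: str   # e.g., "open", "closed"

@dataclass(frozen=True)
class SystemState:
    bag: BagState
    robot: str                      # e.g., "idle", "performing_task"
    other_objects: Tuple[str, ...]  # e.g., spilled contents, obstructions
    illumination: str               # e.g., "day", "night"

# A policy is a set of state-action pairs: a mapping from each state
# to the best action found for it.
Action = str  # e.g., "grasp_at_handle", "lift_and_shake", "reposition_bag"
Policy = Dict[SystemState, Action]

policy: Policy = {}
s = SystemState(BagState("briefcase", "open"), "idle", (), "day")
policy[s] = "grasp_at_handle"  # learned best action for this state
```

Freezing the dataclasses makes states hashable, so they can serve directly as dictionary keys in the policy table.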
A policy is considered beneficial if the bag was grasped successfully and all its contents were extracted. Learning the optimal policy for classifying system states will be conducted using two soft computing methods: 1) on-line adaptive resonance theory (ART) and 2) off-line support vector machines (SVMs). The inference of these methods will be a recommendation for a set of possible grasping points, and their recognition accuracy will be compared on a set of test cases. Reinforcement learning (e.g., Q-learning) will be used to find the best action (e.g., determining the optimal grasping point followed by a lift-and-shake trajectory) for a given state.

When unknown system states are identified, the HO suggests solutions (policies) through a VR interface, and the robot decides whether to accept or reject them. The HO monitors the interactions of the telerobot on-line and controls the system through the VR interface. Policy examples include letting the HO classify the type of a bag (e.g., a briefcase) when it was mistakenly recognized as a different type (e.g., a suitcase), and having the HO provide a set of possible grasping points when the system finds it difficult to recognize points that are beneficial for completing the task. When HO intervention is found to be beneficial, the system learns, and its dependence on the HO decreases.

To test the above, an advanced VR telerobotic bag shaking system is proposed. It is assumed that several kinds of bags are placed on a platform, with all locks removed and all latches and zippers opened. The task of the system is to empty the contents of an unknown bag onto the platform for subsequent scrutiny. It is assumed that the bag has already passed X-ray inspection to ensure that it is not empty and does not contain obvious explosives (e.g., mines, gun bullets).
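The Q-learning component mentioned above can be sketched as a standard tabular update. The state labels, action set, and reward values here are hypothetical stand-ins; in the proposed system they would come from vision-based state classification and grasp/extraction outcomes.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch; constants and names are illustrative.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["grasp_at_handle", "grasp_at_corner", "lift_and_shake", "reposition_bag"]
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    """Epsilon-greedy selection: explore occasionally, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Q-learning update: Q += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example episode step: a successful shake that empties the bag earns reward 1.0.
update("bag_open", "lift_and_shake", 1.0, "bag_empty")
```

A beneficial policy in the sense above (bag grasped, all contents extracted) would correspond to episodes that accumulate high reward, pulling the Q-values of the responsible state-action pairs upward.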
HO collaboration is conducted via a VR interface, which plays an important role in the system. The HO either manipulates the 3D robot off-line, suggests solutions (e.g., so the robot learns an optimal grasping location and avoids others), or changes and adds lifting and shaking policies on-line. When the robot encounters a situation it cannot handle, it relies on HO intervention. HO intervention will be exploited by the system to support the evolution of autonomy in two ways: first, by providing input to machine learning to support system adaptation, and second, by characterizing those situations in which operator intervention is necessary because autonomous capabilities fail. Finally, measuring the amount of required operator intervention provides a metric for judging the system's level of autonomy: the less intervention, the higher the level of autonomy.
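The intervention-based autonomy metric can be made concrete with a small sketch: the fraction of task episodes completed without HO intervention. The episode log format is an assumption for illustration; the actual system would record interventions at run time.

```python
def autonomy_level(episodes):
    """Return autonomy in [0, 1]: 1.0 means no episode required HO intervention."""
    if not episodes:
        return 0.0  # no data yet; no autonomy demonstrated
    interventions = sum(1 for e in episodes if e["ho_intervened"])
    return 1.0 - interventions / len(episodes)

# Hypothetical episode log: two episodes needed the HO, two ran autonomously.
log = [
    {"ho_intervened": True},   # HO supplied grasping points
    {"ho_intervened": False},  # fully autonomous episode
    {"ho_intervened": False},
    {"ho_intervened": True},   # HO corrected the bag classification
]
print(autonomy_level(log))  # 0.5 -> half the episodes required intervention
```

As learning from HO guidance takes effect, the intervention count should fall and this metric should rise toward 1.0, matching the claim that less intervention means a higher level of autonomy.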