Conference Paper

Curriculum-based Sensing Reduction in Simulation to Real-World Transfer for In-hand Manipulation

Article
Robotic manipulation challenges, such as grasping and object manipulation, have been tackled successfully with the help of deep reinforcement learning systems. In this review, we give an overview of recent advances in deep reinforcement learning algorithms for robotic manipulation tasks. We begin by outlining the fundamental ideas of reinforcement learning and the components of a reinforcement learning system. We then cover the many deep reinforcement learning algorithms that have been proposed for robotic manipulation, including value-based methods, policy-based methods, and actor-critic approaches. We also examine the numerous issues that arise when applying these algorithms to robotics tasks, along with the solutions that have been put forward to address them. Finally, we highlight several open research problems and discuss possible future directions for the field.
Article
We present a new tactile sensor intended for manipulation by mobile robots, for example in the home. The surface consists of an array of small, rounded bumps or "nibs", which provide reliable traction on objects like wet dishes. When the nibs contact a surface they deflect, and capacitive sensors measure the corresponding local normal and shear forces. A key feature of the sensor is the ability to reconfigure dynamically depending on which combinations of sensing elements it samples. By interrogating different combinations of elements the sensor can detect and distinguish between linear and rotational sliding, and other dynamic events such as making and breaking contact. These dynamic events, combined with sensing the grasp and load forces, are useful for acquiring objects and performing simple in-hand manipulations. The proposed slip detection method estimates the minimum required grasping force with an error of less than 1.5 N and uses tactile-controlled rotational slip to reorient objects of unknown weight and surface with a 78% success rate.
Article
We use reinforcement learning (RL) to learn dexterous in-hand manipulation policies that can perform vision-based object reorientation on a physical Shadow Dexterous Hand. The training is performed in a simulated environment in which we randomize many of the physical properties of the system such as friction coefficients and an object’s appearance. Our policies transfer to the physical robot despite being trained entirely in simulation. Our method does not rely on any human demonstrations, but many behaviors found in human manipulation emerge naturally, including finger gaiting, multi-finger coordination, and the controlled use of gravity. Our results were obtained using the same distributed RL system that was used to train OpenAI Five. We also include a video of our results: https://youtu.be/jwSbzNHGflM .
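The domain-randomization recipe described in this abstract can be sketched in a few lines: at the start of each simulated episode, physical parameters are resampled from broad ranges so the learned policy cannot overfit to any single simulator configuration. The parameter names and ranges below are illustrative assumptions, not the values used in the paper.

```python
import random

def randomize_physics(rng):
    """Sample one set of simulator parameters per episode (hypothetical
    names and ranges; the paper randomizes many more properties)."""
    return {
        "friction": rng.uniform(0.5, 1.5),      # friction coefficient scale
        "object_mass": rng.uniform(0.05, 0.5),  # object mass in kg
        "motor_gain": rng.uniform(0.8, 1.2),    # actuator strength scale
    }

# One fresh parameter set per training episode.
rng = random.Random(0)
episode_params = [randomize_physics(rng) for _ in range(3)]
```

A policy that succeeds under all sampled variations is more likely to treat the real robot as just another draw from the randomized distribution, which is what enables transfer without real-world training data.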
Conference Paper
In this paper, we investigate the use of a Kalman filter to enable robust tracking based on an efficient pose estimation algorithm, namely the four-point algorithm. Pose estimation is very useful in vision-based system control, for example in automatic driving and virtual reality inputs. First, we implemented a four-point pose estimation method on a personal computer. This algorithm is believed to require the fewest point features for the generation of a unique solution; in contrast, existing three-point algorithms may give multiple solutions. We then adopted a Kalman filter to enable robust tracking, since it is computationally efficient and very good at handling noise during tracking. Combining these two techniques lets us build a high-speed and yet robust system suitable for a wide variety of real applications. Furthermore, we show that a linear Kalman filter can be applied to filter out noise directly from the results of the four-point algorithm. Tests on simulated and real data were performed and the results were satisfactory.
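As a hedged illustration of the filtering idea in this abstract, here is a minimal scalar Kalman filter (a constant-state model with assumed process noise q and measurement noise r), not the authors' four-point implementation:

```python
def kalman_1d(measurements, q=1e-3, r=0.25):
    """Minimal scalar Kalman filter: state model x_k = x_{k-1} + w,
    measurement z_k = x_k + v, with w ~ N(0, q) and v ~ N(0, r)."""
    x, p = measurements[0], 1.0   # initial state estimate and covariance
    out = [x]
    for z in measurements[1:]:
        p = p + q                 # predict: covariance grows by process noise
        k = p / (p + r)           # Kalman gain: trust in the new measurement
        x = x + k * (z - x)       # update state with the innovation
        p = (1 - k) * p           # shrink covariance after the update
        out.append(x)
    return out
```

Applied to the outputs of a pose estimator, the same recurrence (run per pose component) damps measurement noise while tracking the underlying signal, which is the role the linear Kalman filter plays in the paper.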
Conference Paper
Autonomously learning a complex task takes a very long time for Reinforcement Learning (RL) agents. One way to learn faster is by dividing a complex task into several simple subtasks and organizing them into a Curriculum that guides Transfer Learning (TL) methods to reuse knowledge in a convenient sequence. However, previous works do not take into account the TL method to build specialized Curricula, leaving the burden of a careful subtask selection to a human. We here contribute novel procedures for: (i) dividing the target task into simpler ones under minimal human supervision; (ii) automatically generating Curricula based on object-oriented task descriptions; and (iii) using generated Curricula for reusing knowledge across tasks. Our experiments show that our proposal achieves a better performance using both manually given and generated subtasks when compared to the state-of-the-art technique in two different domains.
Article
Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These tasks present considerable difficulties for reinforcement learning approaches, since the natural reward function for such goal-oriented tasks is sparse and prohibitive amounts of exploration are required to reach the goal and receive a learning signal. Past approaches tackle these problems by manually designing a task-specific reward shaping function to help guide the learning. Instead, we propose a method to learn these tasks without requiring any prior task knowledge other than obtaining a single state in which the task is achieved. The robot is trained in "reverse", gradually learning to reach the goal from a set of starting positions increasingly far from the goal. Our method automatically generates a curriculum of starting positions that adapts to the agent's performance, leading to efficient training on such tasks. We demonstrate our approach on difficult simulated fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
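The start-state adaptation this abstract describes can be caricatured on a 1-D line: begin at the goal, and push the start position farther away whenever training succeeds from the current start. Everything below is an illustrative simplification; `solve` stands in for a full RL training round.

```python
def reverse_curriculum(goal, n_rounds, solve, step=1.0):
    """Toy reverse curriculum on a 1-D line. `solve(start)` is a stand-in
    for training the agent from `start` and returns True on success."""
    start = goal
    history = []
    for _ in range(n_rounds):
        if solve(start):
            start += step          # success: move the start farther out
        else:
            start -= step / 2      # failure: retreat toward the goal
        start = max(start, goal)   # never start past the goal itself
        history.append(start)
    return history
```

The oscillation around the frontier of solvable starts is the point: the curriculum keeps the agent training at intermediate difficulty, where a learning signal is actually available.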
Article
Objective: The current status of human-robot interaction (HRI) is reviewed, and key current research challenges for the human factors community are described. Background: Robots have evolved from continuous human-controlled master-slave servomechanisms for handling nuclear waste to a broad range of robots incorporating artificial intelligence for many applications and under human supervisory control. Methods: This mini-review describes HRI developments in four application areas and what are the challenges for human factors research. Results: In addition to a plethora of research papers, evidence of success is manifest in live demonstrations of robot capability under various forms of human control. Conclusions: HRI is a rapidly evolving field. Specialized robots under human teleoperation have proven successful in hazardous environments and medical application, as have specialized telerobots under human supervisory control for space and repetitive industrial tasks. Research in areas of self-driving cars, intimate collaboration with humans in manipulation tasks, human control of humanoid robots for hazardous environments, and social interaction with robots is at initial stages. The efficacy of humanoid general-purpose robots has yet to be proven. Applications: HRI is now applied in almost all robot tasks, including manufacturing, space, aviation, undersea, surgery, rehabilitation, agriculture, education, package fetch and delivery, policing, and military operations.
Article
This paper presents an easy means to produce a 3-axis Hall effect–based skin sensor for robotic applications. It uses an off-the-shelf chip and is physically small and provides digital output. Furthermore, the sensor has a soft exterior for safe interactions with the environment; in particular it uses soft silicone with about an 8 mm thickness. Tests were performed to evaluate the drift due to temperature changes, and a compensation using the integral temperature sensor was implemented. Furthermore, the hysteresis and the crosstalk between the 3-axis measurements were evaluated. The sensor is able to detect minimal forces of about 1 gf. The sensor was calibrated and results with total forces up to 1450 gf in the normal and tangential directions of the sensor are presented. The test revealed that the sensor is able to measure the different components of the force vector.
Article
Tactile sensing is an essential element of autonomous dexterous robot hand manipulation. It provides information about forces of interaction and surface properties at points of contact between the robot fingers and the objects. Recent advancements in robot tactile sensing led to development of many computational techniques that exploit this important sensory channel. This paper reviews the current state of the art of manipulation and grasping applications that involve an artificial sense of touch and discusses the pros and cons of each technique. The main issues of artificial tactile sensing are addressed. General requirements of a tactile sensor are briefly discussed and the main transduction technologies are analyzed. Twenty-eight tactile sensors, each integrated into a robot hand, are classified in accordance with their transduction types and applications. Previously issued reviews are focused on the hardware part of tactile sensors, whereas we present an overview of algorithms and tactile feedback-based control systems that exploit signals from the sensors. The applications of these algorithms include grasp stability estimation, tactile object recognition, tactile servoing and force control. Drawing from advancements in tactile sensing technology and taking into consideration its drawbacks, this paper outlines possible new directions of research in dexterous manipulation.
Conference Paper
We propose a new framework for learning the world dynamics of feature-rich environments in model-based reinforcement learning. The main idea is formalized as a new, factored state-transition representation that supports efficient online-learning of the relevant features. We construct the transition models through predicting how the actions change the world. We introduce an online sparse coding learning technique for feature selection in high-dimensional spaces. We derive theoretical guarantees for our framework and empirically demonstrate its practicality in both simulated and real robotics domains.
Conference Paper
The human fingertip is exquisitely sensitive to vibrations that are essential to detect slip and discriminate textures. Achieving similar functions with prosthetic and robotic hands will require tactile sensors with similar sensitivity. Many technologies have been developed to sense such vibrations, yet none have achieved the requisite sensitivity in a package that is robust enough to meet practical applications. The BioTac®, developed by the authors, uses an incompressible liquid as an acoustic conductor to convey vibrations from the skin to a wide bandwidth pressure transducer located deep in the rigid core of the mechatronic finger, where it is protected from damage. Signal conditioning electronics were designed to achieve sensitivity down to the theoretical noise floor of the transducer, making the device very sensitive to the smallest of vibrations, even sound. We demonstrate here that this device exceeds human performance in detecting sustained vibrations (capable of sensing vibrations as small as a few nanometers at ~330 Hz) as well as very small transient events that arise when small particles are dropped on the finger. This overcomes the supposition that such sensitivity requires fragile sensory elements to reside near the vulnerable contact surfaces.
Article
Whereas before 2006 it appears that deep multilayer neural networks were not successfully trained, since then several algorithms have been shown to successfully train them, with experimental results showing the superiority of deeper vs less deep architectures. All these experimental results were obtained with new initialization or training mechanisms. Our objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future. We first observe the influence of the non-linear activation functions. We find that the logistic sigmoid activation is unsuited for deep networks with random initialization because of its mean value, which can drive especially the top hidden layer into saturation. Surprisingly, we find that saturated units can move out of saturation by themselves, albeit slowly, which explains the plateaus sometimes seen when training neural networks. We find that a new non-linearity that saturates less can often be beneficial. Finally, we study how activations and gradients vary across layers and during training, with the idea that training may be more difficult when the singular values of the Jacobian associated with each layer are far from 1. Based on these considerations, we propose a new initialization scheme that brings substantially faster convergence.
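The initialization scheme this abstract proposes (now commonly called "Xavier" or Glorot initialization) draws each weight uniformly from [-a, a] with a = sqrt(6 / (fan_in + fan_out)), chosen so that activation and gradient variances stay roughly constant across layers. A minimal sketch, using only the standard library:

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Glorot/Xavier uniform initialization for one weight matrix:
    bound a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_out)]
            for _ in range(fan_in)]

W = xavier_uniform(64, 32, random.Random(0))  # 64x32 weight matrix
```

The symmetric bound depending on both fan-in and fan-out is the key design choice: it balances the forward-pass variance argument (which alone would suggest 1/fan_in) against the backward-pass one (1/fan_out).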
Article
In stark contrast to the inspiring functionality of the natural hand, limitations of current upper limb prostheses stemming from marginal feedback control, challenges of mechanical design, and lack of sensory capacity, are well-established. This paper provides a critical review of current sensory systems and the potential of a selection of electroactive polymers for sensory applications in hand prostheses. Candidate electroactive polymers are reviewed in terms of their relevant advantages and disadvantages, together with their current implementation in related applications. Empirical analysis of one of the most novel electroactive polymers, ionic polymer metal composites (IPMC), was conducted to demonstrate its potential for prosthetic applications. With linear responses within the operating range typical of hand prostheses, bending angles and bending rates were accurately measured with 4.4 ± 2.5% and 4.8 ± 3.5% error, respectively, using the IPMC sensors. With these error rates comparable to traditional resistive bend sensors and a wide range of sensitivities and responses, electroactive polymers offer a promising alternative to more traditional sensory approaches. Their potential role in prosthetics is further heightened by their flexible and formable structure, and their ability to act as both sensors and actuators.
Conference Paper
We introduce a stochastic model for dialogue systems based on Markov decision processes. Within this framework we show that the problem of dialogue strategy design can be stated as an optimization problem, and solved by a variety of methods, including the reinforcement learning approach. The advantages of this new paradigm include objective evaluation of dialogue systems and their automatic design and adaptation. We show some preliminary results on learning a dialogue strategy for an air travel information system.
Article
Does progress in simulation translate to progress on robots? If one method outperforms another in simulation, how likely is that trend to hold in reality on a robot? We examine this question for embodied PointGoal navigation - developing engineering tools and a research paradigm for evaluating a simulator by its sim2real predictivity. First, we develop Habitat-PyRobot Bridge (HaPy), a library for seamless execution of identical code on simulated agents and robots - transferring simulation-trained agents to a LoCoBot platform with a one-line code change. Second, we investigate the sim2real predictivity of Habitat-Sim (Savva et al.) for PointGoal navigation. We 3D-scan a physical lab space to create a virtualized replica, and run parallel tests of 9 different models in reality and simulation. We present a new metric called Sim-vs-Real Correlation Coefficient (SRCC) to quantify predictivity. We find that SRCC for Habitat as used for the CVPR19 challenge is low (0.18 for the success metric), suggesting that performance differences in this simulator-based challenge do not persist after physical deployment. This gap is largely due to AI agents learning to exploit simulator imperfections - abusing collision dynamics to 'slide' along walls, leading to shortcuts through otherwise non-navigable space. Naturally, such exploits do not work in the real world. Our experiments show that it is possible to tune simulation parameters to improve sim2real predictivity (e.g. improving SRCC for the success metric from 0.18 to 0.844) - increasing confidence that in-simulation comparisons will translate to deployed systems in reality.
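The SRCC metric described above is, at its core, the Pearson correlation between per-model performance in simulation and on the physical robot. A from-scratch sketch (no external libraries assumed; input ordering of models must match across the two lists):

```python
import math

def srcc(sim_scores, real_scores):
    """Sim-vs-Real Correlation Coefficient: Pearson correlation between
    each model's simulated score and its real-robot score."""
    n = len(sim_scores)
    ms = sum(sim_scores) / n                      # mean of sim scores
    mr = sum(real_scores) / n                     # mean of real scores
    cov = sum((s - ms) * (r - mr)
              for s, r in zip(sim_scores, real_scores))
    vs = math.sqrt(sum((s - ms) ** 2 for s in sim_scores))
    vr = math.sqrt(sum((r - mr) ** 2 for r in real_scores))
    return cov / (vs * vr)
```

An SRCC near 1.0 means the simulator ranks methods the same way reality does; the paper's point is that a raw simulator scored only 0.18 on this measure until its parameters were tuned.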
Article
The purpose of this technical report is two-fold. First of all, it introduces a suite of challenging continuous control tasks (integrated with OpenAI Gym) based on currently existing robotics hardware. The tasks include pushing, sliding and pick & place with a Fetch robotic arm as well as in-hand object manipulation with a Shadow Dexterous Hand. All tasks have sparse binary rewards and follow a Multi-Goal Reinforcement Learning (RL) framework in which an agent is told what to do using an additional input. The second part of the paper presents a set of concrete research ideas for improving RL algorithms, most of which are related to Multi-Goal RL and Hindsight Experience Replay.
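The sparse binary rewards mentioned above follow the multi-goal convention in which reward depends only on the achieved and desired goals, not on the full state. A hedged sketch of that reward shape (the distance threshold is an illustrative value, not the one from the benchmark):

```python
def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
    """Sparse binary reward in the multi-goal style: 0 when the achieved
    goal lies within `threshold` (Euclidean) of the desired goal, else -1."""
    d = sum((a - g) ** 2
            for a, g in zip(achieved_goal, desired_goal)) ** 0.5
    return 0.0 if d < threshold else -1.0
```

Because the reward is a pure function of (achieved_goal, desired_goal), it can be recomputed for substituted goals after the fact, which is exactly the property Hindsight Experience Replay exploits.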
Article
Deep reinforcement learning (RL) has proven a powerful technique in many sequential decision making domains. However, robotics poses many challenges for RL, most notably training on a physical system can be expensive and dangerous, which has sparked significant interest in learning control policies using a physics simulator. While several recent works have shown promising results in transferring policies trained in simulation to the real world, they often do not fully utilize the advantage of working with a simulator. In this work, we exploit the full state observability in the simulator to train better policies which take as input only partial observations (RGBD images). We do this by employing an actor-critic training algorithm in which the critic is trained on full states while the actor (or policy) gets rendered images as input. We show experimentally on a range of simulated tasks that using these asymmetric inputs significantly improves performance. Finally, we combine this method with domain randomization and show real robot experiments for several tasks like picking, pushing, and moving a block. We achieve this simulation to real world transfer without training on any real world data.
Article
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
Conference Paper
Transfer learning in reinforcement learning is an area of research that seeks to speed up or improve learning of a complex target task, by leveraging knowledge from one or more source tasks. This thesis will extend the concept of transfer learning to curriculum learning, where the goal is to design a sequence of source tasks for an agent to train on, such that final performance or learning speed is improved. We discuss completed work on this topic, including methods for semi-automatically generating source tasks tailored to an agent and the characteristics of a target domain, and automatically sequencing such tasks into a curriculum. Finally, we also present ideas for future work.
Article
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and wall-time.
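The clipped surrogate objective PPO introduces can be written per-sample as L = min(r·A, clip(r, 1-ε, 1+ε)·A), where r is the probability ratio between new and old policies and A is the advantage estimate. A minimal sketch (returning the loss, i.e. the negated objective):

```python
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss for a single sample.
    ratio: pi_new(a|s) / pi_old(a|s); advantage: estimated advantage A."""
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)  # clip(r, 1-eps, 1+eps)
    # Take the pessimistic (smaller) objective, negate it for a loss.
    return -min(ratio * advantage, clipped * advantage)
```

The clip removes any incentive for the policy ratio to move beyond [1-ε, 1+ε] in the direction that increases the objective, which is what lets PPO safely take multiple minibatch epochs per batch of data.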
Article
OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
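The common interface Gym standardizes is essentially `reset()` plus `step(action)` returning `(observation, reward, done, info)`. The toy environment below mimics that calling convention without depending on the gym package itself (it is a sketch of the interface, not a real `gym.Env` subclass):

```python
class CountingEnv:
    """Trivial episodic environment with the classic Gym-style API:
    each step returns reward 1.0 and the episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t                      # initial observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}       # obs, reward, done, info

# Standard interaction loop against the interface.
env = CountingEnv(horizon=3)
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done, info = env.step(0)
    total += reward
```

Because every benchmark exposes this same loop, an agent written once can be evaluated across the whole collection, which is the design decision the whitepaper discusses.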
Conference Paper
Augmented Reality (AR) is an expanding field of Computer Graphics (CG) that merges items of the real-world environment (e.g., places, objects) with digital information (e.g., multimedia files, virtual objects) to provide users with an enhanced interactive multi-sensorial experience of the real world surrounding them. Currently, a wide range of devices is used to deliver AR systems. Common devices (e.g., cameras equipped on smartphones) enable users to receive multimedia information about target objects (non-immersive AR). Advanced devices (e.g., virtual windscreens) provide users with a set of virtual information about points of interest (POIs) or places (semi-immersive AR). Finally, an ever-increasing number of new devices (e.g., Head-Mounted Displays, HMDs) support users in interacting with mixed-reality environments (immersive AR). This paper presents a practical framework for the development of non-immersive augmented reality applications through which target objects are enriched with multimedia information. A different ArUco marker is applied to each target object. When a specific application hosted on a device recognizes, via camera, one of these markers, the related multimedia information is loaded and added to the target object. The paper also reports a complete case study together with some considerations on the framework and future work.
Conference Paper
Robust manipulation with a dexterous robot hand is a grand challenge of robotics. Impressive levels of dexterity can be achieved through teleoperation. However, teleoperation devices such as a glove or force reflecting master-slave system can be expensive and can tie the robot down to a restricted workspace. We observe that inexpensive and widely available multi-touch interfaces can achieve excellent performance for a large range of telemanipulation tasks, making dexterous robot telemanipulation broadly accessible. Our key insight is that dexterous grasping and manipulation interactions frequently focus on precise control of the fingertips in a plane. Following this observation, our novel multi-touch interface focuses on reliable replication of planar fingertip trajectories, making previously difficult actions such as grasping, dragging, reorienting, rolling, and smoothing as intuitive as miming the action on a multi-touch surface. We demonstrate and evaluate these and other interactions using an iPad interface to a Shadow Hand mounted on a Motoman SDA10 robot.
Article
Assistive robotics is an increasingly popular research field, which has led to a large number of commercial and noncommercial systems aimed at assisting physically impaired or elderly users in the activities of daily living. In this article, we propose five criteria based on robotic arm usage scenarios and surveys with which assistive robotic arms can be classified. Different possibilities and implementations to obtain each criterion are treated, and examples of current assistive robotic arms are given. Implementations and systems are discussed and rated qualitatively, which leads to the observation that variable stiffness actuation offers great benefits for assistive robotic systems despite an increase in the overall complexity.
The ingredients of real-world robotic reinforcement learning
  • Zhu
Solving Rubik’s Cube with a robot hand
  • OpenAI
  • I Akkaya
  • M Andrychowicz
Isaac Gym: high-performance GPU-based physics simulation for robot learning
  • Makoviychuk
Curriculum learning for reinforcement learning domains: A framework and survey
  • S Narvekar
  • B Peng
  • M Leonetti
  • J Sinapov
  • M E Taylor
  • P Stone
Tactile sensing and deep reinforcement learning for in-hand manipulation tasks
  • A Melnik
  • L Lach
  • M Plappert
  • T Korthals
  • R Haschke
  • H Ritter
Sim-to-real reinforcement learning for deformable object manipulation
  • J Matas
  • S James
  • A J Davison
Learning curriculum policies for reinforcement learning
  • S Narvekar
  • P Stone
Network randomization: A simple technique for generalization in deep reinforcement learning
  • Lee
Reverse curriculum generation for reinforcement learning
  • C Florensa
  • D Held
  • M Wulfmeier
  • M Zhang