Figure 1
The juggling movement consists of four separate movements, which are repeated to juggle two balls with a single anthropomorphic manipulator: (a) throw ball 1, (b) catch ball 2, (c) throw ball 2, (d) catch ball 1, (e) repeat.
Source publication
Robots that can learn in the physical world will be important for escaping stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging, as one must push the limits of the robot and its actuation without harming the system, amplifying the necessi...
Context in source publication
Context 1
... The desired juggling movement for two balls consists of four repeated movements, i.e., (a) throwing the first ball, (b) catching the second ball, (c) throwing the second ball, and (d) catching the first ball (Fig. 1). We define the switching points between these movements as the via-points of the policy, and to achieve a limit cycle we keep repeating these via-points. The cyclic pattern is prepended with an initial stroke movement that quickly enters the limit cycle without dropping a ball. Applying the limit cycle's PD references from the start ...
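To make the structure of this policy concrete, here is a minimal sketch of the via-point scheme described above: a stroke-based entry followed by an endlessly repeated cycle of PD set-points. The joint values, durations, and function names are hypothetical placeholders, not the paper's actual parameters.

```python
import itertools

# Hypothetical via-point format: (duration_s, joint_positions) pairs serving
# as PD set-point targets; the four cyclic segments correspond to the
# movements in Fig. 1.
stroke = [(0.50, [0.0, 0.4, 0.0, 1.6]), (0.30, [0.0, 0.6, 0.0, 1.2])]
cycle = [
    (0.25, [0.0, 0.8, 0.0, 1.0]),   # (a) throw ball 1
    (0.25, [0.0, 0.5, 0.0, 1.4]),   # (b) catch ball 2
    (0.25, [0.0, 0.8, 0.0, 1.0]),   # (c) throw ball 2
    (0.25, [0.0, 0.5, 0.0, 1.4]),   # (d) catch ball 1
]

def via_point_schedule(stroke, cycle):
    """Yield PD reference via-points: the stroke once, then the limit cycle."""
    yield from stroke                  # enter the limit cycle without dropping a ball
    yield from itertools.cycle(cycle)  # repeat the four segments indefinitely

# Usage: feed each via-point to a PD tracking controller.
refs = via_point_schedule(stroke, cycle)
for _ in range(6):
    duration, q_des = next(refs)
    print(f"track {q_des} for {duration:.2f}s")
```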
Similar publications
In this paper, a Q-learning-based task scheduling approach for mixed-critical applications on heterogeneous multi-cores (QSMix) is proposed to optimize their main design challenges. This approach employs reinforcement learning to optimize the execution time, power consumption, reliability, and temperature of the heterogeneous multi-cores dur...
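As an illustration of the kind of update rule such a scheduler might use, here is a minimal tabular Q-learning sketch. The state/action encoding (task index, candidate core) and the reward weights are assumptions for illustration, not QSMix's actual design.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = defaultdict(float)             # Q[(task, core)] -> estimated value

def reward(exec_time, power, reliability, temperature):
    # Hypothetical weighted blend of the four objectives named in the abstract.
    return -1.0 * exec_time - 0.5 * power + 2.0 * reliability - 0.3 * temperature

def choose_core(task, cores):
    if random.random() < EPS:                      # epsilon-greedy exploration
        return random.choice(cores)
    return max(cores, key=lambda c: Q[(task, c)])  # greedy exploitation

def update(task, core, r, next_task, cores):
    # Standard one-step Q-learning backup toward the best next-state value.
    best_next = max(Q[(next_task, c)] for c in cores)
    Q[(task, core)] += ALPHA * (r + GAMMA * best_next - Q[(task, core)])
```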
For effective interactions with the open world, robots should understand how interactions with known and novel objects help them towards their goal. A key aspect of this understanding lies in detecting an object's affordances, which represent the potential effects that can be achieved by manipulating the object in various ways. Our approach leverag...
A rich representation is key to general robotic manipulation, but existing model architectures require a lot of data to learn it. Unfortunately, ideal robotic manipulation training data, which comes in the form of expert visuomotor demonstrations for a variety of annotated tasks, is scarce. In this work we propose PLEX, a transformer-based architec...
We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both acti...
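A minimal sketch of the object-centric representation this abstract describes, i.e., expressing an object's SE(3) pose trajectory in the target's frame. The function names are illustrative, not SPOT's actual API.

```python
import numpy as np

def inv_se3(T):
    """Invert a 4x4 homogeneous transform analytically."""
    R, p = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ p
    return Ti

def relative_pose_trajectory(obj_poses, target_pose):
    """Express each object pose in the target frame: T_rel = T_target^{-1} T_obj."""
    T_inv = inv_se3(target_pose)
    return [T_inv @ T for T in obj_poses]
```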
Long-horizon contact-rich tasks are challenging to learn with reinforcement learning, due to ineffective exploration of high-dimensional state spaces with sparse rewards. The learning process often gets stuck in local optima and demands task-specific reward fine-tuning for complex scenarios. In this work, we propose a structured framework that lev...
Citations
... As a result, independence is violated in task-space due to the nonlinear Forward Kinematics map of the robot. Note that this result applies to any motor primitive approach that relies on joint-space learning for task-space control (Ploeger et al. 2021; Ploeger and Peters 2022). ...
... In detail, embedding prior knowledge of physics laws or structural properties of dynamical systems into the design of a robot controller has proven to be a powerful technique for improving their computational efficiency and generalization capacity. These structures, commonly referred to as "inductive bias" (Helmbold and Long 2015), have been employed for faster learning speed, higher accuracy and better generalization (Schmidt and Lipson 2009; Nguyen-Tuong and Peters 2010; Greydanus et al. 2019; Ploeger et al. 2021; Nah et al. 2020, 2021, 2023). ...
... Research has demonstrated that the choice of action space can greatly impact the efficiency of motor learning and the quality of the resulting behavior (Peng and Van De Panne 2017; Martín-Martín et al. 2019). Furthermore, it has been shown that a judicious selection of pre-defined structures significantly accelerates motor learning, improving both speed and efficacy (Ploeger et al. 2021). The right inductive bias not only simplifies the learning process but also enables the robot to generalize across tasks more effectively. ...
Despite a slow neuromuscular system, humans easily outperform modern robot technology, especially in physical contact tasks. How is this possible? Biological evidence indicates that motor control of biological systems is achieved by a modular organization of motor primitives, which are fundamental building blocks of motor behavior. Inspired by neuro-motor control research, the idea of using simpler building blocks has been successfully used in robotics. Nevertheless, a comprehensive formulation of modularity for robot control remains to be established. In this paper, we introduce a modular framework for robot control using motor primitives. We present two essential requirements to achieve modular robot control: independence of modules and closure of stability. We describe key control modules and demonstrate that a wide range of complex robotic behaviors can be generated from this small set of modules and their combinations. The presented modular control framework demonstrates several beneficial properties for robot control, including task-space control without solving Inverse Kinematics, addressing the problems of kinematic singularity and kinematic redundancy, and preserving passivity for contact and physical interactions. Further advantages include exploiting kinematic singularity to maintain high external load with low torque compensation, as well as controlling the robot beyond its end-effector, extending even to external objects. Both simulation and actual robot experiments are presented to validate the effectiveness of our modular framework. We conclude that modularity may be an effective constructive framework for achieving robotic behaviors comparable to human-level performance.
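The following sketch illustrates the general idea of superimposing torque-level control modules without solving Inverse Kinematics, as described in the abstract above. The specific modules, gains, and Jacobian are generic made-up examples, not the paper's exact formulation.

```python
import numpy as np

def impedance_module(x, x_des, xdot, J, K, B):
    """Task-space impedance: a spring-damper wrench mapped to joint torques
    via J^T, which needs no Inverse Kinematics."""
    f = K @ (x_des - x) - B @ xdot
    return J.T @ f

def damping_module(qdot, D):
    """Joint-space damping module for additional stability margin."""
    return -D @ qdot

def modular_controller(module_torques):
    """Superimpose module torques; if each module is passive, the sum remains passive."""
    return sum(module_torques)

# Example: 2-D task, 3-joint arm with a made-up Jacobian.
J = np.array([[1.0, 0.5, 0.2], [0.0, 1.0, 0.5]])
tau = modular_controller([
    impedance_module(np.zeros(2), np.array([0.1, 0.2]), np.zeros(2), J,
                     K=50 * np.eye(2), B=5 * np.eye(2)),
    damping_module(np.array([0.1, -0.2, 0.0]), D=0.5 * np.eye(3)),
])
print(tau)
```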
... Dynamic tasks have always been challenging benchmarks for robotics and machine learning. For example, researchers have focused on dynamic dexterity games such as ball-in-a-cup [Kawato et al., 1994, Kober and Peters], juggling [Ploeger et al., 2021, Ploeger and Peters, 2022] or diabolo [von Drigalski et al., 2021], but also on sports such as tennis [Zaidi et al., 2023], soccer [Haarnoja et al., 2024] and table tennis [Mülling et al., 2011, Büchler et al., 2022]. Robotics tasks have also been used as benchmarks for machine learning competitions, such as the Robot open-Ended Autonomous Learning competition (REAL) [Cartoni et al., 2020], Learn to Move [Song et al., 2021], the Real Robot Challenge [Gürtler et al., 2023, Funk et al., 2021], the TOTO Benchmark [Zhou et al., 2023], the MyoSuite Challenge [Caggiano et al., 2023] or the Home Robot Challenge [Yenamandra et al., 2023]. ...
Machine learning methods have a groundbreaking impact in many application domains, but their application on real robotic platforms is still limited. Despite the many challenges associated with combining machine learning technology with robotics, robot learning remains one of the most promising directions for enhancing the capabilities of robots. When deploying learning-based approaches on real robots, extra effort is required to address the challenges posed by various real-world factors. To investigate the key factors influencing real-world deployment and to encourage original solutions from different researchers, we organized the Robot Air Hockey Challenge at the NeurIPS 2023 conference. We selected the air hockey task as a benchmark, encompassing low-level robotics problems and high-level tactics. Unlike in other machine learning-centric benchmarks, participants need to tackle practical challenges in robotics, such as the sim-to-real gap, low-level control issues, safety problems, real-time requirements, and the limited availability of real-world data. Furthermore, we focus on a dynamic environment, removing the typical assumption of quasi-static motions made by other real-world benchmarks. The competition's results show that solutions combining learning-based approaches with prior knowledge outperform those relying solely on data when real-world deployment is challenging. Our ablation study reveals which real-world factors may be overlooked when building a learning-based solution. The successful real-world air hockey deployment of best-performing agents sets the foundation for future competitions and follow-up research directions.
... Data-driven approaches optimized open-loop movement primitives for two-ball one-handed juggling through trial and error. In [6], the movement primitive was updated in a model-based fashion, accounting for ballistics, and in one of our previous works [7], the movement primitive was updated in a black-box fashion, achieving close to two hours of sustained juggling. ...
... Including feedback on the ball's states has proven challenging. For instance, the camera systems in [5] and [7] were used solely to detect ball drops. In [8], combining a hand-tuned throwing movement with a learned catching movement conditioned on the ball state resulted in 3-ball human-robot partner juggling. ...
Being widespread in human motor behavior, dynamic movements demonstrate higher efficiency and greater capacity to address a broader range of skill domains compared to their quasi-static counterparts. Among the frequently studied dynamic manipulation problems, robotic juggling tasks stand out due to their inherent ability to scale their difficulty levels to arbitrary extents, making them an excellent subject for investigation. In this study, we explore juggling patterns with mixed throw heights, following the vanilla siteswap juggling notation, which is widely adopted by jugglers to describe toss juggling patterns. This requires extending our previous analysis of the simpler cascade juggling task with a throw-height sequence planner and further constraints on the end-effector trajectory. These are not necessary for cascade patterns but are vital to achieving patterns with mixed throw heights. Using a simulated environment, we demonstrate successful juggling of the most common 3-9 ball siteswap patterns up to 9-ball height, transitions between these patterns, and random sequences covering all possible vanilla siteswap patterns with throws between 2 and 9 ball height. https://kai-ploeger.com/beyond-cascades
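For concreteness, a vanilla siteswap sequence can be checked for validity with the classical tests: the average throw height must be an integer (the ball count), and no two throws may land on the same beat. This short sketch is an illustration of the notation, not code from the paper.

```python
def is_valid_siteswap(throws):
    """Check a vanilla siteswap: integer average (= ball count) and
    collision-free landings."""
    n = len(throws)
    if sum(throws) % n != 0:
        return False
    landings = {(i + t) % n for i, t in enumerate(throws)}
    return len(landings) == n

assert is_valid_siteswap([3, 3, 3])      # 3-ball cascade
assert is_valid_siteswap([5, 3, 1])      # 3-ball pattern with mixed heights
assert not is_valid_siteswap([5, 4, 3])  # all three balls land on the same beat
```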
... Using MPs as policies also provides additional benefits, such as smooth trajectory generation and more consistent exploration. MP-based ERL approaches have demonstrated the ability to master complex manipulation tasks such as robot baseball (Peters & Schaal, 2008) and juggling (Ploeger et al., 2021). To further improve sample efficiency, Abdolmaleki et al. (2015) introduced a model-based method to enable more sample-efficient black-box search. ...
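A minimal sketch of the black-box improvement of MP parameters that the excerpt above refers to: perturb the parameter vector, run one episode, and keep the perturbation if the episodic reward improves. This simple hill-climbing scheme is a generic stand-in, not the cited papers' specific algorithm.

```python
import numpy as np

def black_box_search(rollout_reward, theta0, sigma=0.05, episodes=100, seed=0):
    """Episodic black-box search over MP parameters theta.
    `rollout_reward(theta)` is a placeholder for one real-robot episode."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    best = rollout_reward(theta)
    for _ in range(episodes):
        cand = theta + sigma * rng.standard_normal(theta.shape)  # Gaussian exploration
        r = rollout_reward(cand)
        if r > best:                                             # greedy acceptance
            theta, best = cand, r
    return theta, best
```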
This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the ERL framework. In ERL, policies predict entire action trajectories over multiple time steps instead of single actions at every time step. These trajectories are typically parameterized by trajectory generators such as Movement Primitives (MP), allowing for smooth and efficient exploration over long horizons while capturing high-level temporal correlations. However, ERL methods are often constrained to on-policy frameworks due to the difficulty of evaluating state-action values for entire action sequences, limiting their sample efficiency and preventing the use of more efficient off-policy architectures. TOP-ERL addresses this shortcoming by segmenting long action sequences and estimating the state-action values for each segment using a transformer-based critic architecture alongside an n-step return estimation. These contributions result in efficient and stable training, as reflected in empirical results on sophisticated robot learning environments, where TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on model performance.
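To illustrate the n-step targets mentioned in the abstract, here is a simple sketch of segment-wise n-step return computation with critic bootstrapping. The fixed-length segmentation and the value interface are simplifying assumptions, not TOP-ERL's exact architecture.

```python
import numpy as np

def n_step_segment_returns(rewards, values, gamma=0.99, n=8):
    """For each segment start t, accumulate n discounted rewards and
    bootstrap with a critic value estimate at t+n."""
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(0, T, n):
        end = min(t + n, T)
        g = values[end] if end < T else 0.0  # bootstrap unless the episode ends
        for k in reversed(range(t, end)):
            g = rewards[k] + gamma * g       # discounted backup within the segment
        targets[t] = g
    return targets
```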
... Nowadays, robots are capable of complex dynamic tasks such as table tennis [1,2], juggling [3,4] or diabolo [5], and play sports such as tennis [6] or soccer [7]. Current planning and learning methods are sufficient for most of these tasks, as the robot's movement is relatively free in the workspace and the robots are not required to comply with stringent task, hardware, and safety constraints. ...
... The main limitation that MPs in general introduce is the reduction of the space of learnable policies. Fortunately, this does not seem very restrictive, as the literature shows a broad range of applications in which MPs provided sufficient flexibility [61,62,63,3,64,65,50,66]. In fact, the flexibility of every MP can be controlled by the number of basis functions used. ...
Trajectory planning under kinodynamic constraints is fundamental for advanced robotics applications that require dexterous, reactive, and rapid skills in complex environments. These constraints, which may represent task, safety, or actuator limitations, are essential for ensuring the proper functioning of robotic platforms and preventing unexpected behaviors. Recent advances in kinodynamic planning demonstrate that learning-to-plan techniques can generate complex and reactive motions under intricate constraints. However, these techniques necessitate the analytical modeling of both the robot and the entire task, a limiting assumption when systems are extremely complex or when constructing accurate task models is prohibitive. This paper addresses this limitation by combining learning-to-plan methods with reinforcement learning, resulting in a novel integration of black-box learning of motion primitives and optimization. We evaluate our approach against state-of-the-art safe reinforcement learning methods, showing that our technique, particularly when exploiting task structure, outperforms baseline methods in challenging scenarios such as planning to hit in robot air hockey. This work demonstrates the potential of our integrated approach to enhance the performance and safety of robots operating under complex kinodynamic constraints.
... Environmental changes, like humidity or temperature, alter the already complex contact dynamics, which adds to the complexity of manipulation tasks. Dynamic tasks like juggling [2] and table tennis [3] involve making and breaking contact, demanding high precision and tolerating less inaccuracy because contacts are rarer. The high speeds in these tasks necessitate greater accelerations and introduce a precision-speed tradeoff. ...
It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast yet precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning-based approaches have shown promising results in single-task performance, these methods struggle in a multi-song setting. Our work aims to close this gap and thereby enable imitation learning approaches for robot piano playing at scale. To this end, we introduce the Robot Piano 1 Million (RP1M) dataset, containing bi-manual robot piano playing motion data of more than one million trajectories. We formulate finger placements as an optimal transport problem, thus enabling automatic annotation of vast amounts of unlabeled songs. Benchmarking existing imitation learning approaches shows that such approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
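As a toy illustration of casting finger placement as an assignment problem (a discrete special case of optimal transport), one can match fingers to active keys by minimizing total travel cost with the Hungarian algorithm. The positions and dimensions below are hypothetical, and this is a sketch of the general idea, not RP1M's annotation pipeline.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_fingers(finger_xy, key_xy):
    """Match each finger to one pressed key, minimizing summed distances."""
    cost = np.linalg.norm(finger_xy[:, None, :] - key_xy[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return list(zip(rows, cols))

# Hypothetical 2-D positions for three fingers and three pressed keys.
fingers = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
keys = np.array([[0.12, 0.3], [0.02, 0.3], [0.22, 0.3]])
print(assign_fingers(fingers, keys))  # e.g. [(0, 1), (1, 0), (2, 2)]
```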
... Highly dynamic tasks not requiring reasoning through contacts have historically been used as a testbed for hardware and algorithms in robotics. These tasks include different types of games and sports, such as ball-in-a-cup [3], [4], juggling [5], [6], and diabolo [7]. Dynamic tasks involving contacts, such as soccer [8], tennis [9], table tennis [10], [11], and air hockey [12], [13], are typically approached with reinforcement learning methods to off-load the computationally expensive reasoning through contacts to an offline exploration phase. ...
Planning robot contact often requires reasoning over a horizon to anticipate outcomes, making such planning problems computationally expensive. In this letter, we propose a learning framework for efficient contact planning in real time subject to uncertain contact dynamics. We implement our approach for the example task of robot air hockey. Based on a learned stochastic model of puck dynamics, we formulate contact planning for shooting actions as a stochastic optimal control problem with a chance constraint on hitting the goal. To achieve online re-planning capabilities, we propose to train an energy-based model to generate optimal shooting plans in real time. The performance of the trained policy is validated both in simulation and on a real-robot setup. Furthermore, our approach was tested in a competitive setting as part of the NeurIPS 2023 Robot Air Hockey Challenge.
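A sample-based check of the chance constraint described above might look as follows; `sample_outcome` is a hypothetical stand-in for one rollout of the learned stochastic puck model, and the Monte-Carlo formulation is an assumption for illustration.

```python
import numpy as np

def satisfies_chance_constraint(sample_outcome, plan, delta=0.1,
                                n_samples=500, seed=0):
    """Monte-Carlo check of the chance constraint P(goal hit) >= 1 - delta.
    `sample_outcome(plan, rng)` draws one rollout from a stochastic puck
    model and returns True if the plan scores a goal."""
    rng = np.random.default_rng(seed)
    hits = sum(sample_outcome(plan, rng) for _ in range(n_samples))
    return hits / n_samples >= 1.0 - delta
```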
... In robotics, RL has piqued the interest of researchers by learning intricate and impressive motor skills such as juggling (Ploeger et al., 2020), ball-in-cup (Schwab et al., 2019), in-hand manipulation (Andrychowicz et al., 2020), pick-and-place, or locomotion over highly variable terrain (Lee et al., 2020a). Although executing episodes on real robots is much more time-consuming than simulating Go games, it is still feasible and plays an important role in the acquisition of complex robot motor skills. ...
The requirement for a high number of training episodes has been a major limiting factor for the application of Reinforcement Learning (RL) in robotics. Learning skills directly on real robots requires time, causes wear and tear and can lead to damage to the robot and environment due to unsafe exploratory actions. The success of learning skills in simulation and transferring them to real robots has also been limited by the gap between reality and simulation. This is particularly problematic for tasks involving contact with the environment as contact dynamics are hard to model and simulate. In this paper we propose a framework which leverages a shared control framework for modeling known constraints defined by object interactions and task geometry to reduce the state and action spaces and hence the overall dimensionality of the reinforcement learning problem. The unknown task knowledge and actions are learned by a reinforcement learning agent by conducting exploration in the constrained environment. Using a pouring task and grid-clamp placement task (similar to peg-in-hole) as use cases and a 7-DoF arm, we show that our approach can be used to learn directly on the real robot. The pouring task is learned in only 65 episodes (16 min) and the grid-clamp placement task is learned in 75 episodes (17 min) with strong safety guarantees and simple reward functions, greatly alleviating the need for simulation.
... The field of LfD explores how human demonstrations can be used to teach robots new skills [14,44,8,41,13,27]. LfD enables the agent to learn from a small set of examples, i.e., demonstrations provided by a teacher, rather than from lengthy exploration, i.e., experience collected in an environment [6]. ...
... In this work, we studied the effect of domain experience and participant demographics on the quality of LfD demonstrations. Previous works have studied short-horizon skills [28,33,22,37] and teaching robots tasks using experts [21,39,3,14,44,8,41,13,27]. In our work, we allow users to refine their teaching strategies through demonstrator training. ...