Baruch Tabanpour’s research while affiliated with Google Inc. and other places


Publications (7)


MuJoCo Playground
  • Preprint
  • File available

February 2025 · 76 Reads

Kevin Zakka · Baruch Tabanpour · Qiayuan Liao · [...]

We introduce MuJoCo Playground, a fully open-source framework for robot learning built with MJX, with the express goal of streamlining simulation, training, and sim-to-real transfer onto robots. With a simple "pip install playground", researchers can train policies in minutes on a single GPU. Playground supports diverse robotic platforms, including quadrupeds, humanoids, dexterous hands, and robotic arms, enabling zero-shot sim-to-real transfer from both state and pixel inputs. This is achieved through an integrated stack comprising a physics engine, batch renderer, and training environments. Along with video results, the entire framework is freely available at playground.mujoco.org
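
The abstract's "pip install" to training workflow can be made concrete with a short sketch. The registry API and the 'CartpoleBalance' environment name below are assumptions based on the project documentation at playground.mujoco.org rather than text from this preprint, and may differ between releases:

# Sketch only: assumes `pip install playground` and a JAX-capable accelerator.
import jax
import jax.numpy as jnp
from mujoco_playground import registry

env = registry.load('CartpoleBalance')    # build an MJX training environment
jit_reset = jax.jit(env.reset)            # JIT-compile reset/step for speed
jit_step = jax.jit(env.step)

state = jit_reset(jax.random.PRNGKey(0))  # initial simulation state on the accelerator
for _ in range(100):
    action = jnp.zeros(env.action_size)   # placeholder; a trained policy would act here
    state = jit_step(state, action)
print(state.reward)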



Fig. 3: Example of a generated sequence obtained with a model that includes displacement features and displacement loss (bottom row), and without either of these elements (middle row). We highlight inconsistencies of the joint positions with the red circles.
Fig. 4: Experiments with the dynamics network: training subsequence length N_h (a), ablations of the joint displacement feature and loss (b), ablations of non-linearity type and learning rate schedule (c), and evaluation of variants for the contact network (d). Error bars show one standard deviation calculated over 5 runs. The x-axis sweeps the time window over which metrics are computed.
Fig. 5: Evaluation of LARP on datasets with colliding objects.
Fig. 6: Left: Reconstructed 3D poses on four consecutive video frames of the AIST-hard dataset. The middle row shows results obtained with the kinematic pipeline from [13, 18] that LARP uses for initialization; the bottom row shows results obtained with LARP integrated into [13]. Middle: Motion sequence with person-ball collision simulated with LARP (bottom) and comparison to the Bullet engine [8] (top). Right: Examples of generated human motion sequences of a person kicking a ball for three different ball targets. In each image we show the position of the ball right after the kick and at the end of the sequence. Note that the person's pose differs considerably depending on the ball target.
Fig. 9: Example of estimated pose from "S9-WalkDog" seq. after 11 sec. of input. Left: input frame, middle: result obtained with SuperTrack trained on longer sequences, right: result obtained with LARP.
Learned Neural Physics Simulation for Articulated 3D Human Pose Reconstruction

October 2024 · 12 Reads

We propose a novel neural network approach, LARP (Learned Articulated Rigid body Physics), to model the dynamics of articulated human motion with contact. Our goal is to develop a faster and more convenient methodological alternative to traditional physics simulators for use in computer vision tasks such as human motion reconstruction from video. To that end we introduce a training procedure and model components that support the construction of a recurrent neural architecture to accurately simulate articulated rigid body dynamics. Our neural architecture supports features typically found in traditional physics simulators, such as modeling of joint motors, variable dimensions of body parts, contact between body parts and objects, and is an order of magnitude faster than traditional systems when multiple simulations are run in parallel. To demonstrate the value of LARP we use it as a drop-in replacement for a state of the art classical non-differentiable simulator in an existing video-based reconstruction framework and show comparative or better 3D human pose reconstruction accuracy.
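
The abstract describes the architecture only at a high level; a hedged sketch of the core idea, a learned one-step dynamics model with a contact module unrolled recurrently over a control sequence, might look as follows. All names here (dynamics_net, contact_net, the state layout) are hypothetical placeholders, not the paper's actual interfaces:

# Hypothetical sketch of a learned recurrent rigid-body simulator in the spirit of LARP.
import jax
import jax.numpy as jnp

def rollout(params, dynamics_net, contact_net, init_state, motor_targets, body_dims):
    """Unroll a learned simulator for T steps.

    init_state:    per-joint positions and velocities at t = 0
    motor_targets: (T, n_joints) joint-motor set points (the control signal)
    body_dims:     per-body-part shape parameters used as conditioning input
    """
    def step(state, target):
        contact_feats = contact_net(params['contact'], state, body_dims)
        # One-step prediction of joint displacements / velocity updates.
        delta = dynamics_net(params['dynamics'], state, target, body_dims, contact_feats)
        next_state = state + delta
        return next_state, next_state

    _, trajectory = jax.lax.scan(step, init_state, motor_targets)
    return trajectory  # (T, state_dim) simulated motion; parallel rollouts via jax.vmap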




Barkour: Benchmarking Animal-level Agility with Quadruped Robots

May 2023 · 426 Reads · 2 Citations

Animals have evolved various agile locomotion strategies, such as sprinting, leaping, and jumping. There is a growing interest in developing legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agility. We introduce the Barkour benchmark, an obstacle course to quantify agility for legged robots. Inspired by dog agility competitions, it consists of diverse obstacles and a time-based scoring mechanism. This encourages researchers to develop controllers that not only move fast, but do so in a controllable and versatile way. To set strong baselines, we present two methods for tackling the benchmark. In the first approach, we train specialist locomotion skills using on-policy reinforcement learning methods and combine them with a high-level navigation controller. In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment and robot states. Using a custom-built quadruped robot, we demonstrate that our method can complete the course at half the speed of a dog. We hope that our work represents a step towards creating controllers that enable robots to reach animal-level agility.
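
The second baseline amounts to policy distillation: roll out the specialist skill policies, record their observations and actions, and regress the generalist Locomotion-Transformer onto them. A minimal sketch of that objective, with the Transformer abstracted behind a generic apply function and all names hypothetical rather than taken from the paper, could be:

# Hypothetical sketch of distilling specialist locomotion policies into one generalist policy.
import jax
import jax.numpy as jnp

def distillation_loss(params, generalist_apply, obs_batch, teacher_actions):
    """Mean-squared error between generalist actions and specialist (teacher) actions."""
    pred_actions = generalist_apply(params, obs_batch)       # (batch, action_dim)
    return jnp.mean((pred_actions - teacher_actions) ** 2)

# Gradients are taken w.r.t. params only; the optimizer update is omitted here.
loss_and_grad = jax.value_and_grad(distillation_loss)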


Figure 1: Overview of various POIR components and their dependencies. The BC policy and world model are trained from the expert data, whereas the imitation reward can depend either on the expert data, the world model's ensemble discrepancy, or the BC policy's ensemble discrepancy.
Figure 2: For noise of σ_noise = 0.4 shown above, the arm can start in positions much further from the center of the table compared to the default noise of σ_noise = 0.2. Examples for all environments are shown in Appendix E.
Figure 7: Randomly sampled initial states for Robosuite environments. For all figures, the top row contains initial states with the default initialization noise of 0.02, while the bottom row contains initial states with initialization noise of 0.4. Notice in Figure 7a that with higher initialization noise, the robotic arm is less likely to start in positions directly above the block.
Get Back Here: Robust Imitation by Return-to-Distribution Planning

May 2023 · 86 Reads

We consider the Imitation Learning (IL) setup where expert data are not collected in the actual deployment environment but in a different version of it. To address the resulting distribution shift, we combine behavior cloning (BC) with a planner tasked with bringing the agent back to states visited by the expert whenever the agent deviates from the demonstration distribution. The resulting algorithm, POIR, can be trained offline, and leverages online interactions to efficiently fine-tune its planner to improve performance over time. We test POIR on a variety of human-generated manipulation demonstrations in a realistic robotic manipulation simulator and show robustness of the learned policy to different initial state distributions and noisy dynamics.
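
Reading the abstract together with Figure 1, the control loop reduces to a simple switch: imitate with BC while the state looks in-distribution, otherwise invoke the planner to steer back toward expert-visited states. The sketch below is illustrative only; the distance measure, threshold, and planner interface are assumptions rather than the paper's implementation (the paper can also score states via ensemble disagreement):

# Hypothetical sketch of a return-to-distribution control rule in the spirit of POIR.
import numpy as np

def poir_act(state, bc_policy, planner, expert_states, threshold=1.0):
    """Act with BC when near the expert distribution, otherwise plan a return to it."""
    # Distance to the nearest expert-visited state as a crude out-of-distribution signal.
    dists = np.linalg.norm(expert_states - state, axis=1)
    nearest = int(np.argmin(dists))
    if dists[nearest] <= threshold:
        return bc_policy(state)                          # in-distribution: imitate
    # Out of distribution: plan back toward the closest expert state.
    return planner(state, goal=expert_states[nearest])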

Citations (2)


... A large body of recent work has focused on using LLMs to generate robot behaviors [16], [17], [6], [18], [19], [8]. Some works focus on using LLMs to generate long-horizon sequences of actions for robots [17], [16], and more recent work focuses on improving existing LLMs for robot-specific tasks [20]. Most relevant to our work is literature that explores generating reward functions directly from language instructions or corrections [5], [21], [6]. ...

Reference:

Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
  • Citing Conference Paper
  • July 2024

... Learning-based methods are fully automated and the controllers can be optimized in an end-to-end fashion from robot sensor readings to motor control signals. For example, simulation-based deep reinforcement learning (RL) has been applied in learning legged locomotion over various terrains [1][2][3][4][5][6][7][8][9][10][11] . These approaches generally adopt deep RL algorithms to train locomotion tasks in simulation and then apply the trained controllers to legged robots in reality. ...

Barkour: Benchmarking Animal-level Agility with Quadruped Robots