Figure 3 - High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

(a) Mean juggling duration of the final policies learned with varying batch sizes N. The maximal juggling duration is 10s. (b) Comparison of the learned and hand-tuned policies on the real system, over 30 episodes each with a maximum duration of 120s. The learned policy achieves an average juggling duration of 106.82s, while the hand-tuned policy achieves 66.52s.