Figure 3 - High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Figure 3: (a) Mean juggling duration of the final policies learned with varying batch sizes N. The maximum juggling duration is 10 s. (b) Comparison of the learned and hand-tuned policies on the real system over 30 episodes each, with a maximum duration of 120 s. The learned policy achieves an average juggling duration of 106.82 s, while the hand-tuned policy achieves 66.52 s.