Fig 1 - uploaded by Manuel Mazzara
Double-pole cart

Source publication
Preprint
Full-text available
Catastrophic forgetting has a significant negative impact on reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole balancing task and compared different pseudorehearsal approaches. We have found tha...

Contexts in source publication

Context 1
... apply PR algorithms to an actor-critic agent executing the single-pole cart balancing problem, a well-known reinforcement learning task mentioned by Sutton [3] and extended further (Fig. 1). The task is to balance a pole installed on a cart for as long as possible by pushing the cart left or right. If the pole falls or the cart leaves the track, the episode fails and the agent receives the reward R = -1. The output of this experiment is the number of steps the agent balanced the pole in an episode; the bigger this number ...
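The task described above can be sketched as a minimal environment. This is an illustrative implementation using the classic cart-pole dynamics and termination bounds (|x| > 2.4 m, |theta| > 12 degrees), not the paper's exact setup; only the reward scheme (-1 on failure, 0 otherwise) is taken from the text.

```python
import math, random

class CartPole:
    """Minimal single-pole cart balancing environment. Constants follow
    the classic cart-pole benchmark; the reward matches the text:
    -1 on failure, 0 while the pole stays balanced."""
    GRAVITY, M_CART, M_POLE, HALF_LEN, FORCE, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

    def reset(self):
        self.x = self.x_dot = self.theta = self.theta_dot = 0.0
        return (self.x, self.x_dot, self.theta, self.theta_dot)

    def step(self, action):  # action: 0 = push left, 1 = push right
        force = self.FORCE if action == 1 else -self.FORCE
        total_m = self.M_CART + self.M_POLE
        cos_t, sin_t = math.cos(self.theta), math.sin(self.theta)
        tmp = (force + self.M_POLE * self.HALF_LEN * self.theta_dot**2 * sin_t) / total_m
        theta_acc = (self.GRAVITY * sin_t - cos_t * tmp) / (
            self.HALF_LEN * (4.0 / 3.0 - self.M_POLE * cos_t**2 / total_m))
        x_acc = tmp - self.M_POLE * self.HALF_LEN * theta_acc * cos_t / total_m
        # Euler integration of the cart and pole state
        self.x += self.DT * self.x_dot
        self.x_dot += self.DT * x_acc
        self.theta += self.DT * self.theta_dot
        self.theta_dot += self.DT * theta_acc
        failed = abs(self.x) > 2.4 or abs(self.theta) > 12 * math.pi / 180
        state = (self.x, self.x_dot, self.theta, self.theta_dot)
        return state, (-1.0 if failed else 0.0), failed

# Usage: the episode length (number of steps survived) is the measured output
env = CartPole()
state, steps, done = env.reset(), 0, False
while not done and steps < 500:
    state, reward, done = env.step(random.randrange(2))  # random policy
    steps += 1
```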
Context 2
... plotted the tendencies graph and a graph of smoothed minimums defined by the following rule: the i-th element of the minimums vector is the minimum of elements i through i+100 of the original sample. Both graphs grow coherently, and are expected to converge to some optimal policy with high performance (Fig. 10). The batch PR approach worked far worse than FR PR, and worse than in the case of an environment with highly risky actions. ...
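The smoothing rule above amounts to a sliding-window minimum. A small sketch, assuming a window of 100 elements as stated in the text (the exact clipping behaviour at the end of the series is an assumption):

```python
def smoothed_minimums(samples, window=100):
    """Sliding-window minimum: element i is the minimum of
    samples[i : i + window + 1], clipped at the end of the series.
    The window size of 100 follows the text."""
    return [min(samples[i:i + window + 1]) for i in range(len(samples))]

# Example on a short noisy-but-growing learning curve with window=2
curve = [10, 3, 12, 5, 20, 8, 25, 15, 30]
print(smoothed_minimums(curve, window=2))  # [3, 3, 5, 5, 8, 8, 15, 15, 30]
```

Because a single bad episode drags the minimum down for the whole window, this curve growing coherently with the raw tendencies graph is evidence that worst-case performance is improving, not just the average.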
Context 3
... the value of the mean is about 140 vs. 163, which is 1.15 times lower. The batch PR agent also seems to reach its optimal policy, and neither learning nor relearning occurs any more: it keeps oscillating around the same value for most of the experiment, except for the initial learning at the very beginning of the learning series (Fig. 11). The T-test proved this difference to be significant. ...
Context 4
... the batch PR agent seems to reach its optimal policy, and neither learning nor relearning occurs any more: it keeps oscillating around the same value for most of the experiment, except for the initial learning at the very beginning of the learning series (Fig. 11, performance graph for the batch-backpropagation approach). The T-test proved this difference to be significant. Applying the learning-rate correction smoothed the learning curve and improved learning with a significant T-statistic of about 4.5. It is interesting to note that, unlike all previous approaches, it did not improve the mean performance much; the improvement was negligible, from 56.39 to 56.99. This PR approach decreased the variance from ...
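The significance claims above rest on a two-sample t-test. The excerpt does not say which variant was used; as an illustration, here is Welch's t-statistic (which does not assume equal variances) computed from scratch:

```python
import math

def welch_t(a, b):
    """Welch's t-statistic for two independent samples -- a sketch of
    the kind of test the text refers to; the paper's exact variant
    is not specified in the excerpt."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
```

A |t| of about 4.5, as reported for the learning-rate correction, is well past the usual ~2 threshold for significance at typical sample sizes, which is why the difference counts as significant even though the means barely moved.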

Similar publications

Preprint
Full-text available
Deep learning with noisy labels is challenging, as deep neural networks have a high capacity to memorize the noisy labels. In this paper, we propose a learning algorithm called Co-matching, which balances the consistency and divergence between two networks by augmentation anchoring. Specifically, we have one network generate anchoring label from i...

Citations

... There have been numerous articles that have used pseudo-rehearsal to stabilise reinforcement learning while training on a single task (e.g. [27], [28], [29]). When learning a single task from only the current state of the environment, catastrophic forgetting (CF) can occur as the network learns how it should act in the current state of the environment but does not rehearse previous states and thus forgets how it should act in those cases. ...
... This has been done by randomly generating input items from basic distributions (e.g. uniform) [27], [28] and a similar idea has been accomplished in actor-critic networks [29]. However, these tasks were very simple reinforcement tasks and did not utilise deep generative structures for generating pseudo-items or convolutional network architectures. ...
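The idea of generating pseudo-items from basic distributions can be sketched as follows. This is an illustrative outline, not the cited papers' implementation; `net` and `make_pseudo_items` are hypothetical names, and the uniform range is an assumption.

```python
import random

def make_pseudo_items(net, n, dim, lo=-1.0, hi=1.0):
    """Pseudo-rehearsal in the style described above: draw random inputs
    from a basic (here uniform) distribution and label them with the
    *current* network, so its existing behaviour can be rehearsed while
    the network trains on new data. `net` is any callable mapping an
    input vector to an output."""
    items = []
    for _ in range(n):
        x = [random.uniform(lo, hi) for _ in range(dim)]
        items.append((x, net(x)))  # target = the network's own current output
    return items

# Usage sketch: mix pseudo-items with real experience on each update
net = lambda x: sum(x)                   # stand-in for a trained network
pseudo = make_pseudo_items(net, n=32, dim=4)
# train_batch = real_items + pseudo      # old behaviour rehearsed alongside new
```

The appeal is that no stored examples or generative model are needed; the cited criticism is that on harder tasks random inputs cover the input space poorly, which is what deep generative replacements address.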
Preprint
Full-text available
Neural networks can achieve extraordinary results on a wide variety of tasks. However, when they attempt to sequentially learn a number of tasks, they tend to learn the new task while destructively forgetting previous tasks. One solution to this problem is pseudo-rehearsal, which involves learning the new task while rehearsing generated items representative of previous tasks. We demonstrate that pairing pseudo-rehearsal methods with a generative network is an effective solution to this problem in reinforcement learning. Our method iteratively learns three Atari 2600 games while retaining above human level performance on all three games, performing similarly to a network which rehearses real examples from all previously learnt tasks.
Article
Full-text available
Error motion trajectory data are routinely collected on multi-axis machine tools to assess their operational state. There is a wealth of literature devoted to advances in modelling, identification and correction using such data, as well as the collection and processing of alternative data streams for the purpose of machine tool condition monitoring. Until recently, there has been minimal focus on combining these two related fields. This paper presents a general approach to identifying both kinematic and non-kinematic faults in error motion trajectory data, by framing the issue as a generic pattern recognition problem. Because of the typically sparse nature of datasets in this domain (due to their infrequent, offline collection procedures), the foundation of the approach involves training on a purely simulated dataset, which defines the theoretical fault-states observable in the trajectories. Ensemble methods are investigated and shown to improve the generalisation ability when predicting on experimental data. Machine tools often have unique 'signatures' which can significantly affect their error motion trajectories; these are largely repeatable, but specific to the individual machine. As such, experimentally obtained data will not necessarily be easily defined in a theoretical simulation. A transfer learning approach is introduced to incorporate experimentally obtained error motion trajectories into classifiers which were trained primarily on a simulation domain. The approach was shown to significantly improve experimental test set performance, whilst also maintaining all theoretical information learned in the initial, simulation-only training phase. The ultimate approach represents a viable and powerful automated classifier for error motion trajectory data, which can encode theoretical fault-states with efficacy whilst also remaining adaptable to machine-specific signatures.