Fig. 2
Tendency graph of the difference between the agent with pseudo-rehearsal (PR) correcting learning on the outputs only and the agent correcting learning on all layers.
Source publication
Catastrophic forgetting has a significant negative impact in reinforcement learning. The purpose of this study is to investigate how pseudorehearsal can change the performance of an actor-critic agent with neural-network function approximation. We tested the agent in a pole-balancing task and compared different pseudorehearsal approaches. We have found tha...
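As a rough illustration of the pseudo-rehearsal idea referred to in the abstract, random pseudo-states can be labelled by a frozen snapshot of the network and rehearsed alongside the usual update. This is only a sketch under assumed state and network sizes, written in PyTorch, and is not the authors' implementation:

```python
# Sketch of output-level pseudo-rehearsal for a critic network (illustrative only).
import copy

import torch
import torch.nn as nn

# Assumed 4-dimensional state (e.g. a cart-pole observation) and a small critic.
critic = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
frozen = copy.deepcopy(critic).eval()  # snapshot taken before the current updates
optimizer = torch.optim.SGD(critic.parameters(), lr=1e-3)

def pseudo_rehearsal_loss(n_items: int = 30) -> torch.Tensor:
    """Penalise drift of the critic's outputs on randomly generated pseudo-states."""
    pseudo_states = torch.rand(n_items, 4) * 2 - 1     # uniform pseudo-inputs in [-1, 1]
    with torch.no_grad():
        remembered = frozen(pseudo_states)             # responses to be preserved
    return nn.functional.mse_loss(critic(pseudo_states), remembered)

# In a training step the rehearsal term would be added to the ordinary TD loss:
#   loss = td_loss + pseudo_rehearsal_loss()
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```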
Context in source publication
Context 1
... Samples with the same parameters were compared by computing a difference vector, plotting it, and plotting the tendency graph of that difference. For all parameters the visual evaluation showed the same thing: FR PR applied to all layers produced a much better result, and all the tendency graphs looked nearly the same. Fig. 2 shows the example for FR PR 30 Rel 10. Almost everywhere the tendency curve of the difference is above zero, so the performance of the FR PR agent with weight correction applied to all layers is higher. The significance test gave t-stat ≈ 43.77 ≫ t critical (one-tail) ≈ 1.645, which means that ...
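The comparison described in this excerpt can be outlined as follows. The reward series below are placeholders standing in for the matched per-episode returns of the two agents, and the smoothing window is an assumption; only the analysis steps (difference vector, tendency curve, one-tailed t-test) follow the text:

```python
# Sketch of the sample comparison: difference vector, tendency (moving-average)
# curve as in Fig. 2, and a one-tailed t-test on the mean of the difference.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder reward series; in the study these would be the matched per-episode
# returns of the all-layers and outputs-only FR PR agents.
rewards_all_layers = rng.normal(1.0, 5.0, size=10_000)
rewards_outputs_only = rng.normal(0.0, 5.0, size=10_000)

diff = rewards_all_layers - rewards_outputs_only          # difference vector

window = 100                                              # assumed smoothing window
tendency = np.convolve(diff, np.ones(window) / window, mode="valid")

# One-tailed, one-sample t-test: H1 is that the mean difference is above zero.
t_stat, p_two_sided = stats.ttest_1samp(diff, popmean=0.0)
p_one_tailed = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
print(f"t-stat = {t_stat:.2f}, one-tailed p = {p_one_tailed:.3g}")
```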
Similar publications
Deep learning with noisy labels is challenging, as deep neural networks have a high capacity to memorize the noisy labels. In this paper, we propose a learning algorithm called Co-matching, which balances the consistency and divergence between two networks by augmentation anchoring. Specifically, we have one network generate an anchoring label from i...
Citations
... There have been numerous articles that have used pseudo-rehearsal to stabilise reinforcement learning while training on a single task (e.g. [27], [28], [29]). When learning a single task from only the current state of the environment, CF can occur as the network learns how it should act in the current state but does not rehearse previous states and thus forgets how it should act in those cases. ...
... This has been done by randomly generating input items from basic distributions (e.g. uniform) [27], [28] and a similar idea has been accomplished in actor-critic networks [29]. However, these tasks were very simple reinforcement tasks and did not utilise deep generative structures for generating pseudo-items or convolutional network architectures. ...
Neural networks can achieve extraordinary results on a wide variety of tasks. However, when they attempt to learn a number of tasks sequentially, they tend to learn the new task while destructively forgetting previous tasks. One solution to this problem is pseudo-rehearsal, which involves learning the new task while rehearsing generated items representative of previous tasks. We demonstrate that pairing pseudo-rehearsal methods with a generative network is an effective solution to this problem in reinforcement learning. Our method iteratively learns three Atari 2600 games while retaining above-human-level performance on all three, performing similarly to a network that rehearses real examples from all previously learnt tasks.
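The generative replay summarised in this abstract can be sketched roughly as below; the generator, latent size and action dimension are illustrative assumptions, not the cited architecture:

```python
# Illustrative generative-replay step: a generator produces pseudo-states of
# earlier games, the previous policy labels them, and the new policy rehearses
# those pairs while it learns the current game.
import torch
import torch.nn as nn

# Stand-ins for a trained generative model and the previously learnt policy.
generator = nn.Sequential(nn.Linear(100, 84 * 84), nn.Tanh())
old_policy = nn.Sequential(nn.Linear(84 * 84, 18))   # 18 actions assumed

def generative_replay_batch(batch_size: int = 32):
    """Return (pseudo_states, soft_targets) representing previously learnt tasks."""
    with torch.no_grad():
        noise = torch.randn(batch_size, 100)          # assumed latent size
        pseudo_states = generator(noise)              # generated flattened frames
        soft_targets = old_policy(pseudo_states)      # behaviour to rehearse
    return pseudo_states, soft_targets

def rehearsal_loss(new_policy: nn.Module) -> torch.Tensor:
    """Distil the old policy's behaviour on generated items into the new policy."""
    pseudo_states, soft_targets = generative_replay_batch()
    return nn.functional.mse_loss(new_policy(pseudo_states), soft_targets)
```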
Error motion trajectory data are routinely collected on multi-axis machine tools to assess their operational state. There is a wealth of literature devoted to advances in modelling, identification and correction using such data, as well as to the collection and processing of alternative data streams for machine tool condition monitoring. Until recently, there has been minimal focus on combining these two related fields. This paper presents a general approach to identifying both kinematic and non-kinematic faults in error motion trajectory data by framing the issue as a generic pattern recognition problem. Because datasets in this domain are typically sparse, owing to their infrequent, offline collection procedures, the foundation of the approach is training on a purely simulated dataset that defines the theoretical fault-states observable in the trajectories. Ensemble methods are investigated and shown to improve generalisation when predicting on experimental data. Machine tools often have unique ‘signatures’ that can significantly affect their error motion trajectories; these signatures are largely repeatable but specific to the individual machine, so experimentally obtained data will not necessarily be well described by a theoretical simulation. A transfer learning approach is therefore introduced to incorporate experimentally obtained error motion trajectories into classifiers trained primarily on the simulation domain. This was shown to significantly improve experimental test-set performance while retaining all theoretical information learned in the initial, simulation-only training phase. The resulting approach is a viable and powerful automated classifier for error motion trajectory data, which encodes theoretical fault-states effectively while remaining adaptable to machine-specific signatures.
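A rough sketch of the simulation-first training and subsequent transfer step described in this abstract is given below; the trajectory encoding, fault classes, network and hyper-parameters are all placeholders rather than the paper's configuration:

```python
# Phase 1: train a fault classifier on simulated error motion trajectories only.
# Phase 2: fine-tune on experimental trajectories mixed with the simulated set,
# so the theoretical fault-states are not forgotten.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Placeholder data: 720-point trajectory encodings with 6 assumed fault classes.
simulated = TensorDataset(torch.randn(500, 720), torch.randint(0, 6, (500,)))
experimental = TensorDataset(torch.randn(40, 720), torch.randint(0, 6, (40,)))

model = nn.Sequential(nn.Linear(720, 128), nn.ReLU(), nn.Linear(128, 6))

def train(loader: DataLoader, epochs: int, lr: float) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

# Simulation-only phase learns the theoretical fault-states.
train(DataLoader(simulated, batch_size=64, shuffle=True), epochs=5, lr=1e-3)

# Transfer phase adapts to the machine-specific signature at a lower learning rate,
# keeping the simulated data in the mix so earlier knowledge is retained.
mixed = ConcatDataset([simulated, experimental])
train(DataLoader(mixed, batch_size=64, shuffle=True), epochs=3, lr=1e-4)
```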