Acquiring a Government Bond Trading Strategy Using Reinforcement Learning.
JACIII 01/2009; 13:691-696.
ABSTRACT: In reinforcement learning of long-term tasks, learning efficiency may deteriorate when an agent's probabilistic action selection causes too many mistakes before learning reaches its goal. We propose a new type of state, the fixed mode, to which a normal state shifts once it has received sufficient reward. A state in fixed mode chooses its action greedily, which eliminates randomness in action selection and increases learning efficiency. We first propose combining fixed-mode states with an algorithm that integrates penalty avoiding rational policy making and online profit sharing. We then discuss the target system and the design of the learning controller. In simulation, the learning task is to stabilize biped walking, with the learning controller modifying the robot's waist trajectory. Finally, we discuss the simulation results and the effectiveness of our proposal.
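The fixed-mode mechanism can be illustrated with a minimal tabular sketch. It assumes a profit-sharing-style update in which reward is credited backwards over an episode with a geometrically decaying factor, roulette-wheel (weight-proportional) action selection in normal states, and a hypothetical `threshold` on accumulated reward that moves a state into fixed mode. The class name `FixedModeAgent`, the parameters `threshold` and `decay`, and the end-of-episode update are illustrative assumptions, not the authors' implementation; in particular, the penalty-avoiding component is omitted.

```python
import random
from collections import defaultdict


class FixedModeAgent:
    """Hypothetical agent illustrating fixed-mode states (not the paper's code).

    Assumptions: profit-sharing-style credit with geometric decay, and a state
    enters fixed mode once the reward credited to it reaches `threshold`.
    """

    def __init__(self, actions, threshold=10.0, decay=0.5, seed=0):
        self.actions = list(actions)
        self.threshold = threshold  # assumed "sufficient reward" level
        self.decay = decay          # assumed credit-decay ratio per step back
        self.weights = defaultdict(lambda: {a: 1.0 for a in self.actions})
        self.received = defaultdict(float)  # total reward credited per state
        self.fixed = set()                  # states currently in fixed mode
        self.rng = random.Random(seed)

    def select(self, state):
        w = self.weights[state]
        if state in self.fixed:
            # Fixed mode: greedy choice, no randomness in action selection.
            return max(self.actions, key=lambda a: w[a])
        # Normal mode: roulette-wheel selection proportional to weights.
        return self.rng.choices(self.actions, weights=[w[a] for a in self.actions])[0]

    def reinforce(self, episode, reward):
        """Credit `reward` backwards over the (state, action) pairs of `episode`,
        multiplying the credit by `decay` at each step further from the reward."""
        credit = reward
        for state, action in reversed(episode):
            self.weights[state][action] += credit
            self.received[state] += credit
            if self.received[state] >= self.threshold:
                self.fixed.add(state)  # switch this state to fixed mode
            credit *= self.decay


# Usage: repeatedly reward the same two-step episode.
agent = FixedModeAgent(actions=["left", "right"], threshold=10.0)
for _ in range(20):
    agent.reinforce([("s0", "right"), ("s1", "right")], reward=1.0)
print(sorted(agent.fixed))  # → ['s0', 's1']
```

Here the state closest to the reward (`s1`) reaches the threshold after 10 episodes, while `s0`, which receives half the credit, needs 20; from then on both states always pick `"right"` without sampling. Updating once per episode is a simplification of the online profit sharing named in the abstract.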