Figure A3: LSTM memory cell without peepholes. z is the vector of cell input activations, i is the vector of input gate activations, f is the vector of forget gate activations, c is the vector of memory cell states, o is the vector of output gate activations, and y is the vector of cell output activations. The activation functions are g for the cell input, h for the cell state, and σ for the gates. Data flow is either "feed-forward" without delay or "recurrent" with a one-step delay. "Input" connections are from the external input to the LSTM network, while "recurrent" connections take inputs from other memory cells and hidden units of the LSTM network with a delay of one time step. 
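For reference, the forward pass of this peephole-free LSTM cell can be written with the symbols defined in the caption. This is a sketch of the standard vanilla LSTM update; the input weight matrices W, recurrent weight matrices R, and bias vectors b are assumptions not named in the figure, x_t denotes the external input at time step t, and ⊙ is the elementwise product:

z_t = g(W_z x_t + R_z y_{t-1} + b_z)   (cell input)
i_t = σ(W_i x_t + R_i y_{t-1} + b_i)   (input gate)
f_t = σ(W_f x_t + R_f y_{t-1} + b_f)   (forget gate)
c_t = i_t ⊙ z_t + f_t ⊙ c_{t-1}        (cell state)
o_t = σ(W_o x_t + R_o y_{t-1} + b_o)   (output gate)
y_t = o_t ⊙ h(c_t)                     (cell output)

The one-step delay on y_{t-1} and c_{t-1} corresponds to the "recurrent" connections in the figure, while the W x_t terms correspond to the "input" connections from the external input.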

Source publication
Preprint
Full-text available
We propose a novel reinforcement learning approach for finite Markov decision processes (MDPs) with delayed rewards. In this work, we prove that biases of temporal difference (TD) estimates are corrected only exponentially slowly in the number of delay steps. Furthermore, we prove that variances of Monte Carlo (MC) estimates increase the variance of...
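The two effects stated here can be illustrated on a toy chain MDP whose reward arrives only after n delay steps. The sketch below is an illustrative assumption, not the paper's construction or method: the chain layout, noise level, and hyperparameters are chosen purely for demonstration. TD(0) with zero-initialized values propagates reward information backward roughly one state per episode, so its estimate at the start state stays biased for long delays, while the variance of MC returns grows with the delay because each step's reward noise accumulates into the return.

import numpy as np

def episode_rewards(n, noise_std, rng):
    # Deterministic chain s0 -> s1 -> ... -> sn; zero-mean noise on every
    # step's reward, plus a delayed reward of +1 on the final transition.
    r = rng.normal(0.0, noise_std, size=n)
    r[-1] += 1.0
    return r

def td0_start_estimate(n, episodes, alpha, noise_std, rng):
    # TD(0) with gamma = 1 and values initialized to zero (V[n] is terminal).
    V = np.zeros(n + 1)
    for _ in range(episodes):
        r = episode_rewards(n, noise_std, rng)
        for s in range(n):
            V[s] += alpha * (r[s] + V[s + 1] - V[s])
    return V[0]  # true value of the start state is 1.0

def mc_returns(n, episodes, noise_std, rng):
    # One MC return per episode: the sum of all rewards from the start state.
    return np.array([episode_rewards(n, noise_std, rng).sum()
                     for _ in range(episodes)])

rng = np.random.default_rng(0)
for n in (5, 20, 50):
    v0 = td0_start_estimate(n, episodes=200, alpha=0.1, noise_std=0.5, rng=rng)
    g = mc_returns(n, episodes=200, noise_std=0.5, rng=rng)
    print(f"delay n={n:2d}: TD V(s0)={v0:5.2f} (true 1.00), "
          f"MC return variance={g.var():5.2f}")

With a fixed budget of episodes, the TD estimate at the start state falls increasingly short of the true value 1.0 as n grows (slow bias correction), while the per-return variance of MC grows roughly linearly in n, at about n times the per-step noise variance.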

Similar publications

Preprint
Full-text available
Decision-making for engineering systems can be efficiently formulated as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP). Typical MDP and POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large...