All content in this area was uploaded by Felix Gers on Jun 09, 2016
Content may be subject to copyright.
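[Figure: standard LSTM memory cell. The cell input $net_c$ is squashed by $g(\cdot)$ (input squashing) and multiplied by the input gate activation $y^{in}$ (input gating, input gate driven by $net_{in}$); the constant error carousel (CEC), a self-connection of weight 1.0, memorizes the state $s_c = s_c + g(net_c)\, y^{in}$; the state is squashed by $h(\cdot)$ (output squashing) and multiplied by the output gate activation $y^{out}$ (output gating, output gate driven by $net_{out}$), giving the cell output $y^c = h(s_c)\, y^{out}$.]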
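The standard memory cell in the preceding figure can be written as a single scalar update step. This is a minimal sketch, not the authors' code: it assumes logistic sigmoid gates and centered logistic squashing functions with ranges [-2, 2] for $g$ and [-1, 1] for $h$; the name `standard_cell_step` is hypothetical.

```python
import math


def logistic(x):
    """Logistic sigmoid in (0, 1), assumed for the gates."""
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Input squashing, assumed centered logistic with range [-2, 2]."""
    return 4.0 * logistic(x) - 2.0


def h(x):
    """Output squashing, assumed centered logistic with range [-1, 1]."""
    return 2.0 * logistic(x) - 1.0


def standard_cell_step(s_c, net_c, net_in, net_out):
    """One time step of a standard LSTM memory cell (no forget gate).

    The CEC's self-connection has fixed weight 1.0, so the state changes
    only by the gated, squashed input:
        s_c <- 1.0 * s_c + g(net_c) * y_in
        y_c  = h(s_c) * y_out
    """
    y_in = logistic(net_in)    # input gate activation
    y_out = logistic(net_out)  # output gate activation
    s_c = 1.0 * s_c + g(net_c) * y_in
    return s_c, h(s_c) * y_out


if __name__ == "__main__":
    s = 0.0
    for _ in range(1000):  # constant input: the state keeps growing
        s, y = standard_cell_step(s, net_c=1.0, net_in=0.0, net_out=0.0)
    print(round(s, 3), round(y, 3))  # prints 462.117 0.5
```

Because the self-connection weight is fixed at 1.0, a constant gated input makes $s_c$ grow linearly with time, and $h(s_c)$ saturates at 1, so the cell output is then determined by the output gate alone.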
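[Figure: memory cell extended with a forget gate (memorizing and forgetting). The forget gate, driven by $net_\varphi$ with activation $y^{\varphi}$, scales the cell's self-recurrent state, so the update becomes $s_c = s_c\, y^{\varphi} + g(net_c)\, y^{in}$; input squashing $g(\cdot)$, input gating $y^{in}$, output squashing $h(\cdot)$, and output gating with $y^c = h(s_c)\, y^{out}$ are as in the standard cell.]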
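With the forget gate in the preceding figure, the previous state is multiplied by $y^{\varphi}$ before the gated input is added. The sketch below is illustrative only: it assumes logistic gates, $g$ with range [-2, 2] and $h$ with range [-1, 1], and the function name `forget_cell_step` is hypothetical.

```python
import math


def logistic(x):
    """Logistic sigmoid in (0, 1), assumed for all gates."""
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Input squashing, assumed centered logistic with range [-2, 2]."""
    return 4.0 * logistic(x) - 2.0


def h(x):
    """Output squashing, assumed centered logistic with range [-1, 1]."""
    return 2.0 * logistic(x) - 1.0


def forget_cell_step(s_c, net_c, net_in, net_phi, net_out):
    """One step of a memory cell with a forget gate.

    The forget gate activation y_phi scales the previous state:
        s_c <- y_phi * s_c + y_in * g(net_c)
        y_c  = y_out * h(s_c)
    """
    y_in = logistic(net_in)
    y_phi = logistic(net_phi)  # near 1: keep the state; near 0: reset it
    y_out = logistic(net_out)
    s_c = y_phi * s_c + y_in * g(net_c)
    return s_c, y_out * h(s_c)


if __name__ == "__main__":
    s = 0.0
    for _ in range(100):  # constant input with y_phi = y_in = 0.5
        s, y = forget_cell_step(s, net_c=1.0, net_in=0.0, net_phi=0.0, net_out=0.0)
    print(round(s, 4))  # prints 0.9242, the fixed point y_in * g(1) / (1 - y_phi)
    s, y = forget_cell_step(s, net_c=0.0, net_in=0.0, net_phi=-50.0, net_out=0.0)
    print(s < 1e-15)  # prints True: the forget gate has reset the state
```

With $y^{\varphi} < 1$ and constant input, $s_c$ converges to $y^{in} g(net_c) / (1 - y^{\varphi})$ instead of growing without bound, and a strongly negative $net_\varphi$ drives $y^{\varphi}$ toward 0, clearing the state in a single step. Setting $y^{\varphi} = 1$ recovers the standard cell.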
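[Figure: transition diagrams of the Reber grammar (symbols B, T, P, S, X, V, E) and the embedded Reber grammar (ERG), which emits B followed by T or P, passes through one of two copies of the Reber grammar, and ends with the matching T or P followed by E; a recurrent connection for continuous prediction links the end of the embedded grammar back to its start.]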
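The grammars in the preceding figure can be sampled with a small transition table. This sketch assumes that each branch is taken with probability 1/2 and numbers the Reber states 0 to 5 arbitrarily; `reber_string`, `erg_string`, `is_reber`, `is_erg`, and `cerg_stream` are hypothetical names. The continual stream simply concatenates ERG strings back to back, which is an assumption about how the recurrent connection is realized.

```python
import random

# Reber grammar transition table: state -> [(symbol, next_state), ...].
# State numbers are arbitrary; state 5 is final and is followed by E.
REBER = {
    0: [("T", 1), ("P", 2)],
    1: [("S", 1), ("X", 3)],
    2: [("T", 2), ("V", 4)],
    3: [("X", 2), ("S", 5)],
    4: [("P", 3), ("V", 5)],
}
FINAL = 5


def reber_string(rng):
    """Sample one Reber string; each branch has probability 1/2 (assumed)."""
    symbols, state = ["B"], 0
    while state != FINAL:
        symbol, state = rng.choice(REBER[state])
        symbols.append(symbol)
    symbols.append("E")
    return "".join(symbols)


def erg_string(rng):
    """Embedded Reber string: B, T or P, a Reber string, the same T or P, E."""
    branch = rng.choice("TP")
    return "B" + branch + reber_string(rng) + branch + "E"


def is_reber(s):
    """Check a string against the transition table."""
    if len(s) < 2 or s[0] != "B" or s[-1] != "E":
        return False
    state = 0
    for ch in s[1:-1]:
        successors = dict(REBER.get(state, []))
        if ch not in successors:
            return False
        state = successors[ch]
    return state == FINAL


def is_erg(s):
    """An ERG string repeats its second symbol as its second-to-last one."""
    return (len(s) >= 9 and s[0] == "B" and s[-1] == "E" and s[1] in "TP"
            and s[-2] == s[1] and is_reber(s[2:-2]))


def cerg_stream(n_strings, seed=0):
    """Hypothetical continual stream: ERG strings concatenated, no reset marker."""
    rng = random.Random(seed)
    return "".join(erg_string(rng) for _ in range(n_strings))


if __name__ == "__main__":
    rng = random.Random(0)
    print([erg_string(rng) for _ in range(3)])
    print(is_erg("BTBPVVETE"))  # prints True
```

The shortest ERG strings have 9 symbols (for example BTBPVVETE), and predicting the second-to-last symbol requires remembering the second one across the whole embedded Reber string.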
[Figure: distribution of ERG string lengths: number of ERG strings in % versus ERG string length (10 to 60); annotations: logarithmic scale, exponential fit.]
[Figure: probability versus maximum ERG string length (0 to 100).]
[Figure, two panels: expected maximum ERG string length versus number of samples (up to 10^6), with a linear and a logarithmic sample axis.]
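The length statistics in the preceding figures can also be computed exactly instead of by sampling, by propagating probability mass through the Reber transition table (assuming each branch has probability 1/2). An ERG string adds six symbols (outer B, T or P, the inner string's own B and E, T or P, outer E) to the k symbols generated between the inner B and E. For N independent strings with length CDF $F$, the expected maximum is $\mathrm{E}[\max] = \sum_{k \ge 0} \bigl(1 - F(k)^N\bigr)$. The names `erg_length_pmf` and `expected_max_length` are hypothetical.

```python
# Reber grammar as a Markov chain: each state has two successors, each taken
# with probability 1/2 (assumed). State numbering is arbitrary; 5 means the
# inner Reber string has reached its final E.
NEXT = {0: (1, 2), 1: (1, 3), 2: (2, 4), 3: (2, 5), 4: (3, 5)}
END = 5


def erg_length_pmf(max_len=200):
    """Return pmf where pmf[n] = P(an ERG string has exactly n symbols).

    k symbols are generated between the inner B and E; the ERG string adds
    6 more, so n = k + 6. Mass beyond max_len is dropped; it decays
    geometrically and is negligible for max_len = 200.
    """
    pmf = [0.0] * (max_len + 1)
    dist = {0: 1.0}  # probability of being in each non-final state
    for k in range(1, max_len - 6 + 1):
        new_dist = {}
        for state, p in dist.items():
            for nxt in NEXT[state]:
                if nxt == END:
                    pmf[k + 6] += p / 2
                else:
                    new_dist[nxt] = new_dist.get(nxt, 0.0) + p / 2
        dist = new_dist
    return pmf


def expected_max_length(pmf, n_samples):
    """E[max of n_samples i.i.d. lengths] = sum_k (1 - F(k) ** n_samples)."""
    total, cdf = 0.0, 0.0
    for p in pmf:
        cdf = min(cdf + p, 1.0)  # F(k); clip rounding error above 1
        total += 1.0 - cdf ** n_samples
    return total


if __name__ == "__main__":
    pmf = erg_length_pmf()
    print(pmf[9])  # prints 0.25 (inner paths TXS and PVV, each 1/8)
    for n in (1, 10, 1000, 10**6):
        print(n, round(expected_max_length(pmf, n), 1))
```

The distribution of the maximum itself follows from the same CDF as $P(\max = k) = F(k)^N - F(k-1)^N$. Because the tail of the length distribution decays geometrically, the expected maximum grows roughly logarithmically in N.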
[Figure: three-layer LSTM network with 7 input units (In 1 to 7), a hidden layer of memory blocks (Memory Block 1 and Memory Block 2 shown), and 7 output units (Out 1 to 7). Each memory block contains two cells (Cell 1, Cell 2) that share one input gate, one forget gate, and one output gate.]
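The topology in the preceding figure can be sketched as a forward pass in which the cells of each memory block share one input gate, one forget gate, and one output gate. This is a hypothetical sketch, not the authors' implementation: it assumes recurrence is limited to the hidden layer (gates and cell inputs see the current input plus all cell outputs from the previous step), logistic gates and output units, $g$ in [-2, 2], $h$ in [-1, 1], a bias unit, small random initial weights, and output units fed only by the cell outputs. `MemoryBlock` and `LSTMNet` are hypothetical names, and no training is shown.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def g(x):  # input squashing, assumed range [-2, 2]
    return 4.0 * logistic(x) - 2.0


def h(x):  # output squashing, assumed range [-1, 1]
    return 2.0 * logistic(x) - 1.0


def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))


class MemoryBlock:
    """Several cells sharing one input gate, one forget gate, one output gate."""

    def __init__(self, n_cells, n_sources, rng):
        def weights():
            # Hypothetical initialization: small uniform random weights.
            return [rng.uniform(-0.2, 0.2) for _ in range(n_sources)]

        self.w_in, self.w_phi, self.w_out = weights(), weights(), weights()
        self.w_c = [weights() for _ in range(n_cells)]
        self.s = [0.0] * n_cells  # internal cell states

    def step(self, src):
        y_in = logistic(dot(self.w_in, src))
        y_phi = logistic(dot(self.w_phi, src))
        y_out = logistic(dot(self.w_out, src))
        outputs = []
        for j, w in enumerate(self.w_c):
            self.s[j] = y_phi * self.s[j] + y_in * g(dot(w, src))
            outputs.append(y_out * h(self.s[j]))
        return outputs


class LSTMNet:
    """Input layer -> recurrent hidden layer of memory blocks -> output layer."""

    def __init__(self, n_in=7, n_out=7, n_blocks=2, cells_per_block=2, seed=0):
        rng = random.Random(seed)
        n_cells = n_blocks * cells_per_block
        n_sources = n_in + n_cells + 1  # inputs, previous cell outputs, bias
        self.blocks = [MemoryBlock(cells_per_block, n_sources, rng)
                       for _ in range(n_blocks)]
        self.y_cells = [0.0] * n_cells
        self.w_out = [[rng.uniform(-0.2, 0.2) for _ in range(n_cells + 1)]
                      for _ in range(n_out)]

    def step(self, x):
        # Gates and cell inputs see the input and last step's cell outputs.
        src = list(x) + self.y_cells + [1.0]
        self.y_cells = [y for block in self.blocks for y in block.step(src)]
        return [logistic(dot(w, self.y_cells + [1.0])) for w in self.w_out]


if __name__ == "__main__":
    net = LSTMNet()
    one_hot = [1.0] + [0.0] * 6  # e.g. the first of 7 input symbols
    print([round(v, 3) for v in net.step(one_hot)])
```

Sharing the three gates across the cells of a block means a single forget gate decision resets every cell in that block at once.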
[Figure: internal cell state (about -50 to 100) versus input symbol, symbols 0 to 130, with interval annotations -9-, -10-, -14-, -10-, ... along the symbol axis.]
[Figure: internal cell states of the 3rd block, 1st and 2nd cells (about -10 to 20), versus symbol 680 to 850, with interval annotations -12-, -20-, -11-, -15-, ...; below, the forget gate activation (0 to 1) over the same patterns.]
[Figure: internal state of the 1st block, 1st cell (about -10 to 10), versus pattern 680 to 850, with interval annotations -12-, -20-, -11-, ...; below, the forget gate activation (0 to 1) over the same patterns.]
[Figure, two panels: stream presentations (logarithmic scale, 1 to 100000) versus stream length (0 to 10000).]
A preview of this full-text is provided by The MIT Press.
Content available from Neural Computation