Fostering Event Compression using
Gated Surprise
Dania Humaidan*[0000-0003-1381-257X], Sebastian Otte[0000-0002-0305-0463],
and Martin V. Butz[0000-0002-8120-8537]
University of Tuebingen, Neuro-Cognitive Modeling Group, Sand 14, 72076
Tuebingen, Germany
Abstract. Our brain receives a dynamically changing stream of sensorimotor data. Yet, we perceive a rather organized world, which we segment into and perceive as events. Computational theories of cognitive science on event-predictive cognition suggest that our brain forms generative, event-predictive models by segmenting sensorimotor data into suitable chunks of contextual experiences. Here, we introduce a hierarchical, surprise-gated recurrent neural network architecture, which models this process and develops compact compressions of distinct event-like contexts. The architecture contains a contextual LSTM layer, which develops generative compressions of ongoing and subsequent contexts. These compressions are passed into a GRU-like layer, which uses surprise signals to update its recurrent latent state. The latent state is passed forward into another LSTM layer, which processes the actual dynamic sensory flow in the light of the provided latent, contextual compression signals. Our model develops distinct event compressions and achieves the best performance on multiple event processing tasks. The architecture may be very useful for the further development of resource-efficient learning, hierarchical model-based reinforcement learning, as well as the development of artificial event-predictive cognition and intelligence.
Keywords: Event cognition · Surprise · Event segmentation.
1 Introduction
The way our brain perceives information and organizes it remains an open ques-
tion. It appears that we have a tendency to perceive, interpret, and thus under-
stand our sensorimotor data streams in the form of events. The so-called Event
Segmentation Theory (EST) [23] suggests that we utilize temporary increases
in prediction errors for segmenting the stream of sensorimotor information into
separable events [8]. As a result, compact event encodings develop.
Event encodings certainly do not develop for their own sake or for the mere
sake of representing sensorimotor information, though. Rather, it appears that
event encodings are very useful for memorizing events and event successions,
as well as for enabling effective hierarchical reinforcement learning [3], amongst
* Supported by the International Max Planck Research School for Intelligent Systems.
arXiv:2005.05704v1 [cs.LG] 12 May 2020
other benefits [4]. Indeed it appears that our brain prepares for upcoming events
in the prefrontal cortex [21]. Moreover, the whole midline default network [20]
seems to actively maintain a push-pull relationship between externally-generated
stimulations and internally-generated imaginations, including retrospective re-
flections and prospective anticipations.
We have previously modeled such processes with REPRISE – a retrospec-
tive and prospective inference scheme [5,6] – showing promising planning and
event-inference abilities. However, the retrospective inference mechanism is also
rather computationally expensive. Here, we introduce a complementary surprise-
processing modular architecture, which may support the event-inference abilities
of REPRISE as well as, more generally speaking, the development of event-
predictive compressions. We show that when contextual information is selec-
tively routed into a predictive processing layer via GRU-like [7] switching-gates,
suitable event compressions are learned via standard back-propagation through
time. As a result, the architecture can generate and instantly switch between
distinct functional operations.
After providing some further background on related event processing mech-
anisms in humans and neural modeling approaches, we introduce our surprise-
processing modular architecture. We evaluate our system exemplarily on a set
of simple function prediction tasks, where the lower-layer network needs to pre-
dict function value outputs given inputs and contextual information. Meanwhile,
deeper layers learn to distinguish different functional mappings, compressing the
individual functions into event-like encodings. In the near future, we hope to
scale this system to more challenging real-world tasks and to enhance the archi-
tecture such that upcoming surprise signals and consequent event switches are
anticipated as well.
2 Related Work
The ability to distinguish different contexts was previously tested in humans
[23,24,18]. Event segmentation was suggested to make use of prediction
failures, which update the internal model and signal that a new event has begun.
Loschky and colleagues [13] showed selected parts of a film to a group of
participants. When a clip could be placed within a larger context, the
participants exhibited more systematic eye movements. Baldassano and colleagues [2]
showed that the participants had consistently different brain activity patterns
for different ongoing contexts (flying from the airport and eating at a restau-
rant). Pettijohn and colleagues have shown that increasing the number of event
boundaries can have a positive effect on memory [16].
From a computational perspective, the use of prediction error to predict the
next stimulus was presented in the work of Reynolds and colleagues [17], who
used a feed-forward network in combination with a recurrent neural network
module, memory cells, and a gating mechanism. This model was later extended
with an RL agent that controls the gating mechanism with a learned policy [15].
Successfully segmenting the information stream into understandable units was
also attempted with reservoir computing [1]. It was shown that this mechanism
can be sufficient to identify event boundaries. However, it did not develop a
hierarchical structure that is believed to be present when considering action
production and goal-directed behaviors [11].
A framework that includes a hierarchical system of multilevel control was
illustrated in [12], which offers a background survey and a general hierarchical
framework for neuro-robotic systems (NRS). In this framework, the processing
of the perceptual information happens in a high-level cognition layer, whose
output passes through a translational mid-level layer to a lower-level execution
layer. The lower layer includes the sensory feedback between the agent and the
surrounding environment. An interesting hierarchical structure for spatial cogni-
tion was presented in the work of Martinet and colleagues [14]. Their presented
model showed how interrelated brain areas can interact to learn how to navi-
gate towards a target by investigating different policies to find the optimal one.
However, this structure focused on a certain maze and only used the size of the
reward at the goal location to make decisions.
Another important aspect of forming loosely hierarchical structured event
compressions lies in the prediction of event boundaries. Indeed, it was shown that
having background knowledge about the ongoing activities while an event unfolds
can help to predict when the current event might end [9]. This means that the de-
veloping event-generative compression structure may develop deeper knowledge
about the currently unfolding event. Amongst other things, such structures may
develop event boundary anticipating encodings, which, when activated, predict
initially unpredictable event changes.
3 Surprise-Processing Modular Architecture
We present a hierarchical, surprise-gated recurrent neural network architecture.
The system simulates the flow of contextual information from a deep layer,
which prepares the processing of upcoming events by developing event-compressing
encodings. These encodings are used to modify the processing of the lower-level
sensor- or sensorimotor-processing layer. In between, a GRU-like gating layer
controls when novel context-modifying signals are passed on to the lower-level
layer and when the old signal should be kept. As a result, the lower-level layer
generates predictions context-dependently, effectively learning to distinguish
different events.
3.1 The Structure
The proposed structure is composed of a contextual recurrent neural network
layer, implemented using an LSTM network (LSTMc) [10], which is responsible
for generating an event compression that represents the currently ongoing or
next upcoming context. This contextual information is fed into a middle layer,
which is implemented by a GRU-like gate [7]. The gate decides how much of
Fig. 1: The hierarchical structure composed of a deep contextual layer (LSTMc),
a GRU-like gating layer and a low-level function processing layer (LSTMf). Ad-
ditionally, we added an MLP to preprocess the function input (inputPrePro).
the novel contextual information in proportion to the previous contextual in-
formation will be passed on to the lower-level layer. This lower-level function
processing layer, which is also implemented by an LSTM, predicts a function
value (LSTMf). The function input is preprocessed using an MLP unit (input-
PrePro), before being provided to LSTMf. The structure is shown in Figure 1.
Note that the dotted lines denote unweighted inputs.
The decision about the current context is taken at the GRU-like top-down
gating layer. When a new event begins, LSTMf will produce erroneous predic-
tions because the underlying function has switched. The correspondingly large
surprise value, representing the unexpectedly high prediction error [5], may be
provided to the gating layer. A surprise signal can thus be used to manipulate
the update gate of a GRU layer, receiving and passing on new contextual
information from LSTMc surprise-dependently. If the context has not changed,
the gate stays closed, and the same old event compression is provided as the
contextual information to the LSTMf layer.
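To make this information flow concrete, the following PyTorch sketch wires up the three modules as described above. It is a minimal illustration, not the authors' implementation: the layer sizes, the one-hot context cue fed to LSTMc, and the scalar surprise input are our assumptions, and the gate is shown here as a simple surprise-weighted blend (the full GRU-like cell is sketched in Sect. 3.2).

```python
import torch
import torch.nn as nn

class SurpriseGatedNet(nn.Module):
    """Minimal sketch of the hierarchy: LSTMc -> GRU-like gate -> LSTMf.
    Sizes and input encodings are illustrative assumptions, not the paper's settings."""

    def __init__(self, n_events=4, ctx_size=8, pre_size=16, fn_size=10):
        super().__init__()
        self.ctx_size, self.fn_size = ctx_size, fn_size
        self.lstm_c = nn.LSTMCell(n_events, ctx_size)             # contextual layer (LSTMc)
        self.input_pre_pro = nn.Sequential(                       # MLP preprocessing the (x, y) input
            nn.Linear(2, pre_size), nn.Tanh())
        self.lstm_f = nn.LSTMCell(pre_size + ctx_size, fn_size)   # function processing layer (LSTMf)
        self.readout = nn.Linear(fn_size, 1)                      # predicted next function value

    def init_state(self, batch=1):
        z = torch.zeros
        return (z(batch, self.ctx_size), z(batch, self.ctx_size),
                z(batch, self.ctx_size), z(batch, self.fn_size), z(batch, self.fn_size))

    def forward(self, ctx_cue, xy, surprise, state):
        hc, cc, gate_h, hf, cf = state
        hc, cc = self.lstm_c(ctx_cue, (hc, cc))                   # new context compression
        # surprise in [0, 1] opens the gate: 0 keeps the previously passed-on
        # compression, 1 lets the new one through (full GRU-like cell: Sect. 3.2)
        gate_h = (1.0 - surprise) * gate_h + surprise * hc
        x = self.input_pre_pro(xy)                                # preprocessed function input
        hf, cf = self.lstm_f(torch.cat([x, gate_h], dim=-1), (hf, cf))
        return self.readout(hf), (hc, cc, gate_h, hf, cf)
```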
3.2 The Switch GRU
The GRU structure is adapted to act as a switch that decides when (i) to
keep the gate closed, in which case the context compression saved from
the previous time step is maintained and passed on, or (ii) to open the gate,
in which case the new context generated by LSTMc flows in. To perform this
task, the update gate of the GRU is modified to be unweighted, with the
surprise signal as its input. The combined gate now receives the new context
compression from LSTMc and the hidden cell state (the context compression of the
previous time step) as inputs. The reset gate is removed as it has no role here.
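A minimal sketch of this switch gate, under the same illustrative assumptions as above: the update gate is the unweighted surprise signal, the reset gate is dropped, and the candidate is computed from the new LSTMc compression together with the previously maintained one.

```python
import torch
import torch.nn as nn

class SwitchGRU(nn.Module):
    """GRU-like switch: unweighted, surprise-driven update gate; no reset gate."""

    def __init__(self, ctx_size=8):
        super().__init__()
        # combined candidate gate over [new compression; previously kept compression]
        self.candidate = nn.Linear(2 * ctx_size, ctx_size)

    def forward(self, new_context, prev_h, surprise):
        h_tilde = torch.tanh(self.candidate(torch.cat([new_context, prev_h], dim=-1)))
        # surprise == 0: keep the old compression; surprise == 1: switch to the new one
        return (1.0 - surprise) * prev_h + surprise * h_tilde
```

In a standard GRU, this update gate would be a learned function of the inputs; driving it directly with the externally provided surprise value is what turns the layer into an event switch.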
4 Experiments and Results
For evaluation purposes, we used an example of a time series that includes four
functions $f_e(x, y)$ representing four different contexts or events $e$. Converting this
into a continuous time series, the previous function output is used as the first
function input at the following time step $t$, that is, $x_t = f_e(x_{t-1}, y_{t-1})$. Mean-
while, function inputs $y_t$ are generated independently and uniformly distributed
between $-1$ and $1$. The four functions are
1. An addition function (add): $f_{add}(x, y) = 0.9x + y$,
2. A sine function (sin): $f_{sin}(x, y) = x + \sin(\pi y)$,
3. A subtraction function (sub): $f_{sub}(x, y) = 0.9x - y$,
4. A constant function (con): $f_{con}(x, y) = x$.
Function switches occurred uniformly randomly every 5 to 12 time steps (see the sketch below).
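The following sketch generates such a sequence; it reflects our reading of the setup, with the initial value x_0 = 0 and a randomly chosen event order as assumptions.

```python
import numpy as np

FUNCTIONS = {
    "add": lambda x, y: 0.9 * x + y,
    "sin": lambda x, y: x + np.sin(np.pi * y),
    "sub": lambda x, y: 0.9 * x - y,
    "con": lambda x, y: x,
}

def generate_sequence(steps=2000, min_len=5, max_len=12, seed=0):
    """Continuous time series x_t = f_e(x_{t-1}, y_{t-1}) with y ~ U(-1, 1)
    and uniformly random event switches every 5 to 12 time steps."""
    rng = np.random.default_rng(seed)
    x, event = 0.0, rng.choice(list(FUNCTIONS))            # assumed initial state
    next_switch = rng.integers(min_len, max_len + 1)
    xs, ys, events = [], [], []
    for t in range(steps):
        if t == next_switch:                               # a new event begins
            event = rng.choice(list(FUNCTIONS))
            next_switch = t + rng.integers(min_len, max_len + 1)
        y = rng.uniform(-1.0, 1.0)
        x = FUNCTIONS[event](x, y)
        xs.append(x); ys.append(y); events.append(event)
    return np.array(xs), np.array(ys), events
```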
4.1 Single Network Experiments
As an upper error baseline, we first evaluated the performance of a single LSTM
layer or a two-layer perceptron (MLP), which receives $x$ and $y$ as input and is
trained to learn $f_e(x, y)$ without providing any direct information about $e$. Next, as
a lower error baseline, we evaluated the performance of the two networks when we
augment the input with a one-hot vector denoting the ongoing event. Finally, to
make the problem harder again and enforce the anticipation of an event switch,
we switched the event information at a time point uniformly randomly earlier
than the actual next scheduled event switch, but at least two time steps after the
last event switch. This was done to simulate the idea of preparing for the next
event switch before it actually happens. In addition, we distinguish between runs
in which the consecutive functions were in the same order and runs in which the
next function type $e \in \{add, sub, con, sin\}$ is randomly chosen each time.
The LSTM network used had 10 hidden units and the MLP had two hidden
layers with 50 units each. The weights of the networks were updated at
a fixed rate every 20 time steps. We used a learning rate of $10^{-4}$ and trained
every network for 2000 epochs with 2000 steps each. Finally, we tested every
network for 150 test iterations. Reported performance results are averaged over
ten differently initialized networks.
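As an illustration of the lower error baseline, the sketch below appends a one-hot event code to the (x, y) input of a single LSTM; omitting event_ids reproduces the upper error baseline, and shifting the codes a few steps ahead of the true switch reproduces the "switched earlier" condition. The framework and exact layer setup are assumptions on our part.

```python
import torch
import torch.nn as nn

EVENTS = ["add", "sin", "sub", "con"]

class BaselineLSTM(nn.Module):
    """Single LSTM layer (10 hidden units) with an optional one-hot context input."""

    def __init__(self, hidden=10, with_context=True):
        super().__init__()
        in_size = 2 + (len(EVENTS) if with_context else 0)
        self.lstm = nn.LSTM(in_size, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, xy, event_ids=None):
        # xy: (batch, time, 2); event_ids: (batch, time) integer event indices or None
        if event_ids is not None:
            one_hot = nn.functional.one_hot(event_ids, len(EVENTS)).float()
            xy = torch.cat([xy, one_hot], dim=-1)
        h, _ = self.lstm(xy)
        return self.out(h)                                  # predicted next function value
```

Training such a baseline would use an MSE loss with the learning rate and 20-step truncated backpropagation window described above.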
The results are shown in Table 1. As expected, worst performance is obtained
when the network does not receive any context-related information, while best
performance is achieved when context information is provided. When the order
of the successive functions is randomized, performance only slightly degrades.
When the context information switches earlier than the actual function, perfor-
mance degrades, yielding an average error between the case when no context
information is provided and when context information is provided perfectly in
tune with the actual context switches.
When comparing the performance of the LSTM with the MLP, several ob-
servations can be made. First, when no context information or ill-tuned context
Table 1: Average training prediction error in the different single-network (LSTM
and MLP) experiments.

Experiment                              | LSTM avg. error | LSTM stdev | MLP avg. error | MLP stdev
No CI provided                          | 0.2670          | 0.0272     | 0.4180         | 0.0016
CI provided with fixed function order   | 0.0533          | 0.0292     | 0.0098         | 0.0011
CI provided with random function order  | 0.0551          | 0.0215     | 0.0139         | 0.0022
CI provided but switched earlier        | 0.1947          | 0.0134     | 0.3180         | 0.0012
information is provided, LSTM outperforms the MLP. This is most likely the
case because the LSTM can in principle infer the function that currently applies
by analyzing the successive input signals. As a result, it appears to decrease its
prediction error via its recurrent information processing ability. On the other
hand, when perfect context information is provided, the MLP learns to approxi-
mate the function even better than the LSTM module, indicating that the MLP
can play out its full capacity, while the recurrent connections somewhat
inhibit better performance of the LSTM module.
4.2 Full Network Experiments
Next, we performed the experiments using the introduced surprise-processing
modular neural architecture.
We evaluated the structure by testing four cases:
– The gate is always closed: the GRU-like layer output is constantly zero
(approximately corresponding to the upper error baseline).
– The gate is always open: the GRU-like layer output is continuously con-
trolled by the new context compression from LSTMc.
– The gate is only open at context switches: the GRU-like layer takes on the
context prediction generated by LSTMc when the context switches and
maintains it thereafter.
– The gate is gradually opened at context switches: the GRU-like layer switches
its context output more smoothly.
Note that the fourth scenario is meant to probe whether a gradual surprise signal
can help to predict the switches between the contexts in a smoother manner.
The gate in this case turns from being closed, to being half-open, to being fully
open, and back to half-open and closed (see the sketch below).
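These four gate schedules can be scripted as follows. This is a sketch only; the exact shape of the gradual ramp (half-open one step before and after each switch) is our assumption based on the description above.

```python
import numpy as np

def gate_signal(switch_steps, total_steps, mode):
    """Externally provided gate/surprise value in [0, 1] for every time step."""
    s = np.zeros(total_steps)
    if mode == "always_open":
        s[:] = 1.0
    elif mode == "open_at_switch":
        s[list(switch_steps)] = 1.0
    elif mode == "gradually_opened":
        for t in switch_steps:          # closed -> half-open -> open -> half-open -> closed
            for dt, v in ((-1, 0.5), (0, 1.0), (1, 0.5)):
                if 0 <= t + dt < total_steps:
                    s[t + dt] = max(s[t + dt], v)
    # mode == "always_closed": the signal simply stays zero
    return s
```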
Final test errors – again averaged over ten independently randomly weight-
initialized networks – are shown in Table 2. The best results
are obtained by keeping the gate closed while the same context is progressing
and only opening it when a new event starts. As expected, the worst performance
is achieved when the gate is always closed. Note also that the performance only
slightly improves when the gate is always open, indicating that the architecture
cannot detect the event switches on its own. Gradually opening and closing the
gate slightly degrades the performance in comparison to when the gate is only
Table 2: Average training prediction error and average distance between the
centers of the clusters formed by the context compressions' values in different
gate states. The lowest average error and largest average distances are marked
in bold.

Gate status          | avg. error | stdev error | Compared clusters | avg. distance | stdev distance
Always closed        | 0.280      | 0.059       | Any               | 0.0           | 0.0
Always open          | 0.206      | 0.014       | Add - Sin         | 0.28          | 0.17
                     |            |             | Add - Sub         | 1.22          | 0.42
                     |            |             | Add - Const       | 0.64          | 0.19
                     |            |             | Sin - Sub         | 1.27          | 0.34
                     |            |             | Sin - Const       | 0.70          | 0.24
                     |            |             | Sub - Const       | 0.7           | 0.26
Only open at switch  | 0.059      | 0.017       | Add - Sin         | 0.69          | 0.15
                     |            |             | Add - Sub         | 3.12          | 0.42
                     |            |             | Add - Const       | 1.46          | 0.27
                     |            |             | Sin - Sub         | 2.59          | 0.47
                     |            |             | Sin - Const       | 0.92          | 0.17
                     |            |             | Sub - Const       | 1.72          | 0.4
Gradually opened     | 0.083      | 0.030       | Add - Sin         | 0.61          | 0.17
                     |            |             | Add - Sub         | 2.17          | 0.69
                     |            |             | Add - Const       | 1.35          | 0.56
                     |            |             | Sin - Sub         | 1.81          | 0.39
                     |            |             | Sin - Const       | 1.00          | 0.20
                     |            |             | Sub - Const       | 0.82          | 0.31
open at the actual switch. When considering the differences in the compression
codes that are passed down to LSTMf in the different events, the largest distances
are generated by the network when the GRU-like update gate is open only at the
switch, thus indicating that this setting generated the most distinct compressions
for the four function events.
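The cluster distances reported in Table 2 can be obtained by grouping the gated compressions by event and comparing the per-event mean vectors; using Euclidean distances between these centers is our reading of the reported measure, not a detail confirmed by the paper.

```python
import numpy as np
from itertools import combinations

def cluster_center_distances(codes, event_labels):
    """codes: (T, d) array of gated context compressions; event_labels: length-T labels.
    Returns the Euclidean distance between the mean codes of every event pair."""
    labels = np.asarray(event_labels)
    centers = {e: codes[labels == e].mean(axis=0) for e in np.unique(labels)}
    return {(a, b): float(np.linalg.norm(centers[a] - centers[b]))
            for a, b in combinations(sorted(centers), 2)}
```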
Figure 2 shows the development of the average prediction error and its stan-
dard deviation for the best performing network setup, that is, the
one where the gate only opens at the switches. As can be seen, the error first
plateaus at a level of 0.4, which approximately corresponds to an identity map-
ping. It then rather reliably appears to find the gradient towards distinguishing
the four functions over the subsequent training epochs, thus converging to an er-
ror level that corresponds to the lower-error boundary of the single-layer LSTM
network with perfect context information (cf. Table 1).
All the above-mentioned results were obtained using a fixed weight update
frequency of 20 time steps, backpropagating the error 20 time steps into the
past. Table 3 shows the effect when the update frequency is changed. In this
case, the gradually changing surprise signal provides better results because, in
some of the runs, the network with the gate open only at the context switch fails
to find the gradient down to the 0.05 error level. The gradual opening and
closing effectively increases the error flow into LSTMc, increasing the likelihood
of convergence. Thus, in the future, a gradual change from weak, smooth surprise
signals to strong and progressively more punctual surprise signals should be
investigated further. Indeed, such a surprise signal can be expected when derived
from LSTMf during learning [5].
Fig. 2: The average and standard deviation of the prediction error during train-
ing, averaged over ten differently initialized networks, when the gate is only open
at the switches.
Please remember that, in the experiments above, the context information provided
to LSTMc switches earlier than the actual function event switch. As a result,
LSTMc can prepare for the switch but should only pass the information down
when the event switch is actually happening. This is accomplished by the GRU-
like module. Instead, when the surprise signal is provided to LSTMc and the
GRU-like gate is always open, the error less reliably drops to the 0.05 level,
as shown in Table 4. On the other hand, when the contextual information was
provided to LSTMc exactly in tune with the currently ongoing event, opening
the gate only at the switches still yielded a slightly better performance than
when the gate was always open (cf. Table 4).
It is also worth mentioning that when we ran the architecture with an MLP
(an MLPf module) as the function processing layer (instead of LSTMf), the
error stayed at an average of 0.42, without any difference between the gating
mechanisms (cf. Table 4). It thus appears that the gradient information from
LSTMf is more suitable to foster the development of distinct contextual codes.
Finally, we took a closer look at the event-encoding compressions generated
by the contextual layer and passed on by the GRU-like layer. Figure 3 shows
the context compression vector values produced by the deep context layer over
time. Figure 4 shows the outputs of the GRU-like gating layer. We can see stable
Table 3: Average training prediction error when a gradual surprise signal is
provided while using different weight update frequency settings.

Gate status            | Fixed at 35: avg. error | stdev | Random 20-50: avg. error | stdev | Random 10-30: avg. error | stdev
Always closed          | 0.365 | 0.070 | 0.428 | 0.083 | 0.345 | 0.078
Always open            | 0.270 | 0.071 | 0.224 | 0.022 | 0.206 | 0.018
Open at context switch | 0.200 | 0.142 | 0.318 | 0.122 | 0.166 | 0.149
Gradually opened       | 0.106 | 0.077 | 0.103 | 0.041 | 0.070 | 0.013
Table 4: Average prediction error when (i) the surprise signal is fed to LSTMc,
whereby the GRU-like gate is always open (Surp. to LSTMc), (ii) the context
information is provided to LSTMc exactly in tune with the function event (In-
tune CI to LSTMc), and (iii) when an MLPf is used instead of an LSTMf (MLPf).

Input to LSTMc / Gate status | Surp. to LSTMc: avg. error | stdev  | In-tune CI to LSTMc: avg. error | stdev | MLPf: avg. error | stdev
0 / Always closed            | 0.2515 | 0.0678 | 0.310 | 0.080 | 0.4213 | 0.00164
1 / Always open              | 0.2280 | 0.0198 | 0.066 | 0.040 | 0.4215 | 0.00123
1 at c.s. / open at c.s.     | 0.1031 | 0.0555 | 0.055 | 0.019 | 0.4211 | 0.00165
Fig. 3: The context compressions produced by the context layer in the structure;
(a) gate open at switch, (b) gate always open. The different background colors
indicate the different contexts.
Fig. 4: The context compressions provided by the GRU-like gating layer to the
function processing layer; (a) gate open at switch, (b) gate always open. The
different background colors indicate the different contexts.
compressions when the gate is only open at the switches. When the gate is always
open, the context also switches systematically but much more gradually and thus
less suitably.
The results confirm that our surprise-processing modular architecture can
clearly distinguish between the different contexts. The generated
Fig. 5: The different context compressions generated by the structure in nine dif-
ferently initialized networks. Note that these compressions are different between
different contexts in the same run.
compressions vary between different networks. Figure 5 shows the context compressions
for nine differently initialized networks. It is interesting to note that the context-
respective code for increasing is always close to zero, which is because the data
always started with the increasing function at the beginning of an epoch and a
network reset to zero activities. Moreover, it can be noted that, albeit clearly
different, the constant function code lies somewhat in between the code for in-
creasing and the code for decreasing. Finally, the sine function compression is
distinct but also somewhat in between the increasing and constant function code
(except for the one in the lower right graph). Further investigations with larger
networks are pending to evaluate whether the sine function may be predicted more ex-
actly in that case and whether the compression code from the GRU-like
layer for the sine function may then become more distinct from the others.
5 Discussion
Motivated by recent theories on event-predictive cognition [4,23,22], this paper
has investigated how dedicated neural modules can be biased towards reliably
developing event-predictive compressions.
We have introduced a surprise-processing modular neural network architec-
ture. The architecture contains a deep contextual layer, which generates suitable
event-encoding compressions. These compressions are selectively passed through
a GRU-like top-down layer, depending on current estimates of surprise. If the
surprise is low, then the same old compression is used. On the other hand, the
larger the current surprise, the more of the current context compression is passed
on to the function processing layer, effectively invoking an event transition. As a
result, the function processing layer predicts subsequent function values depen-
dent on the currently active, compressed, top-down event-predictive signal.
Our surprise-processing modular architecture was able to generate the best pre-
dictive performance when the GRU-like gating structure was opened only at
or surrounding the event switch, mimicking the processing of a surprise signal.
When the upcoming context information is provided in advance, the deep con-
text layer not only considers the currently ongoing event, but also prepares
the processing of the next one. Thus, it is important that the gating top-down
layer only passes the context compression on when a new event actually starts.
Elsewhere, event-triggered learning was proposed for control, such that the
system requests new information and the model is updated only when learning is
actually needed [19]. In line with this, our suggested structure shows that even when
the context layer always receives the information regarding the actual ongoing
event, the gate may still open only at the context switch, since this is the time
point when new information needs to be passed to the actual event dynamics
processing layer. As a result, the same prediction accuracy is achieved in a
significantly more resource-efficient manner.
In future work, we will integrate surprise estimates from the LSTMf module
directly, as previously analyzed elsewhere [5]. Moreover, we intend to enhance the
architecture further to enable it to predict event boundaries, whose detection
initially correlates with measures of surprise [9]. Finally, the architecture will be
combined with the REPRISE mechanism and scaled to larger problem domains,
including robotic control and object manipulation tasks.
References
1. Asabuki, T., Hiratani, N., Fukai, T.: Interactive reservoir computing for chunking
information streams. PLOS Computational Biology 14(10), e1006400 (Oct 2018)
2. Baldassano, C., Hasson, U., Norman, K.A.: Representation of real-world event
schemas during narrative perception. The Journal of Neuroscience: The Official
Journal of the Society for Neuroscience 38(45), 9689–9699 (2018)
3. Botvinick, M., Niv, Y., Barto, A.C.: Hierarchically organized behavior and its
neural foundations: A reinforcement learning perspective. Cognition 113(3), 262 –
280 (2009)
4. Butz, M.V.: Towards a unified sub-symbolic computational theory of cognition.
Frontiers in Psychology 7(925) (2016)
5. Butz, M.V., Bilkey, D., Humaidan, D., Knott, A., Otte, S.: Learning, planning,
and control in a monolithic neural event inference architecture. Neural Networks
117, 135–144 (2019)
6. Butz, M.V., Menge, T., Humaidan, D., Otte, S.: Inferring event-predictive goal-
directed object manipulations in REPRISE. Artificial Neural Networks and Ma-
chine Learning – ICANN 2019 (11727), 639–653 (2019)
7. Chung, J., Gülçehre, Ç., Cho, K., Bengio, Y.: Empirical evaluation of gated recur-
rent neural networks on sequence modeling. CoRR abs/1412.3555 (2014)
8. Franklin, N.T., Norman, K.A., Ranganath, C., Zacks, J.M., Gershman, S.J.: Struc-
tured event memory: a neuro-symbolic model of event cognition. bioRxiv p. 541607
(Feb 2019)
9. Hard, B.M., Meyer, M., Baldwin, D.: Attention reorganizes as structure is detected
in dynamic action. Memory & Cognition 47(1), 17–32 (Jan 2019)
10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9,
1735–1780 (1997)
11. Koechlin, E., Ody, C., Kouneiher, F.: The architecture of cognitive control in the
human prefrontal cortex. Science 302(5648), 1181–1185 (2003)
12. Li, J., Li, Z., Chen, F., Bicchi, A., Sun, Y., Fukuda, T.: Combined sensing, cogni-
tion, learning, and control for developing future neuro-robotics systems: A survey.
IEEE Transactions on Cognitive and Developmental Systems 11(2), 148–161 (Jun
2019)
13. Loschky, L.C., Larson, A.M., Magliano, J.P., Smith, T.J.: What would jaws do?
the tyranny of film and the relationship between gaze and higher-level narrative
film comprehension. PloS One 10(11), e0142474 (2015)
14. Martinet, L.E., Sheynikhovich, D., Benchenane, K., Arleo, A.: Spatial learning
and action planning in a prefrontal cortical network model. PLoS computational
biology 7(5), e1002045 (May 2011)
15. Metcalf, K., Leake, D.: Modeling unsupervised event segmentation: Learning event
boundaries from prediction errors p. 6
16. Pettijohn, K.A., Thompson, A.N., Tamplin, A.K., Krawietz, S.A., Radvansky,
G.A.: Event boundaries and memory improvement. Cognition 148, 136–144 (Mar
2016)
17. Reynolds, J.R., Zacks, J.M., Braver, T.S.: A computational model of event segmen-
tation from perceptual prediction. Cognitive Science 31(4), 613–643 (Jul 2007)
18. Serrano, A., Sitzmann, V., Ruiz-Borau, J., Wetzstein, G., Gutierrez, D., Masia,
B.: Movie editing and cognitive event segmentation in virtual reality video. ACM
Transactions on Graphics 36(4), 1–12 (Jul 2017)
19. Solowjow, F., Baumann, D., Garcke, J., Trimpe, S.: Event-triggered learning for
resource-efficient networked control. 2018 Annual American Control Conference
(ACC) pp. 6506–6512 (Jun 2018), arXiv: 1803.01802
20. Stawarczyk, D., Bezdek, M.A., Zacks, J.M.: Event representations and predictive
processing: The role of the midline default network core. Topics in Cognitive Sci-
ence (2019), this volume
21. Tanji, J., Hoshi, E.: Behavioral planning in the prefrontal cortex. Current Opinion
in Neurobiology 11(2), 164–170 (Apr 2001)
22. Zacks, J.M.: Event perception and memory. Annual Review of Psychology 71(1),
165–191 (2020)
23. Zacks, J.M., Swallow, K.M.: Event segmentation. Current directions in psycholog-
ical science 16(2), 80–84 (Apr 2007)
24. Zhao, J., Hahn, U., Osherson, D.: Perception and identification of random events.
Journal of Experimental Psychology. Human Perception and Performance 40(4),
1358–1371 (Aug 2014)