Developmental Science. 2017;e12629.
Curiosity-based learning in infants: a neurocomputational approach
Katherine E. Twomey1 | Gert Westermann2
1Division of Human Communication, Development and Hearing
Centre for Language and Communicative Development (LuCiD)
Infants are curious learners who drive their own cognitive development by imposing
by which infants structure their own learning is therefore critical to our understanding of
of infant visual category learning, capturing existing empirical data on the role of envi-
learning between the structure of the environment and the plasticity in the learner itself.
tional models of reinforcement learning and for our understanding of this fundamental
mechanism in early development.
• We present a novel formalization of the mechanism underlying in-
• We implement this mechanism in a neural network that captures
• In the same model we test four potential selection mechanisms and
show that learning is maximized when the model selects stimuli
based on its learning history, its current plasticity and its learning environment.
1 | INTRODUCTION
For more than half a century, infants' information selection has been
orously controlled paradigms allow researchers to isolate a variable
offering a fine-grained picture of the range of factors that affect early
learning. Decades of developmental research have brought about a
broad consensus that infants’ information selection and subsequent
tions, the learning environment, and discrepancies between the two
(for a review, see Mather, 2013). On the one hand, there is substantial
evidence that infants’ performance in these studies depends heav-
ily on the characteristics of the learning environment. For example,
early work demonstrated that infants under 6 months of age prefer
to look at patterned over homogeneous grey stimuli (Fantz, Ordy, &
Udelf, 1962), and in a seminal series of categorization experiments
with 3-month-old infants, Quinn and colleagues demonstrated that
the category representations infants form are directly related to
the visual variability of the familiarization stimuli they see (Quinn,
4-month-old infants were shown to learn animal categories when familiarized with paired animal images, but not when presented with
also Kovack-Lesh & Oakes, 2007). Thus, the representations infants
infants' existing knowledge has a profound effect on their behavior
in these experiments. For example, while newborns respond equiva-
lently to images of faces irrespective of the race of those faces, by
8 months infants show holistic processing of images of faces from
their own race, but not of other-race faces, which they process featurally (Ferguson, Kulkofsky, Cashon, & Casasola, 2009). Similarly,
4-month-old infants with pets at home exhibit more sophisticated
visual sampling of pet images than infants with no such experience (Kovack-Lesh, McMurray, & Oakes, 2014). Effects of learning history also
emerge when infants' experience is controlled experimentally. For
example, after a week of training with one named and one unnamed
novel object, 10-month-old infants exhibited increased visual sampling of the previously named object in a subsequent silent looking-
2010; Gliga, Volein, & Csibra, 2010). Thus, learning depends on the
interaction between what infants encounter in-the-moment and what
1.1 | Active learning in curious infants
A long history of experiments, starting with Piaget's (1952) notion of
children as "little scientists", has shown that children are more than passive observers; rather, they take an active role in constructing their own
learning. Recent work demonstrates this active learning in infants also.
For example, allowing 16-month-old infants to choose between two
to elicit help from their caregivers in finding a hidden object when they
were unable to see the hiding event than when they saw the object
being hidden (Goupil, Romand-Monnier, & Kouider, 2016). Indeed, even
to 8-month-olds increased their visual sampling of a sequence of images
when those images were moderately—but not maximally or minimally—
predictable (Kidd, Piantadosi, & Aslin, 2012; see also Kidd, Piantadosi,
& Aslin, 2014). However, as a newly developing field, active learning in
Critically, outside the lab infants interact with their environment
freely and largely autonomously, learning about stimuli in whichever
order they choose (Oudeyer & Smith, 2016). This exploration is not
driven by an external motivation such as finding food to satiate hunger. Rather, it is intrinsically motivated (Baldassarre et al., 2014; Berlyne,
1960; Oudeyer & Kaplan, 2007; Schlesinger, 2013): in the real world
infants learn based on their own curiosity. Consequently, in construct-
acquire. However, in the majority of studies on early cognitive devel-
opment, infants' experience in a learning situation is fully specified by
about the cognitive processes underlying infants' curiosity as a form of
intrinsic motivation, or indeed the extent to which what infants learn
riences—and consequently, their mental representations—is fundamental to our understanding of development more broadly.
1.2 | Computational studies of intrinsic motivation
In contrast to the relative scarcity of research into infant curiosity,
recent years have seen a surge in interest in the role of intrinsic mo-
tivation in autonomous computational systems. Equipping artificial
learning systems with intrinsic motivation mechanisms is likely to be
2013; Oudeyer, Kaplan, & Hafner, 2007), and consequently a rapidly
expanding body of computational and robotic work now focuses on
the intrinsic motivation mechanisms that may underlie a range of
behaviors; for example, low-level perceptual encoding (Lonini et al.,
2013; Schlesinger & Amso, 2013), novelty detection (Marsland,
Nehmzow, & Shapiro, 2005), and motion planning (Frank, Leitner,
Computational work in intrinsic motivation has suggested a wide
range of possible formal mechanisms for artificial curiosity- based learn-
could be underpinned by a drive to maximize learning progress by interacting with the environment in a novel manner relative to previously
be driven by prediction mechanisms, allowing the system to engage in
or minimal (Botvinick, Niv, & Barto, 2009). Still other approaches as-
sume that curiosity involves maximizing a system’s competence or
osity algorithms, it remains largely agnostic as to the psychological plau-
arate "reward" module in which the size and timing of the reward are
defined a priori by the modeler. Only recently has research highlighted
the value of incorporating developmental constraints in curiosity- based
computational and robotic learning systems (Oudeyer & Smith, 2016;
Seepanomwan, Caligiore, Cangelosi, & Baldassarre, 2015). While this
research shows great promise in incorporating developmentally inspired
curiosity- driven learning mechanisms into artificial learning systems, a
mechanism for curiosity in human infants has yet to be specified. The
aim of this paper therefore is to develop a theory of curiosity-based
learning in infants, and to implement these principles in a computational model.
1.3 | The importance of novelty to curiosity-based learning
From very early in development, infants show a novelty preference;
that is, they prefer new items to items they have already encountered
becomes less novel; that is, the child habituates. During habituation,
if a further new stimulus appears, and that stimulus is more novel
to the infant than the currently attended item, the infant abandons
are linked: broadly, increases in novelty elicit increases in attention
that excessive novelty leads to a decrease in attention). Here, we
propose that curiosity in human infants consists of intrinsically mo-
On this view, infants will selectively attend to stimuli that best
support this discrepancy minimization. However, to date there is no
agreement in the empirical literature as to what an optimal learn-
ing environment might be. For example, Bulf, Johnson, and Valenza
(2011) demonstrated that newborns learned from highly predictable
sequences of visual stimuli, but not from less predictable sequences.
In contrast, 10-month-old infants in a categorization task formed a
uncovered a “Goldilocks” effect in which learning is optimal when
also Kinney & Kagan, 1976; Twomey, Ranson, & Horst, 2014). From
ronment that best supports learning is unclear.
Across these studies, novelty and complexity are operationalized differently; for example, as objective environmental predictability
infants who are engaged in curiosity-driven learning, novelty is not a
perceptual environmental characteristics and what the learner knows.
Importantly, each infant has a different learning history which can affect
their exploratory behavior. For example, infant A plays with blocks at
B's favorite toy is a rattle, and she is familiar with the noise it makes
elty based both on the learner's internal representations (what infants
know) and the learning environment (what infants experience). In the
following paragraphs we provide a mechanistic account of this learner–
environment interaction using a neurocomputational model.
1.4 | Computational mechanisms for infant curiosity
Computational models have been widely used to investigate
various cognitive processes, lending themselves in particular to
capturing early developmental phenomena such as category learn-
2008; Westermann & Mareschal, 2004, 2012, 2014). Here we take
a connectionist or neurocomputational approach in which abstract
simulations of biological neural networks are used to implement and
explore theories of cognitive processes in an explicit way, offering
tions about novel behaviors. Neurocomputational models employ a
network of simple processing units to simulate the learner situated
interest, and can have important effects across representational de-
from the interaction between learner and environment. Thus, neu-
rocomputational models are well suited to implementing and testing
In the current work we employed autoencoder networks: artificial neural networks in which the input and the output are the
same (Cottrell & Fleming, 1990; Hinton & Salakhutdinov, 2006; see
from infant category learning tasks (Capelier-Mourguy, Twomey, &
Westermann, 2016; French, Mareschal, Mermillod, & Quinn, 2004;
Westermann & Mareschal, 2004, 2012, 2014). Autoencoders imple-
nal representation until the two match. At this point the infant looks
more novel a stimulus, the longer fixation time will be. Similarly, autoencoder models receive an external stimulus on their input layer,
and aim to reproduce this input on the output layer via a hidden layer.
weighted connections to the hidden layer. Inputs to each hidden layer
unit are summed and this value is passed through a typically sigmoid
activation function. The values on the hidden units are then passed
through the weighted connections to the output layer. Again, inputs
to each output node are summed and passed through the activation
function, generating the model's output representation. Learning is
driven by the discrepancy between the input and output representations. Because
multiple iterations of weight adaptation are required to match the
model's input and output, error acts as an index of infants' looking time.
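The forward pass, error-driven weight updates, and use of SSE as a looking-time proxy described above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the class and variable names, the learning rate, the weight initialization range, and the number of training sweeps are all our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Autoencoder:
    """4-3-4 autoencoder trained with the generalized delta rule,
    with input and target identical (self-supervised)."""
    def __init__(self, n_in=4, n_hid=3, eta=0.1):
        self.W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in))   # input -> hidden
        self.W2 = rng.uniform(-0.5, 0.5, (n_in, n_hid))   # hidden -> output
        self.eta = eta

    def forward(self, i):
        h = sigmoid(self.W1 @ i)          # hidden activations
        o = sigmoid(self.W2 @ h)          # output (reconstruction of i)
        return h, o

    def sse(self, i):
        _, o = self.forward(i)
        return np.sum((i - o) ** 2)       # sum squared error: looking-time proxy

    def train_step(self, i):
        h, o = self.forward(i)
        delta_o = (i - o) * o * (1 - o)   # output error signal (delta rule)
        delta_h = (self.W2.T @ delta_o) * h * (1 - h)
        self.W2 += self.eta * np.outer(delta_o, h)
        self.W1 += self.eta * np.outer(delta_h, i)

net = Autoencoder()
stim = np.array([0.25, 0.5, 0.75, 1.0])   # one 4-feature stimulus
errors = [net.sse(stim)]
for _ in range(200):                       # repeated exposure to the stimulus
    net.train_step(stim)
    errors.append(net.sse(stim))
```

Because SSE falls as the same stimulus is presented repeatedly, the model's "looking" habituates, mirroring the infant behavior described above.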
Self-supervised autoencoder models are trained with the well-known generalized delta rule (Rumelhart, Hinton, & Williams, 1986)
with the special case that input and target are the same. The weight
update rule of these models is:
Δw = η (i − o) o(1 − o)
where Δw is the change of a weight after presentation of a stimulus. The first term, (i − o), describes the difference between the
model's input and its output. The second term, o(1 − o), is the derivative of the sigmoid activation function. This
term is minimal for output values near 0 or 1 and maximal for o =
0.5. Because (i − o) represents the discrepancy between the model's input and its representation, and because learning in the model
is driven by this discrepancy, o(1 − o) limits the amount the model can learn from a particular stimulus by constraining the size of the discrepancy to be reduced. In this sense,
o(1 − o) reflects the plasticity of the learner, modulating its adaptation to the external environment. Finally, η represents the model's
learning rate. The amount of adaptation is thus a function both of
the environment and the internal state of the learner.
adaptation—learning—is proportional to (i − o)o(1 − o); that is, learning is greatest when (i − o)o(1 − o) is maximal. If curiosity is a drive
to maximize learning, (i − o)o(1 − o) offers a mechanism for stimulus selection to maximize learning: a curious model should attempt
to maximize its learning by choosing stimuli for which (i − o)o(1 − o)
is greatest. Below, in Experiment 2 we test this possibility in a
model, and compare it against three alternative methods of stimulus selection.
1.5 | A test case: infant categorization
powerful skill has generated a great deal of interest, and a large
body of research now demonstrates that infant categorization
is flexible and affected by both existing knowledge and in-the-
Stowe & Rakison, 2005). Categorization therefore lends itself well
to testing the curiosity mechanism specified above. In Experiment
1 we present a model that captures infants’ behavior in a recent
categorization task in which the learning environment was artifi-
in a controlled laboratory study in which infants do not select in-
ated in the curiosity mechanism against three alternative mecha-
nisms, and demonstrate that learning history and learning plasticity
(i.e., the learner’s internal state) as well as in- the- moment input (i.e.,
the learning environment) are all necessary for maximal learning.
Taken together, these simulations offer an explicit and parsimonious mechanism for curiosity-driven learning, providing new insight
2 | EXPERIMENT 1
variations in perceptual features came from an influential series
of familiarization/novelty preference studies by Barbara Younger
infant might see eight images of different cats, for 10 seconds each.
Then, infants are presented with two new images side-by-side, one
of which is a novel member of the just-seen category, and one of
preference, if infants look for longer at the out-of-category stimulus
than the within-category stimulus the experimenter concludes that
out-of-category item. In this example, longer looking at the dog than
which excluded the novel dog exemplar (and indeed, they do; Quinn
et al., 1993).
Younger (1985) explored whether infants could track covariation
of stimulus features and form a category based on this environmen-
tal structure. Ten-month-old infants were shown a series of pictures
of novel animals (see Figure 1) that incorporated four features (ear
separation, neck length, leg length and tail width) that could vary
systematically in size between discrete values of 1 and 5. At test, all
children saw two simultaneously presented stimuli: one peripheral (a
(a new exemplar with the central value for each feature dimension).
Infants' increased looking times to the peripheral stimulus indicated
that they had learned a category that included the category-central
stimulus. This study was one of the first to demonstrate the now
Lesh & Oakes, 2007; Quinn et al., 1993; Rakison, 2004; Rakison &
tured in a computational model. Mather and Plunkett (2011; hence-
was presented during familiarization would affect infants' categoriza-
Younger (1985, E1). Although all infants saw the same stimuli, M&P
manipulated the order in which stimuli were presented during the fa-
At test, all infants saw two simultaneously presented novel stimuli, in
line with Younger (1985): one category-central and one peripheral.
M&P found that infants in the maximum distance condition showed
an above-chance preference for the peripheral stimulus, while infants
in the minimum distance condition showed no preference. Thus, only
egory space”, then infants in the maximum distance condition would
traverse greater distances during familiarization than infants in the
minimum distance condition, leading to better learning. However, it is
not clear from these empirical data how infants adjusted their repre-
late M&P's task. Closely following the original experimental design, we
fine-grained analyses we tested additional conditions with intermediate
perceptual distances as well as randomly presented sequences (the
usual case in familiarization/novelty preference studies with infants).
Like M&P we then tested the model on new peripheral and category-central stimuli. Based on their results, we expected the model to form
the strongest category after training with maximum distance stimuli,
then intermediate/random distance, and finally minimum distance.
2.1 | Model architecture
We used an autoencoder architecture consisting of four input units,
three hidden units, and four output units (Figure 2). Each input unit
corresponded to one of the four features of the training stimuli (i.e.,
Hidden and output units used a sigmoidal activation function and
2.2 | Stimuli
features neck length, leg length, ear separation, and tail width. Individual
stimuli were based on the stimulus dimensions provided in Younger
(1985, E1, Broad; see Figure 1). For each feature, these values were
normalized to lie between 0 and 1. Each stimulus (that is, input i)
therefore consisted of a four-element vector in which each element
represented the value for one of the four features. Model inputs were
each sequence we calculated the mean Euclidean distance (ED) between successive stimuli. This resulted in a single overall perceptual
distance value for each sequence.
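The sequence measure just described can be sketched as follows. The eight feature vectors below are illustrative placeholders, not Younger's (1985) actual stimulus values; only the procedure (mean successive ED per order, ranked over all 8! presentation orders) mirrors the text.

```python
import itertools
import numpy as np

# Eight hypothetical training stimuli: four normalized feature values each
# (neck length, leg length, ear separation, tail width). Values are
# illustrative, not the published stimulus dimensions.
stimuli = np.array([
    [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0],
    [0.25, 0.75, 0.25, 0.75], [0.25, 0.25, 0.75, 0.75],
    [0.75, 0.75, 0.25, 0.25], [0.75, 0.25, 0.75, 0.25],
])

def mean_successive_ed(order):
    """Mean Euclidean distance between successive stimuli in a sequence."""
    seq = stimuli[list(order)]
    return np.mean(np.linalg.norm(np.diff(seq, axis=0), axis=1))

# Rank all 8! = 40,320 presentation orders by mean successive ED;
# min- and max-distance conditions sit at the two ends of this ranking.
orders = list(itertools.permutations(range(8)))
eds = sorted(mean_successive_ed(o) for o in orders)
```

Sorting the 40,320 order-level EDs makes it straightforward to pick the 24 lowest-ED sets (min), 24 intermediate sets (med), and 24 highest-ED sets (max) as in the conditions above.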
We created orders for the following four conditions based on
• Minimum distance (min; cf. M&P minimum distance): 24 sets with
• Medium distance (med): 24 sets with an intermediate mean ED,
specifically sets 20,149–20,172 when sets are sorted in order of
• stimuli presented in random order
ture values) and one category-central stimulus (a new exemplar with
[Figure 1 caption fragment: … Mather & Plunkett (2011), with permission] FIGURE 2 Model architecture
Neither of these test stimuli was part of the training set.
2.3 | Procedure
During training, each stimulus was presented for a maximum of 20
set after each sweep (with no weight updating) and recorded sum
squared error (SSE) as a proxy for looking time (Mareschal & French,
2000; Westermann & Mareschal, 2012, 2014). Order of presenta-
tion of training stimuli varied by condition (see Stimuli). Following
eral, one central), presented sequentially for a single sweep with no
weight updates, and again recorded SSE. There were 24 separate
models in each condition, reflecting the 24 participants in each condition.
2.4 | Results and discussion
2.4.1 | Training trials
During familiarization infants in M&P demonstrated a significant decrease in looking from the first to the final three-trial block. For the
max and min conditions we submitted SSE during the first and final
A main effect of block (F(1, 46) = 97.35, p < .0001, η²G = .46) confirmed
that overall SSE decreased from the first block (M = 0.57, SD = 0.11)
to the final block (M = 0.54, SD = 0.11). A main effect of condition
(F(1, 46) = 2079.12, p < .0001, η²G = .96) revealed that there was less
error overall in the max condition (M = 0.45, SD = 0.03) than in the min
condition (M = 0.66, SD = 0.03). Finally, there was a significant block-by-condition interaction (F(1, 46) = 4.40, p = .041, η²G = .03), which
arose from a greater decrease in SSE in the max condition (mean decrease = 0.045) than in the min condition (mean decrease = 0.030).
Thus, as with the infants in M&P, "looking" in the model decreased
2.4.2 | Test trials
In M&P, increased looking to the peripheral stimuli at test was taken
proxy for looking time, we collapsed our analyses across the two peripheral stimuli (Mather & Plunkett, 2011), and calculated proportion
sum tests against chance confirmed that in all conditions the model
formed a category (all Vs = 300, all ps < .001). However, a Kruskal-
tion) differed between conditions (H(3) = 80.13, p < .001). Post-hoc
Wilcoxon tests (all Ws two-tailed and Bonferroni-corrected) con-
= 0.99) than in the min condition (Mdn = 0.76; W = 576, p < .0001, r =
−1.53), the med condition (Mdn = 0.79; W = 576, p < .0001, r = −1.53)
or the random condition (Mdn = 0.83; W = 575, p < .0001, r = −1.51).
formation in M&P’s minimum distance condition, the authors argue
that these infants were in fact learning a category; since distances
were smaller, these infants traversed less of the category space than
resentations were therefore not sufficiently robust to be detected at
data, likely accounting for our detection of differences where M&P
found null effects.
Overall, our results support M&P's distance-based account.
We make their theoretical category space explicit by implementing
stimuli as feature vectors, which can be interpreted as locations in
Euclidean space. The greater overall Euclidean distances in the max
condition therefore force the model to "travel" further from trial to
therefore greater adaptation, resulting in stronger category learning
overall. The model therefore explains how manipulation of stimulus
order during training can lead to observed differences in learning at
In Experiment 1 (as in M&P) the order of stimulus presentation was fixed in each condition to control the mean successive
ED. This approach created an artificially structured environment
in which the model learned best from the inputs with the most
tational data indicate that both infants and the model learn differently in differently structured environments—even when those
differences may seem minor, such as the order in which stimuli
[Figure caption fragment: all between-condition differences ***p < .001]
are experienced. However, Experiment 1 reflected artificially optimized rather than curiosity-based learning. An important question for research on curiosity-based learning is how a model that
selects its own experiences structures its environment and how
learning in this self-generated environment compares with learning in the artificially optimized environment in Experiment 1.
Thus, in Experiment 2 we allowed the model to choose the order
in which it learned from stimuli based both on environmental and
vation in which curiosity is triggered when a learner notices a dis-
crepancy between the environment and their representation (e.g.,
Loewenstein, 1994), the model scans the environment and then
selects the stimulus that maximizes a given function. This learning is analogous to an infant looking at and processing an array
of objects before choosing one to learn from. We compared the
curiosity-based learning discussed above with three alternative
or plasticity at each learning step.
3 | EXPERIMENT 2
possible mechanisms for stimulus selection.
3.1 | Model architecture and stimuli
Model architecture and parameters and stimuli were identical to
those used in Experiment 1. Stimulus selection proceeded without
replacement; thus, as in Experiment 1 the model saw exactly eight stimuli.
3.2 | Procedure
The procedure used in Experiment 2 was identical to that used in
Experiment 1, with the exception that stimulus order was deter-
mined by the model based on the following four methods of stimulus
3.2.1 | Curiosity
In the curiosity condition we tested our formalization of infant curiosity based on the delta rule. Specifically, before presentation of each
stimulus, the model calculated (i − o)o(1 − o) for all possible stimuli
where i = input values and o = output values. For example, after presentation of the first stimulus, the model calculated (i − o)o(1 − o) for
each of the remaining seven stimuli, resulting in a set of seven potential curiosity values. The next stimulus chosen as input to the model
was that for which the absolute value of this curiosity function was greatest.
ing a novelty detection mechanism rather than the novelty reduction
process of learning.
3.2.2 | Objective complexity maximization
M&P used Euclidean distance as a measure of inter-stimulus novelty
and showed that maximizing novelty objectively present in the learning environment led to better learning than minimizing this novelty.
However, M&P selected the presentation orders in advance of the
experiment so that the max condition maximized mean ED between
stimuli across the sequence as a whole. However, our model aimed
to provide an account of in-the-moment information selection. Thus,
in the objective complexity maximization condition, at each step the
model chose the stimulus that was maximally distant (by ED) from the
current stimulus. Complexity is therefore specifically implemented as
ED here. In this condition the first stimulus was chosen randomly and
3.2.3 | Subjective novelty maximization
In the subjective novelty maximization condition the model selected
stimuli by maximizing i − o, leading to the selection of a stimulus that
was maximally different from its representation in the model. This
mechanism maximized novelty relative to the model's learning history.
Subjective novelty maximization therefore reflects prediction-error-based computational reinforcement learning systems (for a review,
see Botvinick et al., 2009; see also Ribas-Fernandes et al., 2011), in
3.2.4 | Plasticity maximization
Choosing stimuli based on o(1 − o) minimizes the in-the-moment effect
of the environment (i) on the model's learning by omitting (i − o). Put
differently, this mechanism maximizes the model's plasticity. Thus, in
the plasticity maximization condition the model selected stimuli about
which it was most ready to learn (disregarding how much it would
actually be able to learn from that stimulus).
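The four selection rules can be summarized in one hypothetical scoring function. The function name, signatures, and the choice to reduce each vector-valued quantity to a single score by summing absolute values over feature dimensions are our assumptions, not details given in the text.

```python
import numpy as np

def select_next(current, remaining, i_all, o_all, mechanism):
    """Pick the index of the next stimulus from `remaining` under one of
    the four selection mechanisms. i_all[k] is stimulus k's input vector;
    o_all[k] is the model's current reconstruction of it."""
    def score(k):
        i, o = i_all[k], o_all[k]
        if mechanism == "curiosity":        # |(i - o) o (1 - o)|, summed
            return np.sum(np.abs((i - o) * o * (1 - o)))
        if mechanism == "objective":        # ED from the current stimulus
            return np.linalg.norm(i - i_all[current])
        if mechanism == "subjective":       # novelty relative to the model
            return np.sum(np.abs(i - o))
        if mechanism == "plasticity":       # readiness to learn, ignoring i
            return np.sum(o * (1 - o))
        raise ValueError(mechanism)
    return max(remaining, key=score)

# Toy demo with two-feature stimuli and a uniform reconstruction of 0.5:
i_all = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.1]])
o_all = np.full((3, 2), 0.5)
chosen = select_next(0, [1, 2], i_all, o_all, "curiosity")
```

Selecting without replacement, as in the procedure above, amounts to calling this function repeatedly while removing each chosen index from `remaining`.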
In all conditions the test phase was exactly as in Experiment 1,
comparing network error to central and peripheral stimuli as a measure of strength of category learning.
3.3 | Results and discussion
Proportion of total SSE for peripheral test stimuli is depicted in
the model formed a category in all conditions (all ps < .001). Active
learning therefore led to category formation irrespective of the basis
on which the model selected stimuli. A Kruskal-Wallis test revealed,
tion we discuss the differences between the four stimulus selection
best in the curiosity condition. First, the model learned a more robust
category in the curiosity condition (Mdn = 0.97) than in the objective
complexity maximization condition (Mdn = 0.91; W = 495, p < .001, r =
−0.92). This result highlights the role of the learner in the learning process: when the model selected stimuli based solely on objective, envi-
also outperformed the subjective novelty maximization condition (Mdn
= 0.77; W = 575, p < .001, r = −1.51). Here, although the model's learned
the difference between its representation (o) and the environment (i)
were greatest in-the-moment, the longer-term effect of learning history,
demonstrates that the additional plasticity provided by the o(1 − o) term
tent to which the model could adapt to its learning environment, reducing its ability to select stimuli that would lead to optimum information
alone is not sufficient to maximize learning: the model also performed
dition (Mdn = 0.75, W = 575, p < .001, r = −1.51). Since this latter mechanism ignores the in-the-moment effect of the environment this result
suggests that while focusing solely on the environment is not the best
strategy for active learning, ignoring how much can actually be learned
from a stimulus is not optimal either. Finally, in line with Experiment 1
564, p < .0001, r = −1.37; W = 56, p < .0001, r = −1.36), further highlighting the importance of environmental input; however, we found no
and plasticity maximization conditions (W = 318, p = .55, r = −0.12).
Overall, then, our formalization of curiosity maximized learning via the
dynamic interaction of plasticity, learning history, and in-the-moment
Next, we were interested in the level of complexity of the sequences that maximized learning in the curiosity condition. In the
context of Experiment 1 and M&P, we might expect that the curi-
support learning (Kidd et al., 2012, 2014; Kinney & Kagan, 1976;
learning in some cases (Bulf et al., 2011; Son, Smith, & Goldstone,
2008). To help make sense of these conflicting results, all of which
come from experiments with predetermined stimulus presentation
orders, we analyzed the stimulus sequences generated by the curious model. Overall, the model generated four different sequences
out of the total possible 40,320, depicted in Figure 5. On the one
hand, these sequences are very similar; recall that the model selected
stimuli without replacement, reducing the degrees of freedom as
training proceeded. On the other hand, they are not identical. Their
differences stem from the stochasticity provided to the model by the
human data, the model data exhibit individual differences underly-
generated only four different sequences over 24 runs, this result also
predicts that systematicity in infants' curiosity-based learning should
be relatively robust.
To obtain an index of the level of complexity of the generated
orders we ranked the entire set of 40,320 permutations by mean
overall ED, generating 281 unique values. Table 1 provides these
in the curiosity condition. The curious model generated sequences
of intermediate objective complexity. However, these sequences
were not of average complexity (i.e., from ranks around 140/281)
but were towards the high end of the range. To explore this find-
in each of the four sequences and ranked these according to their
complexity (i.e., a rank of 1 would mean that the model has chosen the maximally different next stimulus from the set of remaining stimuli). These individual inter-stimulus distances are provided
in Table 2. Interestingly, the model did not generate intermediate
ing the mean overall ED masks a more interesting behavior: in all
sequences, the model first maximized ED (1/7) (cf. M&P). In three
out of the four sequences the model then minimized the second
ED (6/6), then chose an intermediate ED (3/5) and maximized EDs
thereafter. Therefore, when measured in terms of objective complexity, overall intermediate complexity arose from a combination
of maximally complex, minimally complex and moderately complex
optimal intermediacy be shifted towards the more complex end of
the scale? Figure 6 plots the curiosity function for values of i and o
[Figure caption fragment: Proportion SSE to … ***p < .001]
between 0 and 1 and illustrates that (i − o)o(1 − o) is minimal when
(i − o) is zero, and maximal when (i − o) is around 0.7. Thus, learning
is greatest when both plasticity and subjective novelty are intermediate, but shifted towards the higher end of the spectrum.
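The location of this maximum can be checked numerically. For a fixed input i = 1, the curiosity value (i − o)o(1 − o) = (1 − o)o(1 − o) peaks at o = 1/3, so the optimal discrepancy is i − o = 2/3 ≈ 0.67, consistent with the "around 0.7" reading of Figure 6:

```python
import numpy as np

# Curiosity value f = (i - o) o (1 - o) for input i = 1, over outputs o in [0, 1]
o = np.linspace(0.0, 1.0, 100001)
f = (1.0 - o) * o * (1.0 - o)
o_star = o[np.argmax(f)]       # maximizing output value (analytically o = 1/3)
gap = 1.0 - o_star             # optimal discrepancy i - o (analytically 2/3)
```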
This striking novelty-maximization–novelty-minimization behavior
emerges because curiosity-driven learning maximizes subjective—not
model is initialized randomly without prior knowledge about the to-be-experienced stimuli. At this stage, the stimulus most similar to this random representation in the context of the to-be-learned category would
maximizes learning by choosing a category-peripheral stimulus that
is maximally different from its initial, random representation. Next, it
other category-peripheral stimulus. Now, the two most peripheral category stimuli, having just been encoded, are the most familiar to the model
possible between these two representations; that is, a category-central
stimulus—and this is what the model chooses. Thus, notwithstanding
the noise inherent in the initialization of the model, which accounts for
"start from the outside and move in" strategy from the extremes to the
prototype. Note that while the model predicts that infants will exhibit
the same pattern of exploration this is based on the assumption of no a
learned representations by 10 months. Whether infants will exhibit the
tasks involving truly free exploration—are exciting empirical questions
which we are currently addressing.
to a previously unseen prototypical exemplar, we assume that it has
learned a category with the prototypical exemplar at its center. In
as vectors, can be thought of as locations in representational space.
Category learning is therefore a process of moving from location to
location within this space. From this perspective, the order in which
the curious model chooses stimuli maximizes the number of times it
traverses the central location in this space, resulting in strong encoding of this area relative to weak encoding of peripheral stimuli. More
generally, the curiosity mechanism makes the intriguing prediction
for future work that infants engaged in curiosity-driven learning will
[Table: stimulus sequences chosen by the model in the curiosity condition, each stimulus denoted by its four feature values; the rank, mean ED, and frequency/24 columns were lost in extraction.]

Order A: 1515 5151 5511 1155 2424 2244 4422 4242
Order B: 1515 5151 5511 1155 4242 2424 4422 2244
Order C: 1515 5151 2244 2424 5511 1155 4422 4242
Order D: 1155 5511 4422 4242 5151 1515 2244 2424
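The Euclidean distances underlying the ED analyses can be recomputed directly from these sequences. A sketch using the raw digit-coded feature values (the published distances use a different feature scaling, so absolute magnitudes differ from the table's):

```python
import numpy as np

# One sequence chosen by the model, each stimulus written as four feature values.
order_a = ["1515", "5151", "5511", "1155", "2424", "2244", "4422", "4242"]
vecs = np.array([[int(c) for c in s] for s in order_a], dtype=float)

# Euclidean distance between each pair of successive stimuli in the sequence.
eds = np.linalg.norm(np.diff(vecs, axis=0), axis=1)
print(eds)         # first step, 1515 -> 5151, differs by 4 on every feature: ED = 8.0
print(eds.mean())  # mean ED across the seven transitions
```

Note how the largest raw distances fall at the start of the sequence, consistent with the outside-in strategy.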
10 of 13
TWOMEY AND WESTERMANN
4 | GENERAL DISCUSSION
In the current work we used a neurocomputational model first to capture infants' visual
categorization, and then to offer an explicit account of curiosity-
driven learning in human infants. In Experiment 1 we captured
empirical data presented by Mather and Plunkett (2011), in which
10-month-old infants formed a robust category when familiarized
with sequences that maximized the perceptual distance between successive stimuli, but not in sequences which minimized it. In Experiment 2, we allowed
the model to take an active role in its own learning by letting
it select its own stimuli, comparing four different mechanisms
for stimulus selection. Here, curiosity-based learning depended
critically on the interaction between learning history, plasticity and
the learning environment, allowing the model to choose stimuli for
which learning was maximal.
4.1 | Novelty is in the eye of the beholder
Our goal here was to develop a mechanistic theory of infants' intrinsically
motivated—or curiosity-based—visual exploration. We selected
the autoencoder model and its learning mechanism based on their
roots in psychological theory and their established success in capturing
infants' behavior in empirical tasks. Importantly, the proposed
curiosity mechanism is theoretically compatible with classical theories of optimal novelty (e.g., Loewenstein,
1994; Vygotsky, 1980). According to these theories, learning is optimal
in environments of intermediate novelty. Typically, these approaches
have interpreted this intermediacy as information that is
neither too similar nor too different from what the learner has previously
encountered. Our mechanism offers a new perspective: what constitutes optimal novelty changes
as the child learns. Thus, what is initially too novel to be useful becomes,
as representations develop, optimally novel; optimal novelty is therefore related to subjective novelty, not objective complexity.
Critically, this insight may explain the conflicts in the extant literature
on infants' preferences for intermediate novelty: the relationship between subjective novelty and
objective complexity is nonlinear. That is, different levels of objective
complexity could provide an environment of maximal subjective
novelty, depending on the infant's learning history. Developing robust
measures of infants' learning histories, and in particular individual differences, is therefore critical to understanding early learning.
This account makes two further predictions. First, when infants make their own
decisions about what aspect of the environment to learn from, learning
can be maximal. Given recent work showing that infants can
explicitly structure their learning environment by asking their caregivers
for help (Goupil et al., 2016), this suggests that infants may
also implicitly optimize their own learning (for an early empirical
test of this prediction, see Twomey, Malem, & Westermann, 2016).
Second, in line with looking time studies showing that infants select
information systematically (Kidd et al., 2012, 2014), the model
chose stimuli of intermediate objective complexity. However, analyses
of the sequences chosen by the model predict that rather than
TABLE [number lost in extraction] Euclidean distance (ED) between successive stimuli for sequences chosen in the curiosity condition, with the rank of each chosen distance among those available at that step (Orders A–D; the "chosen n/24" frequencies in the original column headers were lost in extraction):

Stimulus   Order A          Order B          Order C          Order D
           ED      Rank     ED      Rank     ED      Rank     ED      Rank
1          –       –        –       –        –       –        –       –
2          1.5885  1/7      1.5885  1/7      1.5885  1/7      1.5885  1/7
3          1.0974  3/6      1.0974  3/6      0.3971  6/6      0.3971  6/6
4          1.5885  1/5      1.5885  1/5      0.7942  3/5      0.7942  3/5
5          0.8717  3/4      0.904   2/4      0.904   1/4      0.904   1/4
6          0.5487  3/3      0.7942  1/3      1.5885  1/3      1.5885  1/3
7          0.7942  1/2      0.5742  1/2      1.1914  1/2      1.1914  1/2
8          0.5487  –        0.7942  –        0.7942  –        0.7942  –
FIGURE 6 Plot of the curiosity function, (i − o)o(1 − o)
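The shape of this function is easy to verify numerically: for a fixed input i, learning drive is maximal at an intermediate output value and vanishes when the stimulus is entirely unencoded (o = 0) or fully learned (o = i). A quick check for a single unit, taking i = 1 for simplicity:

```python
import numpy as np

# Curiosity function from Figure 6: f = (i - o) * o * (1 - o),
# evaluated for a single unit with input i = 1 over sigmoid outputs o in [0, 1].
o = np.linspace(0.0, 1.0, 1001)
f = (1.0 - o) * o * (1.0 - o)

# The drive peaks at an intermediate output (analytically o = 1/3),
# not at either extreme of familiarity.
print(o[np.argmax(f)])
```

Setting the derivative of o(1 − o)² to zero gives (1 − o)(1 − 3o) = 0, so the interior maximum sits at o = 1/3, consistent with the claim that learning is greatest at intermediate plasticity and subjective novelty.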
seeking out intermediate complexity at each learning event, infants
may switch systematically between more and less objectively complex
stimuli. Importantly, our account goes further than classical theories in which curiosity is
viewed as either a novelty-seeking or a novelty-minimizing behavior
(e.g., Loewenstein, 1994). Rather, our model predicts that infants'
visual exploration should exhibit both novelty-seeking and novelty-
minimizing components when novelty is viewed objectively, unifying
these theories in a single mechanism.
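That novelty is subjective rather than objective can be made concrete with a scalar caricature (not the paper's network): a fixed stimulus of constant objective complexity yields shrinking prediction error, and hence shrinking subjective novelty, as a simple delta-rule learner adapts to it. The values below are arbitrary illustrative choices:

```python
# A fixed stimulus has constant objective complexity, but its subjective
# novelty (prediction error) falls as the learner's representation adapts.
target = 0.9   # the stimulus, reduced to a scalar for illustration
rep = 0.1      # learner's initial (naive) representation
lr = 0.3       # learning rate (arbitrary)

novelty = []
for _ in range(10):
    novelty.append(abs(target - rep))
    rep += lr * (target - rep)   # delta-rule update toward the stimulus

# Error shrinks geometrically: what was "too novel" becomes familiar.
print(novelty[0], novelty[-1])   # 0.8 at first, ~0.03 after ten encounters
```

The same objective input thus moves through the optimally novel range as learning proceeds, which is why optimal novelty cannot be fixed by stimulus properties alone.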
4.2 | A new approach to computational curiosity in development
This work contributes to computational research in intrinsic motivation by presenting a curiosity-driven learning model based on in-the-moment, local decision-making, without a
separate, top-down system for monitoring learning progress and/or reward. Existing approaches typically treat
reward as generated by a discrete, engineered module that calculates
a reward value using task-specific computations. Our model departs
from this approach, showing that domain-general mechanisms can
produce the motivation to learn, performing a similar function to reward
without requiring a separate module; that is, in our model, "reward"
is part of the learning algorithm itself. Overall, then, the current work
complements existing computational approaches to intrinsic motivation, and offers a broader account of the cognitive mechanisms that
may drive curiosity: learning that integrates a search for subjective
novelty modulated by the learner's plasticity. Here, intrinsically motivated
information selection emerges from within the model, by exploiting the dynamics of learning itself.
Overall, this neurocomputational model provides the first formal
account of curiosity-based learning in human infants, integrating subjective
novelty and intrinsic motivation mechanisms in a single model.
It captures the means by which infants select information to construct their own optimal learning
environment, and it provides a parsimonious mechanism by which this selection may operate. Of course,
it is possible that another one of the many potential mechanisms for
intrinsically motivated learning may take over later in development,
particularly once metacognition is established and language begins in
earnest. Nonetheless, the current implementation of curiosity not only provides novel insight
into infants' early information selection, but also offers a mechanistic theory of early intrinsically motivated visual learning.
ACKNOWLEDGEMENTS

This work was supported by the ESRC International Centre for
Language and Communicative Development (LuCiD), an ESRC
Future Research Leaders fellowship to KT and a British Academy/
Leverhulme Trust Senior Research Fellowship to GW. The support
of the Economic and Social Research Council (ES/L008955/1; ES/
N01703X/1) is gratefully acknowledged. Data and scripts are available
on request from the authors. Portions of these data were presented
at the 2015 5th International Conference on Development
and Learning and on Epigenetic Robotics, Providence, Rhode Island.
ORCID

Katherine E. Twomey http://orcid.org/0000-0002-5077-2741
Gert Westermann http://orcid.org/0000-0003-2803-1872
REFERENCES

Althaus, N., & Mareschal, D. (2013). Modeling cross-modal interactions in early word learning. IEEE Transactions on Autonomous Mental Development, 5, 288–297.
(2014). Intrinsic motivations and open-ended development in animals, humans, and robots: An overview. Frontiers in Psychology, 5, 985.
Baranes, A., & Oudeyer, P.-Y. (2013). Active learning of inverse models
with intrinsically motivated goal exploration in robots. Robotics and
Autonomous Systems, 61, 49–73.
Begus, K., Gliga, T., & Southgate, V. (2014). Infants learn what they want to learn: Responding to infant pointing leads to superior learning. PLoS ONE, 9, e108817.
Berlyne, D.E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
rization of objects in early infancy. Child Development, 81, 884–897.
havior and its neural foundations: A reinforcement learning perspective. Cognition, 113, 262–280.
Bruner, J.D., Goodnow, J.J., & Austin, G.A. (1972). Categories and cognition. In J.P. Spradley (Ed.), Culture and cognition (pp. 168–190). New
newborn infant. Cognition, 121, 127–132.
Capelier-Mourguy, A., Twomey, K.E., & Westermann, G. (2016, August). A neurocomputational model of the effect of learned labels on infants' object
from networks and babies. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 358, 1205–1214.
Cottrell, G.W., & Fleming, M. (1990). Face recognition using unsupervised feature extraction. In Proceedings of the International Neural Network
iar patterns relative to novel ones. Science, 146, 668–670.
Fantz, R.L., Ordy, J.M., & Udelf, M.S. (1962). Maturation of pattern vision in infants during the first six months. Journal of Comparative and Physiological Psychology, 55, 907–917.
Ferguson, K.T., Kulkofsky, S., Cashon, C.H., & Casasola, M. (2009). The development of specialized processing of own-race faces in infancy. Infancy, 14, 263–284.
Frank, M., Leitner, J., Stollenga, M., Förster, A., & Schmidhuber, J. (2014). Curiosity driven reinforcement learning for motion planning on humanoids. Frontiers in Neurorobotics, 7, 25.
French, R.M., Mareschal, D., Mermillod, M., & Quinn, P.C. (2004).
The role of bottom-up processing in perceptual categorization by
3- to 4-month-old infants: Simulations and data. Journal of Experimental Psychology: General, 133, 382–397.
Gershkoff-Stowe,L., & Rakison, D.H. (2005). Building object categories in
Gliga, T., Volein, A., & Csibra, G. (2010). Verbal labels modulate perceptual object processing in 1-year-old children. Journal of Cognitive Neuroscience, 22, 2781–2789.
names) for infant categorization: A neurocomputational approach.
Cognitive Science, 33, 709–738.
Gottlieb, J., Oudeyer, P.-Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Goupil, L., Romand-Monnier, M., & Kouider, S. (2016). Infants ask for help when they know they don't know. Proceedings of the National Academy of Sciences, USA, 113, 3492–3496.
Hebb, D. (1949). The organization of behavior: A neuropsychological theory.
Horst, J.S., Oakes, L.M., & Madole, K.L. (2005). What does it look like and what can it do? Category structure influences how infants categorize. Child Development, 76, 614–631.
Hurley, K.B., Kovack-Lesh, K.A., & Oakes, L.M. (2010). The influence of pets on infants' processing of cat and dog images. Infant Behavior and Development, 33, 619–628.
Hurley, K.B., & Oakes, L.M. (2015). Experience and distribution of attention: Pet exposure and infants' scanning of animal images. Journal of Cognition and Development, 16, 11–30.
Kagan, J. (1972). Motives and development. Journal of Personality and Social Psychology, 22, 51–66.
Kidd, C., & Hayden, B.Y. (2015). The psychology and neuroscience of curiosity. Neuron, 88, 449–460.
Kidd, C., Piantadosi, S.T., & Aslin, R.N. (2012). The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS ONE, 7, e36399.
Kidd, C., Piantadosi, S.T., & Aslin, R.N. (2014). The Goldilocks effect in infant auditory attention. Child Development, 85, 1795–1804.
Kinney, D.K., & Kagan, J. (1976). Infant attention to auditory discrepancy. Child Development, 47, 155–164.
Kovack-Lesh, K.A., McMurray, B., & Oakes, L.M. (2014). Four-month-old
ence and attentional strategy. Developmental Psychology, 50, 402–413.
Kovack-Lesh, K.A., & Oakes, L.M. (2007). Hold your horses: How exposure to different items influences infant categorization. Journal of Experimental Child Psychology, 98, 69–93.
Lefort, M., & Gepperth, A. (2015). Active learning of local predictable representations with artificial curiosity. Paper presented at the 5th International
Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116, 75–98.
Robust active binocular vision through intrinsically motivated learning.
Frontiers in Neurorobotics, 7, 20.
Mareschal, D., & French, R. (2000). Mechanisms of categorization in infancy. Infancy, 1, 59–76.
opmental psychology. IEEE Transactions on Evolutionary Computation,
for autonomous mobile robots. Robotics and Autonomous Systems, 51,
Mather, E. (2013). Novelty, attention, and challenges for developmental psychology. Frontiers in Psychology, 4, 491.
Mather, E., & Plunkett, K. (2011). Same items, different order: Effects of temporal variability on infant categorization. Cognition, 119, 438–447.
ment. Developmental Science, 6, 413–429.
Murakami, M., Kroger, B., Birkholz, P., & Triesch, J. (2015). Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing. Paper presented
Oakes, L.M., Kovack-Lesh, K.A., & Horst, J.S. (2009). Two are better than one: Comparison influences infants' visual recognition memory. Journal of Experimental Child Psychology, 104, 124–131.
of computational approaches. Frontiers in Neurorobotics, 1, 6.
for autonomous mental development. IEEE Transactions on Evolutionary
Oudeyer, P.-Y., & Smith, L.B. (2016). How evolution may work through curiosity-driven developmental process. Topics in Cognitive Science, 8,
Piaget, J. (1952). The origins of intelligence in children (Vol. 8). New York:
connectionist net. Connection Science, 4, 293–312.
tations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, 22, 463–475.
dynamic features in a category context. Journal of Experimental Child Psychology, 89, 1–30.
Rakison, D.H., & Butterworth, G.E. (1998). Infants' use of object parts in early categorization. Developmental Psychology, 34, 49–62.
In A.H. Black & W.F. Prokasy (Eds.), Classical Conditioning II: Current Research and Theory (pp. 64–99). New York: Appleton-Century-Crofts.
Y.,& Botvinick, M.M. (2011). A neuralsignature of hierarchical rein-
forcement learning. Neuron, 71, 370–379.
allel distributed processing approach. Behavioral and Brain Sciences, 31,
Schlesinger, M. (2013). Investigating the origins of intrinsic motivation in human infants. In G. Baldassarre & M. Mirolli (Eds.), Intrinsically motivated learning in natural and artificial systems (pp. 367–392). Berlin: Springer.
image samples in infants and adults. Frontiers in Psychology, 4, 802.
infant robots. Cognitive Processing, 16,S100–S100.
Sokolov, E.N. (1963). Perception and the conditioned reflex. New York:
Son, J.Y., Smith, L.B., & Goldstone, R.L. (2008). Simplicity and generalization: Short-cutting abstraction in children's object categorizations. Cognition, 108, 626–638.
Thelen, E., & Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.
Thomas, M., & Karmiloff-Smith, A. (2003). Connectionist models of development, developmental disorders, and individual differences. In R.J. Sternberg, J. Lautrey & T. Lubart (Eds.), Models of intelligence:
International perspectives (pp. 133–150). Washington, DC: American Psychological Association.
Twomey, K.E., Malem, B., & Westermann, G. (2016, May). Infants' information seeking in a category learning task. In K.E. Twomey (Chair), Understanding infants' curiosity-based learning: Empirical and computational approaches. Symposium presented at the XX Biennial
exemplars facilitate word learning. Infant and Child Development, 23,
object representations. Infancy. https://doi.org/10.1111/infa.12201
Vygotsky, L.S. (1980). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Westermann, G., & Mareschal, D. (2004). From parts to wholes: Mechanisms of development in infant visual object processing.
Westermann, G., & Mareschal, D. (2012). Mechanisms of developmental change in infant categorization. Cognitive Development, 27, 367–382.
Westermann, G., & Mareschal, D. (2014). From perceptual to language-mediated categorization. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 20120391.
Younger, B.A. (1985). The segregation of items into categories by ten-month-old infants. Child Development, 56, 1574–1583.
Younger, B.A., & Cohen, L.B. (1983). Infant perception of correlations among attributes. Child Development, 54, 858–867.
Younger, B.A., & Cohen, L.B. (1986). Developmental change in infants' perception of correlations among attributes. Child Development, 57,
How to cite this article: Twomey KE, Westermann G. Curiosity-based learning in infants: a neurocomputational approach. Dev Sci. 2017;e12629. https://doi.org/10.1111/