Developmental Science. 2017;e12629. https://doi.org/10.1111/desc.12629
wileyonlinelibrary.com/journal/desc
Received: 28 October 2016 | Accepted: 5 September 2017
DOI: 10.1111/desc.12629
PAPER
Curiosity-based learning in infants: a neurocomputational approach
Katherine E. Twomey1 | Gert Westermann2
1 Division of Human Communication, Development and Hearing, School of Health Sciences, University of Manchester, Manchester, UK
2 Department of Psychology, University of Lancaster, Lancaster, UK
Correspondence
Katherine E. Twomey, Division of Human Communication, Development and Hearing, University of Manchester, Coupland 1, Oxford Road, Manchester M13 9PL, UK.
Email: katherine.twomey@manchester.ac.uk
Funding Information
ESRC International Centre for Language and Communicative Development (LuCiD), an ESRC Future Research Leaders fellowship to KT, and a British Academy/Leverhulme Trust Senior Research Fellowship to GW. Economic and Social Research Council (ES/L008955/1; ES/N01703X/1).
Abstract
Infants are curious learners who drive their own cognitive development by imposing structure on their learning environment as they explore. Understanding the mechanisms by which infants structure their own learning is therefore critical to our understanding of development. Here we propose an explicit mechanism for intrinsically motivated information selection that maximizes learning. We first present a neurocomputational model of infant visual category learning, capturing existing empirical data on the role of environmental complexity on learning. Next we "set the model free", allowing it to select its own stimuli based on a formalization of curiosity and three alternative selection mechanisms. We demonstrate that maximal learning emerges when the model is able to maximize stimulus novelty relative to its internal states, depending on the interaction across learning between the structure of the environment and the plasticity in the learner itself. We discuss the implications of this new curiosity mechanism for both existing computational models of reinforcement learning and for our understanding of this fundamental mechanism in early development.
RESEARCH HIGHLIGHTS
• We present a novel formalization of the mechanism underlying infants' curiosity-driven learning during visual exploration.
• We implement this mechanism in a neural network that captures empirical data from an infant visual categorization task.
• In the same model we test four potential selection mechanisms and show that learning is maximized when the model selects stimuli based on its learning history, its current plasticity and its learning environment.
• The model offers new insight into how infants may drive their own learning.
1 | INTRODUCTION

For more than half a century, infants' information selection has been documented in lab-based experiments. These carefully designed, rigorously controlled paradigms allow researchers to isolate a variable of interest while controlling for extraneous environmental influences, offering a fine-grained picture of the range of factors that affect early learning. Decades of developmental research have brought about a broad consensus that infants' information selection and subsequent learning in empirical tasks are influenced by their existing representations, the learning environment, and discrepancies between the two (for a review, see Mather, 2013). On the one hand, there is substantial evidence that infants' performance in these studies depends heavily on the characteristics of the learning environment. For example, early work demonstrated that infants under 6 months of age prefer to look at patterned over homogenous grey stimuli (Fantz, Ordy, & Udelf, 1962), and in a seminal series of categorization experiments with 3-month-old infants, Quinn and colleagues demonstrated that the category representations infants form are directly related to the visual variability of the familiarization stimuli they see (Quinn, Eimas, & Rosenkrantz, 1993; see also Younger, 1985). More recently, 4-month-old infants were shown to learn animal categories when familiarized with paired animal images, but not when presented with the same images individually (Oakes, Kovack-Lesh, & Horst, 2009; see
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2017 The Authors. Developmental Science Published by John Wiley & Sons Ltd.
also Kovack-Lesh & Oakes, 2007). Thus, the representations infants learn depend on bottom-up perceptual information. Equally, however, infants' existing knowledge has a profound effect on their behavior in these experiments. For example, while newborns respond equivalently to images of faces irrespective of the race of those faces, by 8 months infants show holistic processing of images of faces from their own race, but not of other-race faces, which they process featurally (Ferguson, Kulkofsky, Cashon, & Casasola, 2009). Similarly, 4-month-old infants with pets at home exhibit more sophisticated visual sampling of pet images than infants with no such experience (Hurley, Kovack-Lesh, & Oakes, 2010; Hurley & Oakes, 2015; Kovack-Lesh, McMurray, & Oakes, 2014). Effects of learning history also emerge when infants' experience is controlled experimentally. For example, after a week of training with one named and one unnamed novel object, 10-month-old infants exhibited increased visual sampling of the previously named object in a subsequent silent looking-time task (Twomey & Westermann, 2017; see also Bornstein & Mash, 2010; Gliga, Volein, & Csibra, 2010). Thus, learning depends on the interaction between what infants encounter in-the-moment and what they know (Thelen & Smith, 1994).
1.1 | Active learning in curious infants

A long history of experiments, starting with Piaget's (1952) notion of children as "little scientists", has shown that children are more than passive observers; rather, they take an active role in constructing their own learning. Recent work demonstrates this active learning in infants as well. For example, allowing 16-month-old infants to choose between two novel objects in an imitation task boosted their imitation of novel actions subsequently performed on the selected item (Begus, Gliga, & Southgate, 2014). Similarly, in a pointing task, 20-month-old infants were more likely to elicit help from their caregivers in finding a hidden object when they were unable to see the hiding event than when they saw the object being hidden (Goupil, Romand-Monnier, & Kouider, 2016). Indeed, even younger infants systematically control their own learning: for example, 7- to 8-month-olds increased their visual sampling of a sequence of images when those images were moderately—but not maximally or minimally—predictable (Kidd, Piantadosi, & Aslin, 2012; see also Kidd, Piantadosi, & Aslin, 2014). However, as a newly developing field, active learning in infants is currently poorly understood (Kidd & Hayden, 2015).
Critically, outside the lab infants interact with their environment freely and largely autonomously, learning about stimuli in whichever order they choose (Oudeyer & Smith, 2016). This exploration is not driven by an external motivation such as finding food to satiate hunger. Rather, it is intrinsically motivated (Baldassarre et al., 2014; Berlyne, 1960; Oudeyer & Kaplan, 2007; Schlesinger, 2013): in the real world infants learn based on their own curiosity. Consequently, in constructing their own learning environment, infants shape the knowledge they acquire. However, in the majority of studies on early cognitive development, infants' experience in a learning situation is fully specified by the experimenter, often through a preselected sequence of stimuli that are presented for fixed amounts of time. Thus, we currently know little about the cognitive processes underlying infants' curiosity as a form of intrinsic motivation, or indeed the extent to which what infants learn from curiosity-driven exploration differs from what they learn in more constrained environments. Given that active exploration is at the heart of development, understanding how infants construct their learning experiences—and consequently, their mental representations—is fundamental to our understanding of development more broadly.
1.2 | Computational studies of intrinsic motivation

In contrast to the relative scarcity of research into infant curiosity, recent years have seen a surge in interest in the role of intrinsic motivation in autonomous computational systems. Equipping artificial learning systems with intrinsic motivation mechanisms is likely to be key to building autonomously intelligent systems (Baranes & Oudeyer, 2013; Oudeyer, Kaplan, & Hafner, 2007), and consequently a rapidly expanding body of computational and robotic work now focuses on the intrinsic motivation mechanisms that may underlie a range of behaviors; for example, low-level perceptual encoding (Lonini et al., 2013; Schlesinger & Amso, 2013), novelty detection (Marsland, Nehmzow, & Shapiro, 2005), and motion planning (Frank, Leitner, Stollenga, Förster, & Schmidhuber, 2014).
Computational work in intrinsic motivation has suggested a wide range of possible formal mechanisms for artificial curiosity-based learning (for a review, see Oudeyer & Kaplan, 2007). For example, curiosity could be underpinned by a drive to maximize learning progress by interacting with the environment in a novel manner relative to previously encountered events (Oudeyer et al., 2007). Alternatively, curiosity could be driven by prediction mechanisms, allowing the system to engage in activities for which predictability is maximal (Lefort & Gepperth, 2015) or minimal (Botvinick, Niv, & Barto, 2009). Still other approaches assume that curiosity involves maximizing a system's competence or ability to perform a task (Murakami, Kroger, Birkholz, & Triesch, 2015). Although this computational work investigates numerous potential curiosity algorithms, it remains largely agnostic as to the psychological plausibility of the implementation of those mechanisms (Oudeyer & Kaplan, 2007). For example, many autonomous learning systems employ a separate "reward" module in which the size and timing of the reward are defined a priori by the modeler. Only recently has research highlighted the value of incorporating developmental constraints in curiosity-based computational and robotic learning systems (Oudeyer & Smith, 2016; Seepanomwan, Caligiore, Cangelosi, & Baldassarre, 2015). While this research shows great promise in incorporating developmentally inspired curiosity-driven learning mechanisms into artificial learning systems, a mechanism for curiosity in human infants has yet to be specified. The aim of this paper therefore is to develop a theory of curiosity-based learning in infants, and to implement these principles in a computational model of infant categorization.
1.3 | The importance of novelty to curiosity-based learning

From very early in development, infants show a novelty preference; that is, they prefer new items to items they have already encountered
(Fantz, 1964; Sokolov, 1963). As infants explore an item, however, it becomes less novel; that is, the child habituates. During habituation, if a further new stimulus appears, and that stimulus is more novel to the infant than the currently attended item, the infant abandons the habituated item in favor of the new. Thus, novelty and curiosity are linked: broadly, increases in novelty elicit increases in attention and learning (although see e.g., Kidd et al., 2012, 2014, for evidence that excessive novelty leads to a decrease in attention). Here, we propose that curiosity in human infants consists of intrinsically motivated novelty minimization in which discrepancies between stimuli and existing internal representations of those stimuli are optimally reduced (see also Rescorla & Wagner, 1972; Sokolov, 1963).

On this view, infants will selectively attend to stimuli that best support this discrepancy minimization. However, to date there is no agreement in the empirical literature as to what an optimal learning environment might be. For example, Bulf, Johnson, and Valenza (2011) demonstrated that newborns learned from highly predictable sequences of visual stimuli, but not from less predictable sequences. In contrast, 10-month-old infants in a categorization task formed a robust category when familiarized with novel stimuli in an order that maximized, but not minimized, overall perceptual differences between successive stimuli (Mather & Plunkett, 2011). Still other studies have uncovered a "Goldilocks" effect in which learning is optimal when stimuli are of intermediate predictability (Kidd et al., 2012, 2014; see also Kinney & Kagan, 1976; Twomey, Ranson, & Horst, 2014). From this perspective, the degree of novelty and/or complexity in the environment that best supports learning is unclear.
Across these studies, novelty and complexity are operationalized differently; for example, as objective environmental predictability (Kidd et al., 2012, 2014), or objective perceptual differences (Mather & Plunkett, 2011). In contrast, in the current work we emphasize that for infants who are engaged in curiosity-driven learning, novelty is not a fixed environmental quantity but is highly subjective, depending on both perceptual environmental characteristics and what the learner knows. Importantly, each infant has a different learning history which can affect their exploratory behavior. For example, infant A plays with blocks at home and has substantial experience with stacking cube shapes. Infant B's favorite toy is a rattle, and she is familiar with the noise it makes when shaken. Consequently, the blocks at nursery will be more novel to infant B, and the rattle more novel to infant A. On this view, novelty is separate from any objective measure of stimulus complexity; for example, sequence predictability or differences in visual features (Kidd et al., 2012, 2014; Mather & Plunkett, 2011). Thus, a fully specified theory of curiosity-driven learning must explicitly characterize this subjective novelty based both on the learner's internal representations (what infants know) and the learning environment (what infants experience). In the following paragraphs we provide a mechanistic account of this learner–environment interaction using a neurocomputational model.
1.4 | Computational mechanisms for infant curiosity

Computational models have been widely used to investigate various cognitive processes, lending themselves in particular to capturing early developmental phenomena such as category learning (e.g., Althaus & Mareschal, 2013; Colunga & Smith, 2003; Gliozzi, Mayor, Hu, & Plunkett, 2009; Mareschal & French, 2000; Mareschal & Thomas, 2007; Munakata & McClelland, 2003; Rogers & McClelland, 2008; Westermann & Mareschal, 2004, 2012, 2014). Here we take a connectionist or neurocomputational approach in which abstract simulations of biological neural networks are used to implement and explore theories of cognitive processes in an explicit way, offering process-based accounts of known phenomena and generating predictions about novel behaviors. Neurocomputational models employ a network of simple processing units to simulate the learner situated and acting in its environment. Inputs reflect the task environment of interest, and can have important effects across representational development. Like learning in infants, learning in these models emerges from the interaction between learner and environment. Thus, neurocomputational models are well suited to implementing and testing developmental theories.
In the current work we employed autoencoder networks: artificial neural networks in which the input and the output are the same (Cottrell & Fleming, 1990; Hinton & Salakhutdinov, 2006; see Figure 2). These models have successfully captured a range of results from infant category learning tasks (Capelier-Mourguy, Twomey, & Westermann, 2016; French, Mareschal, Mermillod, & Quinn, 2004; Mareschal & French, 2000; Plunkett, Sinha, Møller, & Strandsby, 1992; Westermann & Mareschal, 2004, 2012, 2014). Autoencoders implement Sokolov's (1963) influential account of novelty orienting in which an infant fixates a novel stimulus to compare it with its mental representation. While attending to the stimulus the infant adjusts this internal representation until the two match. At this point the infant looks away from the stimulus, switching attention elsewhere. Therefore, the more novel a stimulus, the longer fixation time will be. Similarly, autoencoder models receive an external stimulus on their input layer, and aim to reproduce this input on the output layer via a hidden layer. Specifically, an input representation is presented to the model via activation of a layer of input nodes. This activation flows through a set of weighted connections to the hidden layer. Inputs to each hidden layer unit are summed and this value passed through a typically sigmoid activation function. The values on the hidden units are then passed through the weighted connections to the output layer. Again, inputs to each output node are summed and passed through the activation function, generating the model's output representation. Learning is achieved by adapting connection weights to minimize error, that is, the discrepancy between the input and output representations. Because multiple iterations of weight adaptation are required to match the model's input and output, error acts as an index of infants' looking times (Mareschal & French, 2000) or, more broadly, the quality of an internal representation.
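The forward pass described above can be sketched as follows. This is a minimal NumPy illustration of the general scheme, not the authors' code; all function names, weight ranges, and example values are our own assumptions:

```python
import numpy as np

def sigmoid(x):
    # typical sigmoid activation used by hidden and output units
    return 1.0 / (1.0 + np.exp(-x))

def forward(stimulus, w_ih, w_ho):
    """One pass through the autoencoder: input -> hidden -> output.

    Returns the output representation and the sum squared error (SSE),
    the input-output discrepancy used as an index of looking time.
    """
    hidden = sigmoid(w_ih @ stimulus)   # summed, squashed hidden activations
    output = sigmoid(w_ho @ hidden)     # the model's reconstruction of the input
    sse = float(np.sum((stimulus - output) ** 2))
    return output, sse

# Example: a four-feature stimulus and small random initial weights
rng = np.random.default_rng(0)
w_ih = rng.uniform(-0.5, 0.5, size=(3, 4))  # 4 input units -> 3 hidden units
w_ho = rng.uniform(-0.5, 0.5, size=(4, 3))  # 3 hidden units -> 4 output units
output, sse = forward(np.array([0.2, 0.8, 0.5, 0.1]), w_ih, w_ho)
```

With untrained weights the reconstruction is poor and SSE is large, corresponding to long "looking" at an unfamiliar stimulus.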
Self-supervised autoencoder models are trained with the well-known generalized delta rule (Rumelhart, Hinton, & Williams, 1986), with the special case that input and target are the same. The weight update rule of these models is:

Δw = η(i − o)o(1 − o)    (1)
where Δw is the change of a weight after presentation of a stimulus. The first term, (i − o), describes the difference between the input and the model's representation of this input. The second term, o(1 − o), is the derivative of the sigmoid activation function. This term is minimal for output values near 0 or 1 and maximal for o = 0.5. Because (i − o) represents the discrepancy between the model's input and its representation, and because learning in the model consists of reducing this discrepancy, the size of o(1 − o) determines the amount the model can learn from a particular stimulus by constraining the size of the discrepancy to be reduced. In this sense, o(1 − o) reflects the plasticity of the learner, modulating its adaptation to the external environment. Finally, η represents the model's learning rate. The amount of adaptation is thus a function both of the environment and the internal state of the learner.
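Equation (1) can be checked numerically. The sketch below (our own illustration, with an assumed learning rate of 0.1) shows how the update combines the discrepancy (i − o) with the plasticity term o(1 − o), which peaks at o = 0.5:

```python
def delta_w(i, o, eta=0.1):
    """Weight change per Equation (1): eta * (i - o) * o * (1 - o)."""
    return eta * (i - o) * o * (1 - o)

# Same target (i = 1.0), different current outputs:
mid = delta_w(1.0, 0.5)   # discrepancy 0.5, plasticity 0.5 * 0.5 = 0.25 -> 0.0125
edge = delta_w(1.0, 0.9)  # discrepancy 0.1, plasticity 0.9 * 0.1 = 0.09 -> 0.0009
```

An output near 0.5 thus allows a far larger weight change than an output saturated near 0 or 1, even before considering the discrepancy itself.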
Because learning in neurocomputational models is driven by the generalized delta rule, we propose that the delta rule can provide a mechanistic account of curiosity-based learning. Specifically, weight adaptation—learning—is proportional to (i − o)o(1 − o); that is, learning is greatest when (i − o)o(1 − o) is maximal. If curiosity is a drive to maximize learning, (i − o)o(1 − o) offers a mechanism for stimulus selection to maximize learning: a curious model should attempt to maximize its learning by choosing stimuli for which (i − o)o(1 − o) is greatest. Below, in Experiment 2, we test this possibility in a model, and compare it against three alternative methods of stimulus selection.
1.5 | A test case: infant categorization

The ability to categorize—or respond equivalently to—discriminably different aspects of the world is central to human cognition (Bruner, Goodnow, & Austin, 1972). Consequently, the development of this powerful skill has generated a great deal of interest, and a large body of research now demonstrates that infant categorization is flexible and affected by both existing knowledge and in-the-moment features of the environment (for a review, see Gershkoff-Stowe & Rakison, 2005). Categorization therefore lends itself well to testing the curiosity mechanism specified above. In Experiment 1 we present a model that captures infants' behavior in a recent categorization task in which the learning environment was artificially manipulated (thus examining different learning environments in a controlled laboratory study in which infants do not select information themselves). Then, in Experiment 2 we test the curiosity mechanism by "setting the model free", allowing it to choose its own stimuli. We compare the learner–environment interaction instantiated in the curiosity mechanism against three alternative mechanisms, and demonstrate that learning history and learning plasticity (i.e., the learner's internal state) as well as in-the-moment input (i.e., the learning environment) are all necessary for maximal learning. Taken together, these simulations offer an explicit and parsimonious mechanism for curiosity-driven learning, providing new insight into existing empirical findings, and generating novel, testable predictions for future work.
2 | EXPERIMENT 1

Early evidence for infants' ability to form categories based on small variations in perceptual features came from an influential series of familiarization/novelty preference studies by Barbara Younger (Younger, 1985; Younger & Cohen, 1983, 1986). In this paradigm, infants are familiarized with a series of related stimuli—for example, an infant might see eight images of different cats, for 10 seconds each. Then, infants are presented with two new images side-by-side, one of which is a novel member of the just-seen category, and one of which is out-of-category. For example, after familiarization with cats, an infant might see a new cat and a new dog. Based on their novelty preference, if infants look for longer at the out-of-category stimulus than the within-category stimulus, the experimenter concludes that they have learned a category during familiarization which excludes the out-of-category item. In this example, longer looking at the dog than the cat image would indicate that infants had formed a "cat" category which excluded the novel dog exemplar (and indeed, they do; Quinn et al., 1993).
Younger (1985) explored whether infants could track covariation of stimulus features and form a category based on this environmental structure. Ten-month-old infants were shown a series of pictures of novel animals (see Figure 1) that incorporated four features (ear separation, neck length, leg length and tail width) that could vary systematically in size between discrete values of 1 and 5. At test, all children saw two simultaneously presented stimuli: one peripheral (a new exemplar with extreme feature values) and one category-central (a new exemplar with the central value for each feature dimension). Infants' increased looking times to the peripheral stimulus indicated that they had learned a category that included the category-central stimulus. This study was one of the first to demonstrate the now much-replicated finding that infants' categorization is highly sensitive to perceptual variability (e.g., Horst, Oakes, & Madole, 2005; Kovack-Lesh & Oakes, 2007; Quinn et al., 1993; Rakison, 2004; Rakison & Butterworth, 1998; Younger & Cohen, 1986).
The target empirical data for the first simulation are from a recent extension of this study which to our knowledge has not yet been captured in a computational model. Mather and Plunkett (2011; henceforth M&P) explored whether the order in which a single set of stimuli was presented during familiarization would affect infants' categorization. They trained 48 10-month-old infants with the eight stimuli from Younger (1985, E1). Although all infants saw the same stimuli, M&P manipulated the order in which stimuli were presented during the familiarization phase so that infants in one condition saw a presentation order which maximized perceptual differences across the stimulus set, and infants in a second condition saw an order which minimized overall perceptual differences. At test, all infants saw two simultaneously presented novel stimuli, in line with Younger (1985): one category-central and one peripheral. M&P found that infants in the maximum distance condition showed an above-chance preference for the peripheral stimulus, while infants in the minimum distance condition showed no preference. Thus, only infants in the maximum distance condition formed a category.
M&P theorized that if stimuli in this task were represented in a "category space", then infants in the maximum distance condition would traverse greater distances during familiarization than infants in the minimum distance condition, leading to better learning. However, it is not clear from these empirical data how infants adjusted their representations according to the different presentation regimes. To translate this theory into mechanism, we used an autoencoder network to simulate M&P's task. Closely following the original experimental design, we trained our model with stimulus sets in which presentation order maximized and minimized successive perceptual distances. To enable more fine-grained analyses we tested additional conditions with intermediate perceptual distances as well as randomly presented sequences (the usual case in familiarization/novelty preference studies with infants). Like M&P we then tested the model on new peripheral and category-central stimuli. Based on their results, we expected the model to form the strongest category after training with maximum distance stimuli, then intermediate/random distance, and finally minimum distance.
2.1 | Model architecture

We used an autoencoder architecture consisting of four input units, three hidden units, and four output units (Figure 2). Each input unit corresponded to one of the four features of the training stimuli (i.e., leg length, neck length, tail thickness and ear separation; see Figure 1). Hidden and output units used a sigmoidal activation function and weights were initialized randomly.
2.2 | Stimuli

Stimuli were based on Younger's (1985) animal drawings with the four features neck length, leg length, ear separation, and tail width. Individual stimuli were based on the stimulus dimensions provided in Younger (1985, E1, Broad; see Figure 1). For each feature, these values were normalized to lie between 0 and 1. Each stimulus (that is, input i) therefore consisted of a four-element vector in which each element represented the value for one of the four features. Model inputs were generated in an identical manner to the stimulus orders used by M&P. We calculated all possible permutations of the presentation sequence of the eight stimuli, resulting in 40,320 sequences. In line with M&P, for each sequence we calculated the mean Euclidean distance (ED) between successive stimuli. This resulted in a single overall perceptual distance value for each sequence.
We created orders for the following four conditions based on mean ED:

• Maximum distance (max; cf. M&P maximum distance): 24 sets with the largest mean ED
• Minimum distance (min; cf. M&P minimum distance): 24 sets with the smallest mean ED
• Medium distance (med): 24 sets with an intermediate mean ED, specifically sets 20,149–20,172 when sets are sorted in order of distance (set 20,160 is the "median" set)
• Random: stimuli presented in random order
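The construction of these orders can be sketched as follows. This NumPy illustration uses randomly generated stimulus vectors as placeholders for Younger's (1985) normalized feature values, and is our own reconstruction of the procedure, not the authors' code:

```python
import itertools
import numpy as np

# Illustrative stimuli: 8 four-feature vectors normalized to [0, 1]
rng = np.random.default_rng(1)
stimuli = rng.uniform(0.0, 1.0, size=(8, 4))

# Precompute pairwise Euclidean distances between the eight stimuli
dists = np.linalg.norm(stimuli[:, None, :] - stimuli[None, :, :], axis=-1)

def mean_successive_ed(order):
    """Mean Euclidean distance between successive stimuli in a sequence."""
    return float(np.mean([dists[a, b] for a, b in zip(order, order[1:])]))

# All 40,320 presentation sequences, ranked by mean successive ED
perms = sorted(itertools.permutations(range(8)), key=mean_successive_ed)

min_sets = perms[:24]            # minimum distance condition
max_sets = perms[-24:]           # maximum distance condition
med_sets = perms[20148:20172]    # sets 20,149-20,172 (medium distance)
```

Ranking all permutations at once makes the min, med, and max conditions simple slices of the same sorted list.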
FIGURE 1 Stimuli used in Younger (1985) and the current simulations. Adapted from Plunkett, Hu & Cohen (2008) and Mather & Plunkett (2011) with permission.
FIGURE 2 Model architecture.

Test sets were identical across conditions, and as in M&P consisted of two category-peripheral stimuli (new exemplars with extreme feature values) and one category-central stimulus (a new exemplar with
the central value for each feature dimension; see Figure 1). Neither of these test stimuli was part of the training set.

2.3 | Procedure

During training, each stimulus was presented for a maximum of 20 sweeps (weight updates) or until network error fell below a threshold of 0.01 (Mareschal & French, 2000). The threshold simulated infants' looking away after fully encoding the present stimulus. To obtain an index of familiarization, we tested the model with the entire training set after each sweep (with no weight updating) and recorded sum squared error (SSE) as a proxy for looking time (Mareschal & French, 2000; Westermann & Mareschal, 2012, 2014). Order of presentation of training stimuli varied by condition (see Stimuli). Following M&P, we tested the model with three novel test stimuli (two peripheral, one central), presented sequentially for a single sweep with no weight updates, and again recorded SSE. There were 24 separate models in each condition, reflecting the 24 participants in each condition of M&P.
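A minimal sketch of this training procedure, assuming a 4-3-4 sigmoidal autoencoder trained by backpropagating the delta-rule error. This is our own NumPy illustration; apart from the 20-sweep maximum and the 0.01 threshold stated above, parameter values (learning rate, weight ranges) are assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Autoencoder:
    """4-3-4 autoencoder trained with the generalized delta rule
    (backpropagation with target == input)."""

    def __init__(self, n_in=4, n_hid=3, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.w_ih = rng.uniform(-0.5, 0.5, (n_hid, n_in))
        self.w_ho = rng.uniform(-0.5, 0.5, (n_in, n_hid))
        self.eta = eta

    def reconstruct(self, x):
        h = sigmoid(self.w_ih @ x)
        return h, sigmoid(self.w_ho @ h)

    def sweep(self, x):
        """One weight update on stimulus x; returns SSE before the update."""
        h, o = self.reconstruct(x)
        delta_o = (x - o) * o * (1 - o)                 # output delta, cf. Equation (1)
        delta_h = (self.w_ho.T @ delta_o) * h * (1 - h) # backpropagated hidden delta
        self.w_ho += self.eta * np.outer(delta_o, h)
        self.w_ih += self.eta * np.outer(delta_h, x)
        return float(np.sum((x - o) ** 2))

def familiarize(net, sequence, max_sweeps=20, threshold=0.01):
    """Present each stimulus for up to max_sweeps weight updates, or until
    SSE falls below threshold (the infant 'looks away'). After each sweep,
    record SSE over the whole training set with no weight updating."""
    looking = []
    for stim in sequence:
        for _ in range(max_sweeps):
            sse = net.sweep(stim)
            looking.append(sum(float(np.sum((s - net.reconstruct(s)[1]) ** 2))
                               for s in sequence))
            if sse < threshold:
                break
    return looking

# Example: familiarize one network with eight random four-feature stimuli
rng = np.random.default_rng(1)
train_set = [rng.uniform(0, 1, 4) for _ in range(8)]
net = Autoencoder()
looking = familiarize(net, train_set)
```

Repeating this over 24 networks per condition, with the condition-specific stimulus orders, would yield the familiarization curves analyzed below.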
2.4 | Results and discussion

2.4.1 | Training trials

During familiarization, infants in M&P demonstrated a significant decrease in looking from the first to the final three-trial block. For the max and min conditions we submitted SSE during the first and final three-trial blocks to a 2 (block: first, last; within-subjects) × 2 (condition: max, min; between-subjects) mixed ANOVA. In line with M&P, a main effect of block (F(1, 46) = 97.35, p < .0001, η²G = .46) confirmed that overall SSE decreased from the first block (M = 0.57, SD = 0.11) to the final block (M = 0.54, SD = 0.11). A main effect of condition (F(1, 46) = 2079.12, p < .0001, η²G = .96) revealed that there was less error overall in the max condition (M = 0.45, SD = 0.03) than in the min condition (M = 0.66, SD = 0.03). Finally, there was a significant block-by-condition interaction (F(1, 46) = 4.40, p = .041, η²G = .03), which arose from a greater decrease in SSE in the max condition (mean decrease = 0.045) than in the min condition (mean decrease = 0.030). Thus, as with the infants in M&P, "looking" in the model decreased over training.
2.4.2 | Test trials

In M&P, increased looking to the peripheral stimuli at test was taken as evidence that infants had learned a category. Again using SSE as a proxy for looking time, we collapsed our analyses across the two peripheral stimuli (Mather & Plunkett, 2011), and calculated the proportion of total test SSE (i.e., target looking/(target looking + distractor looking)) to the peripheral stimulus, as depicted in Figure 3. Wilcoxon rank-sum tests against chance confirmed that in all conditions the model formed a category (all Vs = 300, all ps < .001). However, a Kruskal-Wallis test revealed that SSE (and therefore robustness of categorization) differed between conditions (H(3) = 80.13, p < .001). Post-hoc Wilcoxon tests (all Ws two-tailed and Bonferroni-corrected) confirmed that the model produced more SSE in the max condition (Mdn = 0.99) than in the min condition (Mdn = 0.76; W = 576, p < .0001, r = −1.53), the med condition (Mdn = 0.79; W = 576, p < .0001, r = −1.53) or the random condition (Mdn = 0.83; W = 575, p < .0001, r = −1.51). All other between-condition differences were also significant (all ps < .0001). Note that although infants did not show evidence of category formation in M&P's minimum distance condition, the authors argue that these infants were in fact learning a category; since distances were smaller, these infants traversed less of the category space than their peers in the maximum distance condition, and their category representations were therefore not sufficiently robust to be detected at test. However, our model data are less variable than M&P's empirical data, likely accounting for our detection of differences where M&P found null effects.
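The proportion measure above is simply the peripheral-stimulus error relative to total test error; as a sketch (function name our own):

```python
def peripheral_proportion(sse_peripheral, sse_central):
    """Proportion of total test SSE ('looking') to the peripheral stimulus."""
    return sse_peripheral / (sse_peripheral + sse_central)

# A proportion above 0.5 indicates a novelty preference for the
# peripheral stimulus, i.e., evidence of category formation.
example = peripheral_proportion(3.0, 1.0)  # -> 0.75
```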
Overall, our results support M&P's distance-based account. We make their theoretical category space explicit by implementing stimuli as feature vectors, which can be interpreted as locations in Euclidean space. The greater overall Euclidean distances in the max condition therefore force the model to "travel" further from trial to trial. Maximizing overall ED leads to greater error early in training, and therefore greater adaptation, resulting in stronger category learning overall. The model therefore explains how manipulation of stimulus order during training can lead to observed differences in learning at test.

In Experiment 1 (as in M&P) the order of stimulus presentation was fixed in each condition to control the mean successive ED. This approach created an artificially structured environment in which the model learned best from the inputs with the most inter-stimulus variation. Taken together, the empirical and computational data indicate that both infants and the model learn differently in differently structured environments—even when those differences may seem minor, such as the order in which stimuli
FIGURE 3 Proportion SSE to peripheral stimulus at test in Experiment 1 (chance level marked; ***p < .001; all between-condition differences ***).
are experienced. However, Experiment 1 reflected artificially optimized rather than curiosity-based learning. An important question for research on curiosity-based learning is how a model that selects its own experiences structures its environment, and how learning in this self-generated environment compares with learning in the artificially optimized environment in Experiment 1. Thus, in Experiment 2 we allowed the model to choose the order in which it learned from stimuli based both on environmental and internal factors. Specifically, in line with theories of intrinsic motivation in which curiosity is triggered when a learner notices a discrepancy between the environment and their representation (e.g., Loewenstein, 1994), the model scans the environment and then selects the stimulus that maximizes a given function. This learning is analogous to an infant looking at and processing an array of objects before choosing one to learn from. We compared the curiosity-based learning discussed above with three alternative strategies that maximized objective complexity, subjective novelty, or plasticity at each learning step.
3 | EXPERIMENT 2
In Experiment 2, the model played an active role in its own learning by selecting the order in which it learned from stimuli. We explored four possible mechanisms for stimulus selection.
3.1 | Model architecture and stimuli
Model architecture, parameters, and stimuli were identical to those used in Experiment 1. Stimulus selection proceeded without replacement; thus, as in Experiment 1, the model saw exactly eight stimuli.
3.2 | Procedure
The procedure used in Experiment 2 was identical to that used in Experiment 1, with the exception that stimulus order was determined by the model based on the following four methods of stimulus selection.
3.2.1 | Curiosity
In the curiosity condition we tested our formalization of infant curiosity based on the delta rule. Specifically, before presentation of each stimulus, the model calculated (i − o)o(1 − o) for all possible stimuli, where i = input values and o = output values. For example, after presentation of the first stimulus, the model calculated (i − o)o(1 − o) for each of the remaining seven stimuli, resulting in a set of seven potential curiosity values. The next stimulus chosen as input to the model was that for which the absolute value of this curiosity function was maximal. Critically, weights were not updated after this stage, simulating a novelty detection mechanism rather than the novelty reduction process of learning.
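As a sketch of this selection rule (assuming the model's forward pass returns a reconstruction o of input i, with feature values in [0, 1]; `reconstruct` below is a hypothetical stand-in for the trained autoencoder's forward pass):

```python
def curiosity_score(i_vec, o_vec):
    # Absolute delta-rule terms (i - o) * o * (1 - o), summed over features.
    return sum(abs((i - o) * o * (1 - o)) for i, o in zip(i_vec, o_vec))

def select_next(remaining, reconstruct):
    # Scan all remaining stimuli and pick the one with maximal curiosity.
    # Crucially, weights are NOT updated during this scan: this is novelty
    # detection, not the novelty-reducing process of learning itself.
    return max(remaining, key=lambda s: curiosity_score(s, reconstruct(s)))

# Toy check with a "model" that reconstructs everything as 0.5:
uncommitted = lambda s: [0.5] * len(s)
chosen = select_next([[0.5, 0.5], [1.0, 0.0]], uncommitted)
```

Here the stimulus matching the model's current representation yields a curiosity of zero, so the more discrepant stimulus is chosen.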
3.2.2 | Objective complexity maximization
M&P used Euclidean distance as a measure of inter-stimulus novelty and showed that maximizing novelty objectively present in the learning environment led to better learning than minimizing this novelty. However, M&P selected the presentation orders in advance of the experiment so that the max condition maximized mean ED between stimuli across the sequence as a whole, whereas our model aimed to provide an account of in-the-moment information selection. Thus, in the objective complexity maximization condition, at each step the model chose the stimulus that was maximally distant (by ED) from the current stimulus; complexity is therefore implemented specifically as ED here. In this condition the first stimulus was chosen randomly and successive stimuli were selected so that the next stimulus had the maximal ED (i.e., perceptual distance) from the currently processed stimulus.
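This greedy, in-the-moment rule can be sketched as follows (the first stimulus index is fixed here rather than random, purely for reproducibility; the 1-D feature vectors are placeholders):

```python
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def greedy_max_ed_order(stimuli, first_index=0):
    # At each step, choose the remaining stimulus maximally distant (by ED)
    # from the currently processed one; selection is without replacement.
    remaining = list(stimuli)
    order = [remaining.pop(first_index)]
    while remaining:
        nxt = max(remaining, key=lambda s: euclid(s, order[-1]))
        remaining.remove(nxt)
        order.append(nxt)
    return order

# 1-D toy stimuli: from 0.0 the farthest is 1.0, then from 1.0 it is 0.1.
order = greedy_max_ed_order([[0.0], [0.1], [1.0]])
```

Note that this greedy step-by-step rule need not maximize the mean ED of the sequence as a whole, which is exactly how it differs from M&P's pre-computed orders.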
3.2.3 | Subjective novelty maximization
In the subjective novelty maximization condition the model selected stimuli by maximizing i − o, leading to the selection of a stimulus that was maximally different from its representation in the model. This mechanism maximized novelty relative to the model's learning history. Subjective novelty maximization therefore reflects prediction-error-based computational reinforcement learning systems (for a review, see Botvinick et al., 2009; see also Ribas-Fernandes et al., 2011), in which the learner seeks out learning opportunities that maximize the difference between expectation and observation.
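Stripped of the plasticity term, the selection criterion reduces to summed prediction error; a minimal sketch:

```python
def subjective_novelty(i_vec, o_vec):
    # Prediction error |i - o| summed over features: how far the stimulus
    # lies from the model's current representation, ignoring o * (1 - o).
    return sum(abs(i - o) for i, o in zip(i_vec, o_vec))
```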
3.2.4 | Plasticity maximization
Choosing stimuli based on o(1 − o) minimizes the in-the-moment effect of the environment (i) on the model's learning by omitting (i − o). Put differently, this mechanism maximizes the model's plasticity. Thus, in the plasticity maximization condition the model selected stimuli about which it was most ready to learn (disregarding how much it would actually be able to learn from that stimulus).
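The plasticity term on its own peaks where the model's outputs are least committed, independent of the stimulus; a quick numerical check (a sketch, not the authors' code):

```python
def plasticity(o):
    # o * (1 - o) is zero at saturated outputs (o = 0 or 1) and maximal
    # at the least committed output value, o = 0.5.
    return o * (1 - o)

grid = [k / 100 for k in range(101)]
most_plastic = max(grid, key=plasticity)
```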
In all conditions the test phase was exactly as in Experiment 1, comparing network error to central and peripheral stimuli as a measure of strength of category learning.
3.3 | Results and discussion
Proportion of total SSE for peripheral test stimuli is depicted in Figure 4. Wilcoxon rank-sum tests against chance (0.5) confirmed that the model formed a category in all conditions (all ps < .001). Active learning therefore led to category formation irrespective of the basis on which the model selected stimuli. A Kruskal-Wallis test revealed, however, that SSE differed between conditions. In the following section we discuss the differences between the four stimulus selection mechanisms.
Bonferroni-corrected Wilcoxon tests confirmed that the model learned best in the curiosity condition. First, the model learned a more robust category in the curiosity condition (Mdn = 0.97) than in the objective
complexity maximization condition (Mdn = 0.91; W = 495, p < .001, r = −0.92). This result highlights the role of the learner in the learning process: when the model selected stimuli based solely on objective, environmental characteristics it learned less well than when it also took into account its own internal state (learning history). The curiosity condition also outperformed the subjective novelty maximization condition (Mdn = 0.77; W = 575, p < .001, r = −1.51). Here, although the model's learned representations were taken into account by selecting stimuli for which the difference between its representation (o) and the environment (i) was greatest in-the-moment, the longer-term effect of learning history, which determines the model's readiness to learn, was ignored. This result demonstrates that the additional plasticity provided by the o(1 − o) term was necessary for maximal learning; omitting this term affected the extent to which the model could adapt to its learning environment, reducing its ability to select stimuli that would lead to optimum information gain with respect to its learning history. However, maximizing plasticity alone is not sufficient to maximize learning: the model also performed better in the curiosity condition than in the plasticity maximization condition (Mdn = 0.75, W = 575, p < .001, r = −1.51). Since this latter mechanism ignores the in-the-moment effect of the environment, this result suggests that while focusing solely on the environment is not the best strategy for active learning, ignoring how much can actually be learned from a stimulus is not optimal either. Finally, in line with Experiment 1 and M&P, the objective complexity maximization condition outperformed the subjective novelty and plasticity maximization conditions (respectively, W = 564, p < .0001, r = −1.37; W = 56, p < .0001, r = −1.36), further highlighting the importance of environmental input; however, we found no difference in performance between the subjective novelty maximization and plasticity maximization conditions (W = 318, p = .55, r = −0.12). Overall, then, our formalization of curiosity maximized learning via the dynamic interaction of plasticity, learning history, and in-the-moment environmental input.
Next, we were interested in the level of complexity of the sequences that maximized learning in the curiosity condition. In the context of Experiment 1 and M&P, we might expect that the curious model had maximized these environmental distances. However, other empirical work suggests that intermediate difficulty could best support learning (Kidd et al., 2012, 2014; Kinney & Kagan, 1976; Twomey et al., 2014). Equally, simplicity has been shown to support learning in some cases (Bulf et al., 2011; Son, Smith, & Goldstone, 2008). To help make sense of these conflicting results, all of which come from experiments with predetermined stimulus presentation orders, we analyzed the stimulus sequences generated by the curious model. Overall, the model generated four different sequences out of the total possible 40,320, depicted in Figure 5. On the one hand, these sequences are very similar; recall that the model selected stimuli without replacement, reducing the degrees of freedom as training proceeded. On the other hand, they are not identical. Their differences stem from the stochasticity provided to the model by the random weight initialization, which can be interpreted as differences between participants (Thomas & Karmiloff-Smith, 2003). Thus, as in human data, the model data exhibit individual differences underlying a single global pattern of behavior. Nonetheless, since the model generated only four different sequences over 24 runs, this result also predicts that systematicity in infants' curiosity-based learning should be relatively robust.
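The space of possible presentation orders is simply the 8! = 40,320 permutations of the stimulus set; ranking them by mean overall ED can be sketched with itertools (the 1-D stimuli below are placeholders, so the number of unique mean-ED values will differ from the 281 obtained with the real stimuli):

```python
import itertools
import math

def euclid(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_overall_ed(order):
    # Mean ED between successive stimuli across the whole sequence.
    return sum(euclid(a, b) for a, b in zip(order, order[1:])) / (len(order) - 1)

# Eight hypothetical 1-D stimuli standing in for the real feature vectors.
stimuli = [[v] for v in (0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9)]
orders = list(itertools.permutations(stimuli))
unique_mean_eds = sorted({round(mean_overall_ed(o), 9) for o in orders})
```

A sequence's rank in `unique_mean_eds` is then its objective-complexity rank, directly analogous to the rankings reported in Table 1.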
To obtain an index of the level of complexity of the generated orders we ranked the entire set of 40,320 permutations by mean overall ED, generating 281 unique values. Table 1 provides these rankings (higher rank = greater complexity) for the sequences chosen in the curiosity condition. The curious model generated sequences of intermediate objective complexity. However, these sequences were not of average complexity (i.e., from ranks around 140/281) but were towards the high end of the range. To explore this finding we calculated the individual successive EDs for the eight stimuli in each of the four sequences and ranked these according to their complexity (i.e., a rank of 1 would mean that the model has chosen the maximally different next stimulus from the set of remaining stimuli). These individual inter-stimulus distances are provided in Table 2. Interestingly, the model did not generate intermediate distances at every learning step. Rather, Table 2 illustrates that taking the mean overall ED masks a more interesting behavior: in all sequences, the model first maximized ED (1/7) (cf. M&P). In three out of the four sequences the model then minimized the second ED (6/6), then chose an intermediate ED (3/5) and maximized EDs thereafter. Therefore, when measured in terms of objective complexity, overall intermediate complexity arose from a combination of maximally complex, minimally complex and moderately complex stimuli at different stages of the learning process. Why, then, should optimal intermediacy be shifted towards the more complex end of the scale? Figure 6 plots the curiosity function for values of i and o
FIGURE 4 Proportion SSE to peripheral stimulus at test in Experiment 2, by condition (Curiosity, Objective Complexity Maximization, Subjective Novelty Maximization, Plasticity Maximization). ***p < .001
between 0 and 1 and illustrates that (i − o)o(1 − o) is minimal when (i − o) is zero, and maximal when (i − o) is around 0.7. Thus, learning is greatest when both plasticity and subjective novelty are intermediate, but shifted towards the higher end of the spectrum.
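This shape is easy to verify numerically; a grid search over i and o in [0, 1] (a sketch independent of the model itself):

```python
def curiosity(i, o):
    return (i - o) * o * (1 - o)

grid = [k / 100 for k in range(101)]
best_i, best_o = max(((i, o) for i in grid for o in grid),
                     key=lambda p: curiosity(*p))
# The maximum sits at i = 1, o = 1/3, i.e. (i - o) ~ 0.67: subjective
# novelty and plasticity both intermediate, shifted towards the high end.
```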
This striking novelty-maximization–novelty-minimization behavior emerges because curiosity-driven learning maximizes subjective—not objective—novelty, modulated by the model's plasticity. Specifically, the model is initialized randomly without prior knowledge about the to-be-experienced stimuli. At this stage, the stimulus most similar to this random representation in the context of the to-be-learned category would be a prototypical, category-central stimulus. At first, therefore, the model maximizes learning by choosing a category-peripheral stimulus that is maximally different from its initial, random representation. Next, it chooses the stimulus that again results in maximal subjective novelty—the other category-peripheral stimulus. Now, the two most peripheral category stimuli, having just been encoded, are the most familiar to the model and are represented discretely at the extremes of the category space. The stimulus which maximizes subjective novelty should be as equidistant as possible between these two representations; that is, a category-central stimulus—and this is what the model chooses. Thus, notwithstanding the noise inherent in the initialization of the model, which accounts for its choice of different specific orders, broadly the model explores with a "start from the outside and move in" strategy, from the extremes to the prototype. Note that while the model predicts that infants will exhibit the same pattern of exploration, this prediction rests on the assumption of no a priori knowledge at the start of learning. Infants, on the other hand, have learned representations by 10 months. Whether infants will exhibit the same pattern of exploration—and whether the pattern holds in different tasks involving truly free exploration—are exciting empirical questions which we are currently addressing.
Why, then, should this pattern maximize learning? In line with the empirical infant categorization literature, if the model generates more error in response to a previously unseen peripheral exemplar relative to a previously unseen prototypical exemplar, we assume that it has learned a category with the prototypical exemplar at its center. In M&P's conceptualization of category learning, exemplars, represented as vectors, can be thought of as locations in representational space. Category learning is therefore a process of moving from location to location within this space. From this perspective, the order in which the curious model chooses stimuli maximizes the number of times it traverses the central location in this space, resulting in strong encoding of this area relative to weak encoding of peripheral stimuli. More generally, the curiosity mechanism makes the intriguing prediction for future work that infants engaged in curiosity-driven learning will switch systematically between stimuli of maximum and minimum objective complexity.
FIGURE 5 Stimulus orders chosen by the curious model

Trial:    1     2     3     4     5     6     7     8
Order A:  1515  5151  5511  1155  2424  2244  4422  4242
Order B:  1515  5151  5511  1155  4242  2424  4422  2244
Order C:  1515  5151  2244  2424  5511  1155  4422  4242
Order D:  1155  5511  4422  4242  5151  1515  2244  2424
TABLE 1 Rank mean Euclidean distances chosen in the curiosity condition of Experiment 2

Rank mean ED    Frequency/24
34/281          5
41/281          18
50/281          1
4 | GENERAL DISCUSSION
In the current work we used a neurocomputational model to first
capture the effect of objective environmental complexity on infants' categorization, and then to offer an explicit account of curiosity-driven learning in human infants. In Experiment 1 we captured empirical data presented by Mather and Plunkett (2011), in which 10-month-old infants formed a robust category when familiarized with stimulus sequences that maximized overall perceptual distance, but not in sequences which minimized it. In Experiment 2, we allowed the model to take an active role in its own learning by letting it select its own stimuli, comparing four different mechanisms for stimulus selection. Here, curiosity-based learning depended critically on the interaction between learning history, plasticity and the learning environment, allowing the model to choose stimuli for which learning was maximal at the given point of the model's developmental trajectory.
4.1 | Novelty is in the eye of the beholder
Our goal here was to develop a mechanistic theory of infants' intrinsically motivated—or curiosity-based—visual exploration. We selected the autoencoder model and its learning mechanism based on their roots in psychological theory and their established success in capturing infants' behavior in empirical tasks. Importantly, the proposed curiosity mechanism is theoretically compatible with classical optimal incongruity approaches (e.g., Hebb, 1949; Kagan, 1972; Loewenstein, 1994; Vygotsky, 1980). According to these theories, learning is optimal in environments of intermediate novelty. Typically, these approaches have interpreted this intermediacy as information that is neither too similar nor too different from what the learner has previously encountered—as seen in the "Goldilocks" effect observed in recent empirical work (Kidd et al., 2012, 2014). Our curiosity mechanism offers a new perspective: what constitutes optimal novelty changes as the child learns. Thus, what is initially too novel to be useful becomes a more suitable input as learning progresses. The model makes this process explicit, choosing stimuli that maximize subjective novelty as modulated by its plasticity. The optimal learning environment is therefore related to subjective novelty, not objective complexity. Critically, this insight may explain the conflicts in the extant literature in which infants in different tasks have been shown to learn best from minimally novel stimuli, maximally novel stimuli, and stimuli of intermediate novelty: the relationship between subjective novelty and objective complexity is nonlinear. That is, different levels of objective complexity could provide an environment of maximal subjective novelty, depending on the infant's learning history. Developing robust methods of tapping subjective novelty in infant looking time tasks, in particular individual differences, is therefore critical to understanding the complex dynamics of early learning.
These simulations offer important predictions for future work in infant curiosity. First, the model shows that learning can be maximal based on in-the-moment decisions about what aspect of the environment to learn from. Given recent work showing that infants can explicitly structure their learning environment by asking their caregivers for help (Goupil et al., 2016), this suggests that infants may also implicitly optimize their own learning (for an early empirical test of this prediction, see Twomey, Malem, & Westermann, 2016). Second, in line with looking time studies showing that infants select information systematically (Kidd et al., 2012, 2014), the model chose stimuli of intermediate objective complexity. However, analyses of the sequences chosen by the model predict that rather than
TABLE 2 Euclidean distances (ED) between successive stimuli for sequences chosen in the curiosity condition of Experiment 2

        Order A (chosen ×1)  Order B (chosen ×5)  Order C (chosen ×11)  Order D (chosen ×7)
Trial   ED      Rank         ED      Rank         ED      Rank          ED      Rank
1       –       –            –       –            –       –             –       –
2       1.5885  1/7          1.5885  1/7          1.5885  1/7           1.5885  1/7
3       1.0974  3/6          1.0974  3/6          0.3971  6/6           0.3971  6/6
4       1.5885  1/5          1.5885  1/5          0.7942  3/5           0.7942  3/5
5       0.8717  3/4          0.904   2/4          0.904   1/4           0.904   1/4
6       0.5487  3/3          0.7942  1/3          1.5885  1/3           1.5885  1/3
7       0.7942  1/2          0.5742  1/2          1.1914  1/2           1.1914  1/2
8       0.5487  –            0.7942  –            0.7942  –             0.7942  –
FIGURE 6 Plot of the curiosity function, (i − o)o(1 − o)
seeking out intermediate complexity at each learning event, infants may switch systematically between more and less objectively complex stimuli in the pursuit of maximal subjective novelty. Third, then, our account goes further than classical theories in which curiosity is viewed as either a novelty-seeking or a novelty-minimizing behavior (e.g., Loewenstein, 1994). Rather, our model predicts that infants' visual exploration should exhibit both novelty-seeking and novelty-minimizing components when novelty is viewed objectively, unifying these theories in a single mechanism.
4.2 | A new approach to computational curiosity in
visual exploration
This work contributes to computational research in intrinsic motivation by modeling curiosity using the mechanisms inherent in the existing model, based on in-the-moment, local decision-making without a separate, top-down system for monitoring learning progress and/or reward. Existing computational and robotic systems typically simulate reward as generated by a discrete, engineered module that calculates a reward value using task-specific computations. Our model departs from this approach, showing that domain-general mechanisms can produce the motivation to learn, performing a similar function to reward without requiring a separate module; that is, in our model, "reward" is part of the algorithm itself. Overall, then, the current work offers an explicit implementation of curiosity in infants' visual exploration, and offers a broader account of the cognitive mechanisms that may drive curiosity: learning that integrates a search for subjective novelty modulated by the learner's plasticity. Here, intrinsically motivated information selection emerges from within the model by exploiting its learning mechanism in a way that optimizes the reduction of discrepancy between expectation and experience.
Overall, this neurocomputational model provides the first formal account of curiosity-based learning in human infants, integrating subjective novelty and intrinsic motivation mechanisms in a single model. The model is based on the view that early learning is an active process in which infants select information to construct their own optimal learning environment, and it provides a parsimonious mechanism by which this learning can take place. Clearly, our model is restricted to visual exploration; thus, investigating whether these mechanisms generalize to embodied learning situations is an exciting avenue for future work. Equally, it is possible that another one of the many potential mechanisms for intrinsically motivated learning may take over later in development, particularly once metacognition is established and language begins in earnest (e.g., Gottlieb, Oudeyer, Lopes, & Baranes, 2013). Nonetheless, the current implementation of curiosity not only provides novel insight into infant curiosity-driven category learning and makes predictions for future work both in and outside the lab, but also offers a new mechanistic theory of early intrinsically motivated visual learning.
ACKNOWLEDGEMENTS
This work was supported by the ESRC International Centre for Language and Communicative Development (LuCiD), an ESRC Future Research Leaders fellowship to KT and a British Academy/Leverhulme Trust Senior Research Fellowship to GW. The support of the Economic and Social Research Council (ES/L008955/1; ES/N01703X/1) is gratefully acknowledged. Data and scripts are available on request from the authors. Portions of these data were presented at the 2015 5th International Conference on Development and Learning and on Epigenetic Robotics, Providence, Rhode Island, USA.
ORCID
Katherine E. Twomey http://orcid.org/0000-0002-5077-2741
Gert Westermann http://orcid.org/0000-0003-2803-1872
REFERENCES
Althaus, N., & Mareschal, D. (2013). Modeling cross-modal interactions in early word learning. IEEE Transactions on Autonomous Mental Development, 5, 288–297.
Baldassarre, G., Stafford, T., Mirolli, M., Redgrave, P., Ryan, R.M., & Barto, A. (2014). Intrinsic motivations and open-ended development in animals, humans, and robots: An overview. Frontiers in Psychology, 5, 985.
Baranes, A., & Oudeyer, P.-Y. (2013). Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems, 61, 49–73.
Begus, K., Gliga, T., & Southgate, V. (2014). Infants learn what they want to learn: Responding to infant pointing leads to superior learning. PLoS ONE, 9, e108817.
Berlyne, D.E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
Bornstein, M.H., & Mash, C. (2010). Experience-based and on-line categorization of objects in early infancy. Child Development, 81, 884–897.
Botvinick, M.M., Niv, Y., & Barto, A.C. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113, 262–280.
Bruner, J.D., Goodnow, J.J., & Austin, G.A. (1972). Categories and cognition. In J.P. Spradley (Ed.), Culture and cognition (pp. 168–190). New York: Chandler.
Bulf, H., Johnson, S.P., & Valenza, E. (2011). Visual statistical learning in the newborn infant. Cognition, 121, 127–132.
Capelier-Mourguy, A., Twomey, K.E., & Westermann, G. (2016, August). A neurocomputational model of the effect of learned labels on infants' object representations. Poster presented at the 38th Annual Cognitive Science Society Meeting, Philadelphia, PA.
Colunga, E., & Smith, L.B. (2003). The emergence of abstract ideas: Evidence from networks and babies. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 358, 1205–1214.
Cottrell, G.W., & Fleming, M. (1990). Face recognition using unsupervised feature extraction. In Proceedings of the International Neural Network Conference (pp. 322–325), Paris, France. Dordrecht: Kluwer.
Fantz, R.L. (1964). Visual experience in infants: Decreased attention to familiar patterns relative to novel ones. Science, 146, 668–670.
Fantz, R.L., Ordy, J.M., & Udelf, M.S. (1962). Maturation of pattern vision in infants during the first six months. Journal of Comparative and Physiological Psychology, 55, 907–917.
Ferguson, K.T., Kulkofsky, S., Cashon, C.H., & Casasola, M. (2009). The development of specialized processing of own-race faces in infancy. Infancy, 14, 263–284.
Frank, M., Leitner, J., Stollenga, M., Förster, A., & Schmidhuber, J. (2014). Curiosity driven reinforcement learning for motion planning on humanoids. Frontiers in Neurorobotics, 7, 25.
French, R.M., Mareschal, D., Mermillod, M., & Quinn, P.C. (2004). The role of bottom-up processing in perceptual categorization by 3- to 4-month-old infants: Simulations and data. Journal of Experimental Psychology: General, 133, 382–397.
Gershkoff-Stowe, L., & Rakison, D.H. (2005). Building object categories in developmental time. Mahwah, NJ: Psychology Press.
Gliga, T., Volein, A., & Csibra, G. (2010). Verbal labels modulate perceptual object processing in 1-year-old children. Journal of Cognitive Neuroscience, 22, 2781–2789.
Gliozzi, V., Mayor, J., Hu, J.F., & Plunkett, K. (2009). Labels as features (not names) for infant categorization: A neurocomputational approach. Cognitive Science, 33, 709–738.
Gottlieb, J., Oudeyer, P.-Y., Lopes, M., & Baranes, A. (2013). Information-seeking, curiosity, and attention: Computational and neural mechanisms. Trends in Cognitive Sciences, 17, 585–593.
Goupil, L., Romand-Monnier, M., & Kouider, S. (2016). Infants ask for help when they know they don't know. Proceedings of the National Academy of Sciences, USA, 113, 3492–3496.
Hebb, D. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
Hinton, G.E., & Salakhutdinov, R.R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504–507.
Horst, J.S., Oakes, L.M., & Madole, K.L. (2005). What does it look like and what can it do? Category structure influences how infants categorize. Child Development, 76, 614–631.
Hurley, K.B., Kovack-Lesh, K.A., & Oakes, L.M. (2010). The influence of pets on infants' processing of cat and dog images. Infant Behavior and Development, 33, 619–628.
Hurley, K.B., & Oakes, L.M. (2015). Experience and distribution of attention: Pet exposure and infants' scanning of animal images. Journal of Cognition and Development, 16, 11–30.
Kagan, J. (1972). Motives and development. Journal of Personality and Social Psychology, 22, 51–66.
Kidd, C., & Hayden, B.Y. (2015). The psychology and neuroscience of curiosity. Neuron, 88, 449–460.
Kidd, C., Piantadosi, S.T., & Aslin, R.N. (2012). The Goldilocks effect: Human infants allocate attention to visual sequences that are neither too simple nor too complex. PLoS ONE, 7, e36399.
Kidd, C., Piantadosi, S.T., & Aslin, R.N. (2014). The Goldilocks effect in infant auditory attention. Child Development, 85, 1795–1804.
Kinney, D.K., & Kagan, J. (1976). Infant attention to auditory discrepancy. Child Development, 47, 155–164.
Kovack-Lesh, K.A., McMurray, B., & Oakes, L.M. (2014). Four-month-old infants' visual investigation of cats and dogs: Relations with pet experience and attentional strategy. Developmental Psychology, 50, 402–413.
Kovack-Lesh, K.A., & Oakes, L.M. (2007). Hold your horses: How exposure to different items influences infant categorization. Journal of Experimental Child Psychology, 98, 69–93.
Lefort, M., & Gepperth, A. (2015). Active learning of local predictable representations with artificial curiosity. Paper presented at the 5th International Conference on Development and Learning and on Epigenetic Robotics, Providence, RI.
Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological Bulletin, 116, 75–98.
Lonini, L., Forestier, S., Teulière, C., Zhao, Y., Shi, B.E., & Triesch, J. (2013). Robust active binocular vision through intrinsically motivated learning. Frontiers in Neurorobotics, 7, 20.
Mareschal, D., & French, R. (2000). Mechanisms of categorization in infancy. Infancy, 1, 59–76.
Mareschal, D., & Thomas, M.S.C. (2007). Computational modeling in developmental psychology. IEEE Transactions on Evolutionary Computation, 11, 137–150.
Marsland, S., Nehmzow, U., & Shapiro, J. (2005). On-line novelty detection for autonomous mobile robots. Robotics and Autonomous Systems, 51, 191–206.
Mather, E. (2013). Novelty, attention, and challenges for developmental psychology. Frontiers in Psychology, 4, 491.
Mather, E., & Plunkett, K. (2011). Same items, different order: Effects of temporal variability on infant categorization. Cognition, 119, 438–447.
Munakata, Y., & McClelland, J.L. (2003). Connectionist models of development. Developmental Science, 6, 413–429.
Murakami, M., Kroger, B., Birkholz, P., & Triesch, J. (2015). Seeing [u] aids vocal learning: Babbling and imitation of vowels using a 3D vocal tract model, reinforcement learning, and reservoir computing. Paper presented at the 5th International Conference on Development and Learning and on Epigenetic Robotics, Providence, RI.
Oakes, L.M., Kovack-Lesh, K.A., & Horst, J.S. (2009). Two are better than one: Comparison influences infants' visual recognition memory. Journal of Experimental Child Psychology, 104, 124–131.
Oudeyer, P.-Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 6.
Oudeyer, P.-Y., Kaplan, F., & Hafner, V.V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.
Oudeyer, P.-Y., & Smith, L.B. (2016). How evolution may work through curiosity-driven developmental process. Topics in Cognitive Science, 8, 492–502.
Piaget, J. (1952). The origins of intelligence in children (Vol. 8). New York: International University Press.
Plunkett, K., Sinha, C., Møller, M.F., & Strandsby, O. (1992). Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, 4, 293–312.
Quinn, P.C., Eimas, P.D., & Rosenkrantz, S.L. (1993). Evidence for representations of perceptually similar natural categories by 3-month-old and 4-month-old infants. Perception, 22, 463–475.
Rakison, D.H. (2004). Infants' sensitivity to correlations between static and dynamic features in a category context. Journal of Experimental Child Psychology, 89, 1–30.
Rakison, D.H., & Butterworth, G.E. (1998). Infants' use of object parts in early categorization. Developmental Psychology, 34, 49–62.
Rescorla, R.A., & Wagner, A.R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A.H. Black & W.F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Ribas-Fernandes, J.J.F., Solway, A., Diuk, C., McGuire, J.T., Barto, A.G., Niv, Y., & Botvinick, M.M. (2011). A neural signature of hierarchical reinforcement learning. Neuron, 71, 370–379.
Rogers, T.T., & McClelland, J.L. (2008). Précis of Semantic cognition: A parallel distributed processing approach. Behavioral and Brain Sciences, 31, 689–749.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Schlesinger, M. (2013). Investigating the origins of intrinsic motivation in human infants. In G. Baldassarre & M. Mirolli (Eds.), Intrinsically motivated learning in natural and artificial systems (pp. 367–392). Berlin: Springer.
Schlesinger, M., & Amso, D. (2013). Image free-viewing as intrinsically-motivated exploration: Estimating the learnability of center-of-gaze image samples in infants and adults. Frontiers in Psychology, 4, 802.
Seepanomwan, K., Caligiore, D., Cangelosi, A., & Baldassarre, G. (2015). The role of intrinsic motivations in the development of tool use: A study in infant robots. Cognitive Processing, 16, S100–S100.
Sokolov, E.N. (1963). Perception and the conditioned reflex. New York: Macmillan.
Son, J.Y., Smith, L.B., & Goldstone, R.L. (2008). Simplicity and generalization: Short-cutting abstraction in children's object categorizations. Cognition, 108, 626–638.
Thelen, E., & Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.
Thomas, M., & Karmiloff-Smith, A. (2003). Connectionist models of development, developmental disorders, and individual differences. In R.J. Sternberg, J. Lautrey & T. Lubart (Eds.), Models of intelligence: International perspectives (pp. 133–150). Washington, DC: American Psychological Association.
Twomey, K.E., Malem, B., & Westermann, G. (2016, May). Infants' information seeking in a category learning task. In K.E. Twomey (Chair), Understanding infants' curiosity-based learning: Empirical and computational approaches. Symposium presented at the XX Biennial International Conference on Infant Studies, New Orleans, LA.
Twomey, K.E., Ranson, S.L., & Horst, J.S. (2014). That's more like it: Multiple exemplars facilitate word learning. Infant and Child Development, 23, 105–122.
Twomey, K.E., & Westermann, G. (2017). Labels shape pre-speech infants' object representations. Infancy, https://doi.org/10.1111/infa.12201
Vygotsky, L.S. (1980). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Westermann, G., & Mareschal, D. (2004). From parts to wholes: Mechanisms of development in infant visual object processing. Infancy, 5, 131–151.
Westermann, G., & Mareschal, D. (2012). Mechanisms of developmental change in infant categorization. Cognitive Development, 27, 367–382.
Westermann, G., & Mareschal, D. (2014). From perceptual to language-mediated categorization. Philosophical Transactions of the Royal Society B: Biological Sciences, 369, 20120391.
Younger, B.A. (1985). The segregation of items into categories by ten-month-old infants. Child Development, 56, 1574–1583.
Younger, B.A., & Cohen, L.B. (1983). Infant perception of correlations among attributes. Child Development, 54, 858–867.
Younger, B.A., & Cohen, L.B. (1986). Developmental change in infants' perception of correlations among attributes. Child Development, 57, 803–815.
How to cite this article: Twomey KE, Westermann G. Curiosity-based learning in infants: a neurocomputational approach. Dev Sci. 2017;e12629. https://doi.org/10.1111/desc.12629