Children’s Imitation of Action Sequences is Influenced by Statistical Evidence and Inferred Causal Structure

Daphna Buchsbaum, Alison Gopnik, Thomas L. Griffiths

{daphnab, gopnik, tom_griffiths}@berkeley.edu

Department of Psychology, University of California, Berkeley, Berkeley, CA 94720 USA

Abstract

Children are ubiquitous imitators, but how do they decide

which actions to imitate? One possibility is that children

might learn which actions are necessary to reproduce by

observing the contingencies between action sequences and

outcomes across repeated observations. We deﬁne a Bayesian

model that predicts that children will decide whether to imitate

part or all of a sequence based on the pattern of statistical

evidence. To test this prediction, we conducted an experi-

ment in which preschool children watched an experimenter

repeatedly perform sequences of varying actions followed by

an outcome. Children’s imitation of sequences that produced

the outcome increased, in some cases resulting in production

of shorter sequences of actions that the children had never

seen performed in isolation. This behavior is consistent with

our model’s predictions, and suggests that children attend to

statistical evidence in deciding which actions to imitate, rather

than obligately imitating successful actions.

Keywords: Cognitive development; Imitation; Statistical

learning; Causal inference; Bayesian inference

Introduction

Learning the causal relationships between everyday se-

quences of actions and their outcomes is a daunting task.

How do you transform a package of bread, a jar of peanut

butter and a jar of jelly into a peanut butter and jelly sand-

wich? Do you cut the bread in half before or after you put

together the sandwich? Can you put the peanut butter on ﬁrst,

or does it always have to be jelly ﬁrst? In order to achieve

desired outcomes – from everyday goals such as eating a

tasty sandwich to distinctive human abilities such as making

and using tools – children need to solve a challenging causal

learning problem: observing that the intentional actions

of others lead to outcomes, inferring the causal relations

between those actions and outcomes, and then using that

knowledge to plan their own actions.

To learn from observation in this way, children cannot sim-

ply mimic everything they see. Instead, they must segment

actions into meaningful sequences, and determine which

actions are relevant to outcomes and why. Recent studies

of imitation in children have produced varying answers to

the question of whether children are capable of solving this

problem. While children sometimes selectively reproduce

the most obviously causally effective actions (Williamson,

Meltzoff, & Markman, 2008; Schulz, Hooppell, & Jenkins,

2008), at other times they will “overimitate”, reproducing

apparently unnecessary parts of a causal sequence (Whiten,

Custance, Gomez, Teixidor, & Bard, 1996; Lyons, Young,

& Keil, 2007), or copying an actor’s precise means, when

a more efﬁcient action for accomplishing the same goal is

available (Meltzoff, 1995). Sometimes children may produce

both kinds of behavior in the same study. In the “rational

imitation” studies by Gergely, Bekkering, and Kiraly (2002),

children saw an experimenter activate a machine with hands

free or hands conﬁned. Children both produced exact imita-

tions of the actor (touching their head to a machine to make it

go) and produced more obviously causally effective actions

(touching the machine with a hand), though the proportion

of such actions differed in the different intentional contexts.

We suggest that these different results reﬂect the multiple

sources of information that contribute to a rational statistical

inference about causally effective actions. Children need

to balance their prior knowledge about causal relations, the

new evidence that is presented to them by the adult, and

their knowledge of the adult’s intentions. Moreover, in the

case of imitation there is often no single “right answer”

to the question of what to imitate. After all, a longer

“overimitation” sequence might actually be necessary to

bring about an effect, though that might seem unlikely at

ﬁrst. The imitation problem can be expressed as a problem

of Bayesian inference, with Bayes’ rule indicating how

children might combine these factors to formulate different

causal hypotheses and produce different action sequences

based on those hypotheses. It is difficult to test this idea, however, without knowing the strength of various causal

hypotheses for the children. Since previous studies involved

general folk physical and psychological knowledge (such as

removing a visibly ineffectual bolt to open a puzzle box) it is

difﬁcult to know how strong those hypotheses would be. By

giving children statistical information supporting different

hypotheses we can normatively determine how probable dif-

ferent hypotheses should be, and then see whether children’s

imitation reﬂects those probabilities.

It is also independently interesting to explore the role of

statistical information in imitation. Recent studies show

that children are surprisingly sophisticated in their use of

statistical information such as conditional probabilities in

a range of domains, from phonology (Saffran, Aslin, &

Newport, 1996), to visual perception (Fiser & Aslin, 2002;

Kirkham, Slemmer, & Johnson, 2002), to word meaning (Xu

& Tenenbaum, 2007). Such information plays a particularly

important role in both action processing (Zacks et al., 2001;

Baldwin, Andersson, Saffran, & Meyer, 2008; Buchsbaum,

Grifﬁths, Gopnik, & Baldwin, 2009) and causal inference

(Gopnik et al., 2004; Gopnik & Schulz, 2007), and allows

adults to identify causal subsequences within continuous

streams of action (Buchsbaum et al., 2009). Varying the

probabilities of events within action sequences may thus

provide a way to vary the statistical evidence those sequences

provide in favor of different causal hypotheses.

Observed Action Sequence    Potential Causal Sequences
ABC+                        ABC, BC, C
DBC+                        DBC, BC, C
Total Potential Causes      ABC, DBC, BC, C

Table 1: Example demonstrations, and the associated set of potential causal sequences. Letters represent unique observed actions; a + indicates a causal outcome.

Statistical inference might be particularly important to imitation because it could allow children not only to determine the causal relationship between action sequences and

outcomes, but to identify irrelevant actions within causally

effective sequences. Imagine that I am making a peanut

butter sandwich, and that between opening the jar, and

spreading the peanut butter, I get peanut butter on my hands,

so I wipe them on a paper towel. If this is the ﬁrst time

you’ve seen me make a sandwich, you might mistakenly

think that hand-wiping is a necessary step. However, after

watching me make a sandwich a couple of times, you might

notice that while opening the jar always predicts spreading

the peanut butter, it doesn’t always predict hand-wiping, and

could infer that this step is extraneous. In most previous

work on children’s imitation of causal sequences, children

observed only a single demonstration of how to generate the

outcome (e.g. Whiten et al., 1996; Lyons et al., 2007).

In this paper, we look at whether children use statistical

evidence from repeated demonstrations to infer the correct

causal actions within a longer sequence and imitate them. We

present a Bayesian analysis of causal inference from repeated

action sequence demonstrations, followed by an experiment

investigating children’s imitative behavior and causal infer-

ences. We showed preschool children different sequences of

three actions followed by an effect, using our Bayesian model

to guide our manipulation of the probabilistic evidence, such

that the statistical relations between actions and outcomes

differed across conditions in ways that supported different

causal hypotheses. We then examine which sequences

the children produced themselves, and compare children’s

performance to our model’s predictions. We conclude by

discussing our results in the context of broader work on

imitation, and causal and intentional inference.

Bayesian Ideal Observer Model

In many real world situations, the causal structure of a

demonstrated sequence of actions is not fully observable. In

particular, which actions are causally necessary and which

are superﬂuous may be unclear. One way children may

overcome this difﬁculty is through repeated observations. By

watching someone make a sandwich or turn on a lightbulb

on multiple occasions, children can pick up on which actions

consistently predict the desired outcome, and which do not.

[Figure 1 diagram omitted: boxes linking subsets of the sequences ABC, DBC, BC and C to Effect.]

Figure 1: A subset of the hypothesis space. Each box represents a hypothesis about which action sequences are causal.

While it is intuitively plausible that children can use the statistical evidence in repeated demonstrations to infer causal structure, we would like to verify that normative inferences from repeated observations of action sequences and their outcomes vary in a systematic way with different patterns of data. One way to derive what the normative distribution over causes should be is through a Bayesian model (Gopnik et al.,

2004; Grifﬁths & Tenenbaum, 2005). The Bayesian formal-

ism provides a natural way to represent the roles of children’s

prior assumptions and the observed data in forming their

beliefs about which action sequences are likely to be causal.

Model Details

Given observations of several sequences of actions, we

assume that children consider all sequences and terminal

subsequences as potentially causal. These include both

sequences that generate the outcome and those that do not.

For instance, if the sequence “squeeze toy, knock on toy, pull

toy’s handle” is observed, then squeeze, followed by knock,

followed by pull handle would be one possible causal se-

quence, and knock followed by pull handle would be another.

Given all of the observed sequences, we can enumerate the

potential causes (see Table 1 for an example set of demonstra-

tions and potential causes). As in previous work on children’s

causal inference, we use a Deterministic-OR model (cf.

Cheng, 1997; Pearl, 1996), in which any of the correct

sequences will always bring about the effect. To capture the

intuition that there may be more than one sequence of actions

that can bring about an effect, we consider all of the potential

causes (such as in Table 1), as well as all disjunctions of

these causes. The base causes, together with the disjunctions

form the space of potential hypotheses, H (see Figure 1).
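The construction of the hypothesis space can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, action sequences are encoded as strings of single-letter actions as in Table 1, and the empty hypothesis is left out on the assumption that some cause must exist when an effect has been observed.

```python
from itertools import combinations

def terminal_subsequences(seq):
    """All suffixes of an action sequence, e.g. 'ABC' -> ['ABC', 'BC', 'C']."""
    return [seq[i:] for i in range(len(seq))]

def potential_causes(observed):
    """Union of the terminal subsequences of every observed sequence."""
    causes = []
    for seq in observed:
        for sub in terminal_subsequences(seq):
            if sub not in causes:
                causes.append(sub)
    return causes

def hypothesis_space(causes):
    """Every non-empty set of causes: the base causes plus all disjunctions."""
    return [frozenset(combo)
            for size in range(1, len(causes) + 1)
            for combo in combinations(causes, size)]

causes = potential_causes(["ABC", "DBC"])
hypotheses = hypothesis_space(causes)
print(causes)           # ['ABC', 'BC', 'C', 'DBC']
print(len(hypotheses))  # 15
```

For the demonstrations in Table 1 this yields the four potential causes listed there and 15 hypotheses (the four single causes plus their 11 disjunctions).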

The learner wants to infer the set of causes, h, given

the observed data, d, where the data are composed of an

observed sequence of actions, a, and an outcome, e. Bayes’

theorem provides a way to formalize this inference. Bayes’

theorem relates a learner’s beliefs before observing the data,

their prior p(h), to their beliefs after having observed the

data, their posterior p(h|d),

$$p(h|d) \propto p(d|h)\, p(h), \qquad (1)$$

where p(d|h) is the probability of observing the data given

the hypothesis is true. For Deterministic-OR causal models,

this value is 1 if the sequence is consistent with the hypothesis, and zero otherwise. For example, given the hypothesis that squeeze is the cause, a consistent observation would be knock then squeeze followed by music, and an inconsistent observation would be squeeze followed by no music. When

multiple sequences of actions and effects are observed, we

assume that these sequences are independent.
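A minimal sketch of this likelihood follows. It assumes that a cause produces the effect only when it is the terminal subsequence of the demonstrated actions, which matches the squeeze example above; the single-letter encoding (K = knock, S = squeeze) and the function names are illustrative choices, not taken from the paper.

```python
def predicts_effect(hypothesis, actions):
    """Deterministic-OR: the effect occurs iff some cause in the
    hypothesis is the terminal subsequence of the observed actions."""
    return any(actions.endswith(cause) for cause in hypothesis)

def likelihood(hypothesis, data):
    """p(d|h) for independent observations: 1.0 if every (actions, effect)
    pair is consistent with the hypothesis, 0.0 otherwise."""
    for actions, effect in data:
        if predicts_effect(hypothesis, actions) != effect:
            return 0.0
    return 1.0

squeeze_is_cause = frozenset({"S"})
print(likelihood(squeeze_is_cause, [("KS", True)]))   # 1.0
print(likelihood(squeeze_is_cause, [("S", False)]))   # 0.0
```

Because the likelihood is either 0 or 1, the observations act as a filter on the hypothesis space, and the relative weight of the surviving hypotheses is set entirely by the prior.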

A key element in this inference is the learner’s prior

expectations, p(h). Children could have a variety of different

beliefs about the kinds of sequences that bring about effects.

For instance, they could believe that longer sequences, that

include more of the demonstrated actions, are more likely to

bring about effects. Or, they could believe that there tends to

be only one correct sequence, as opposed to many possible

sequences, that cause an effect. We capture these intuitions

with a prior that depends on two parameters, β and p, which

correspond to the learner’s expectations about the length of

causal sequences and number of ways to generate an effect.

We formalize the prior as a generative model. Hypotheses

are constructed by randomly choosing causal sequences, a.

Each sequence has a probability $p_a$ of being included in each hypothesis and a probability $(1 - p_a)$ of not being included,

$$p(h) \propto \prod_{a \in h} p_a \prod_{a^* \notin h} (1 - p_{a^*}), \qquad (2)$$

where the probability of including causal sequence $a$ is

$$p_a = \frac{1}{1 + \frac{1-p}{p} \exp(-\beta(|a| - 2))}, \qquad (3)$$

and |a| is the number of actions in the sequence a. Values of β

that are greater than 0 represent a belief that longer sequences

are more likely to be causes. Values of p less than 0.5 repre-

sent a belief that effects tend to have fewer causes. Together,

Equations 1, 2 and 3 provide a model of inferring hypotheses

about causes from observed sequences and their effects.
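The prior can be written down directly from Equations 2 and 3. The sketch below uses hypothetical function names and leaves the prior unnormalized, which is enough for the proportional form of Equation 1.

```python
import math

def inclusion_probability(cause, p, beta):
    """Equation 3: probability that a sequence of len(cause) actions
    is included in a hypothesis."""
    odds_against = (1.0 - p) / p
    return 1.0 / (1.0 + odds_against * math.exp(-beta * (len(cause) - 2)))

def prior(hypothesis, causes, p, beta):
    """Equation 2 (unnormalized): product over included and excluded causes."""
    value = 1.0
    for cause in causes:
        p_a = inclusion_probability(cause, p, beta)
        value *= p_a if cause in hypothesis else (1.0 - p_a)
    return value

for cause in ["ABC", "BC", "C"]:
    print(cause, round(inclusion_probability(cause, p=0.1, beta=1.0), 3))
```

With β = 0 the length term vanishes and every sequence is included with probability p; with β > 0 a two-action sequence keeps inclusion probability p, while longer sequences become more likely and shorter ones less likely.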

In our experiments, rather than probing children’s beliefs

directly, we allow children to play with the toy. Therefore, to

complete the model, we must specify how children choose ac-

tion sequences, a, based on their observations, d. Intuitively,

we expect that if we know the set of causes of the effect, h, we

will randomly choose one of these actions. If we were unsure

about which of several possible causes was the right one, then

we may choose any of the possible contenders, but biased to-

ward whichever one we thought was most likely. We capture

these intuitions formally by choosing an action given the ob-

served data, p(a|d), based on a sum over possible hypotheses,

$$p(a|d) \propto \sum_{h \in H} p(a|h)\, p(h|d), \qquad (4)$$

where p(a|h) is one if a is a cause under h, and zero

otherwise, and p(h|d) is speciﬁed in Equation 1.
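Equation 4 amounts to crediting each candidate sequence with the posterior mass of every hypothesis that contains it, then normalizing. The sketch below shows that step alone; the function name and the toy posterior are made up for illustration and do not come from the experiment.

```python
def action_distribution(posterior):
    """Equation 4: posterior maps each hypothesis (a frozenset of causes)
    to p(h|d); returns the normalized p(a|d) over candidate sequences."""
    scores = {}
    for hypothesis, weight in posterior.items():
        for cause in hypothesis:          # p(a|h) = 1 for causes in h
            scores[cause] = scores.get(cause, 0.0) + weight
    total = sum(scores.values())
    return {cause: score / total for cause, score in scores.items()}

# Hypothetical posterior over three hypotheses, for illustration only.
toy_posterior = {
    frozenset({"BC"}): 0.5,
    frozenset({"ABC", "BC"}): 0.25,
    frozenset({"ABC"}): 0.25,
}
choice = action_distribution(toy_posterior)
print(choice["BC"], choice["ABC"])   # 0.6 0.4
```

A sequence that appears in many plausible hypotheses is chosen more often than one that appears in a single hypothesis of the same weight, which is why subsequences shared across demonstrations can come out ahead.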

A Simple Modeling Example

We can now verify that the model makes distinct inferences

from repeated demonstrations. In the ﬁrst example, the

demonstrated action sequences are ABC+, DBC+ as in Table

1. That is, a sequence of three actions A, B and C is followed

by an effect. Subsequently, a different sequence of three ac-

tions, D, B, and C is followed by the same effect. In the sec-

ond example, the observed sequences are ABC+, DBC. Here,

the second three-action sequence is not followed by the effect.

Observed Sequences    ABC     DBC     BC      C
ABC+, DBC+            0.23    0.23    0.27    0.27
ABC+, DBC             1.0     0.0     0.0     0.0

Table 2: Example model results, p = 0.5 and β = 0.

Observed Sequences    ABC     DBC     BC      C
ABC+, DBC+            0.26    0.26    0.35    0.13
ABC+, DBC             1.0     0.0     0.0     0.0

Table 3: Example model results, p = 0.1 and β = 1.0.

Using values of p = 0.5 and β = 0 results in a prior that

assigns equal probability to all possible causal hypotheses –

a uniform prior. With this uniform prior, we can now ﬁnd the

probability of choosing to perform each action sequence to

bring about the effect given the observed data, p(a|d), as de-

scribed in Equation 4. Our model infers that, in the ﬁrst case,

all the sequences are possible causes, with BC and C being

somewhat more likely, and equally probable. Notice that the

model infers that the subsequences BC and C are the most

likely causes, even though neither was observed on its own.

The second case is quite different. Here the model sees that

DBC and its subsequences BC and C did not lead to the effect

in the second demonstration, and infers that ABC is the only

possible cause among the candidate sequences (see Table 2).

We now use values of p = 0.1 and β = 1.0 leading the

model to favor simpler hypotheses containing fewer causes,

and causes that use more of the observed demonstration (see footnote 1).

This prior does not change results in the second case, where

ABC is still the only possible cause. However, in the ﬁrst

case, the model now infers that the subsequence BC is the

most likely individual cause, since it is the longest observed

sequence to consistently predict the effect (see Table 3).
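The worked example can be reproduced end to end with a short script that combines Equations 1 to 4. This is a sketch under stated assumptions rather than the authors' implementation: the hypothesis space is every non-empty set of terminal subsequences, and a cause is taken to produce the effect only when it ends the demonstrated sequence. The function name `action_probabilities` is hypothetical.

```python
import math
from itertools import combinations

def action_probabilities(data, p, beta):
    """data: list of (actions, effect) pairs. Returns {sequence: p(a|d)}."""
    # Potential causes: all terminal subsequences of the observed sequences.
    causes = []
    for actions, _ in data:
        for i in range(len(actions)):
            if actions[i:] not in causes:
                causes.append(actions[i:])
    # Equation 3: inclusion probability of each potential cause.
    p_inc = {c: 1.0 / (1.0 + ((1.0 - p) / p) * math.exp(-beta * (len(c) - 2)))
             for c in causes}
    scores = {c: 0.0 for c in causes}
    for size in range(1, len(causes) + 1):
        for h in combinations(causes, size):
            # Deterministic-OR likelihood: skip hypotheses that mispredict.
            if any(any(actions.endswith(c) for c in h) != effect
                   for actions, effect in data):
                continue
            # Equation 2: unnormalized prior weight of this hypothesis.
            weight = 1.0
            for c in causes:
                weight *= p_inc[c] if c in h else 1.0 - p_inc[c]
            # Equation 4: credit every cause contained in the hypothesis.
            for c in h:
                scores[c] += weight
    total = sum(scores.values())
    return {c: scores[c] / total for c in causes}

both_positive = action_probabilities([("ABC", True), ("DBC", True)], p=0.5, beta=0.0)
one_negative = action_probabilities([("ABC", True), ("DBC", False)], p=0.5, beta=0.0)
print({c: round(v, 2) for c, v in both_positive.items()})
# {'ABC': 0.23, 'BC': 0.27, 'C': 0.27, 'DBC': 0.23}
print({c: round(v, 2) for c, v in one_negative.items()})
# {'ABC': 1.0, 'BC': 0.0, 'C': 0.0, 'DBC': 0.0}
```

Under the uniform prior, 13 of the 15 hypotheses survive the ABC+, DBC+ demonstrations; BC and C each appear in 8 of them and ABC and DBC in 7, giving 8/30 and 7/30, which matches Table 2. With p = 0.1 and β = 1.0 the same sketch ranks BC highest and C lowest, as in Table 3, although the exact values depend on details of the hypothesis space that the text does not fully specify.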

Model Predictions for Children’s Inferences

Our rational model makes differential predictions based

on repeated statistical evidence, and is able to infer sub-

sequences as causal without seeing them performed in

isolation. We can now use the model to help us construct

demonstration sequences that normatively predict selective

imitation in some cases, and “overimitation” in others. If

children are also making rational inferences from variations

in the action sequences they observe, then their choice of

which actions to imitate in order to bring about an effect

should similarly vary with the evidence. We test our predic-

tion that children rationally incorporate statistical evidence

into their decisions to imitate only part of an action sequence

versus the complete sequence in the following section.

Experiment

Method

Participants Participants were 81 children (M = 54 months, Range = 41 − 70 months, 46% female) recruited from local preschools and a science museum. An additional 18 children were excluded from the study because of demonstration error (4), equipment failure (3), lack of English (1), unavailable birth date (1), not trying the toy (6), extreme distraction (2), or never performing the trial termination action (1).

Footnote 1: These parameter values qualitatively fit children’s imitative behavior, as we discuss later in the paper.

“ABC” Condition    “BC” Condition    “C” Condition
ABC+               ABC+              ABC+
DEC                ADC               ADC+
ABC+               DBC+              DBC+
EDC                AEC               AEC+
ABC+               EBC+              EBC+

Table 4: The demonstration sequences for “ABC”, “BC” and “C” conditions. Each child observed the experimenter performing all 5 action sequences in their condition.

Stimuli There were two novel toys: a blue ball with

rubbery protuberances, and a stuffed toy with rings and tabs

attached to it. Six possible actions could be demonstrated on

each toy. Children were assigned to one of three experimental

conditions. In each condition, they saw a different pattern

of evidence involving ﬁve sequences of action and their

outcomes. Each individual action sequence was always three

actions long. In the “ABC” pattern, the same sequence of

three actions (e.g. A=Knock, B=Stretch, C=Roll) is followed

by a musical effect three times, while in the “BC” pattern a

sequence composed of a different ﬁrst action, followed by

the same two-action subsequence (e.g. A=Squish, B=Pull,

C=Shake and D=Flip, B=Pull, C=Shake) is followed by the

effect three times (see Table 4). In both patterns, two addi-

tional sequences that end in C and do not contain BC fail to

produce the effect. Finally, in the “C” pattern the sequences

of actions were identical to those in the “BC” pattern, but

the outcome was always positive. The number of times each

individual action is demonstrated in each sequence position is

identical in all three patterns. As we show later in the paper,

our Bayesian ideal observer model conﬁrms that the statistical

evidence in each pattern supports different causal inferences.

Procedure The experimenter showed the child one of the

toys, and said: “This is my new toy. I know it plays music,

but I haven’t played with it yet, so I don’t know how to make

it go. I thought we could try some things to see if we can

ﬁgure out what makes it play music.” The experimenter

emphasized her lack of knowledge, so that the children would

not assume she knew whether or not any of her actions were

necessary. She then demonstrated one of the three patterns

of evidence, repeating each three-action sequence (and its

outcome) twice. The experimenter named the actions (e.g.

“What if I try rolling it, and then shaking it, and then knock-

ing on it?”), acted pleasantly surprised when the toy played

music (“Yay! It played music!”), or disappointed when it

did not (“Oh. It didn’t go”), and pointed out the outcome

(“Did you hear that song?” or “I don’t hear anything. Do

you hear anything?”). After she demonstrated all ﬁve of the

3-action sequences, she gave the child the toy and said “Now

it’s your turn! Why don’t you try and make it play music?”

Condition    Triplet    Double    Single    Other
“ABC”        20         1         2         4
“BC”         10         7         0         10
“C”          8          0         8         11

Table 5: Number of children producing each sequence type.

Throughout the experiment the music was actually triggered by remote activation. To keep the activation criteria uniform

across conditions, the toy always played music the ﬁrst time

a child produced the ﬁnal C action, regardless of the actions

preceding it, terminating the trial. Only this ﬁrst sequence of

actions was used in our analysis.

Children were videotaped, and their actions from the time

they were handed the toy to trial termination were coded by

the ﬁrst author, and 80% of the data was recoded by a blind

coder. Coders initially coded each individual action as one of

the six demonstrated actions, or as “novel”. These sequences

were then transferred into an “ABC” type representation, and

subsequently coded as one of four sequence types: Triplet,

Double, Single or Other (deﬁned below). Inter-coder relia-

bility was very high, with 91% agreement on the “ABC” type

representations, and 100% agreement on sequence types.

Results and Discussion

Overall results are shown in Table 5. Children produced

signiﬁcantly different types of sequences across the three

conditions, p < 0.001 (two-sided Fisher’s exact test). We

will discuss results for the “ABC” and “BC” conditions ﬁrst,

and then return to the “C” condition.

Effect of Statistical Evidence on Imitation In their

imitation, children could either exactly reproduce one of the

three-action sequences that had caused the toy to activate

(that is, ABC in the “ABC” condition or ABC, DBC or EBC

in the “BC” condition), or they could just produce BC in

isolation. We refer to these successful three-action sequences

as “triplets”, and to the BC subsequence as a “double”.

Both a triplet and a double reﬂect potentially correct

hypotheses about what caused the toy to activate in both

conditions. It could be that BC by itself causes the toy to

activate in the “ABC” condition and the A is superﬂuous,

or it could be that three actions are necessary in the “BC”

condition, but the ﬁrst action can vary. In both conditions BC

is followed by the effect three times.

If children automatically encode the adult’s successful

actions as causally necessary, then they should exclusively

imitate triplets in both conditions. However, if children are

also using more complex statistical information, they should

conclude that the BC sequence by itself is more likely to be

causal in the “BC” condition than in the “ABC” condition,

and that the triplet sequence is more likely to be causal in the

“ABC” condition than in the “BC” condition. This is in fact

what we found – the number of children producing triplets

and doubles varied by condition, p < 0.01 (two-sided Fisher’s

exact test), and differed signiﬁcantly between the “ABC” and

“BC” conditions, p < 0.05 (two-sided Fisher’s exact test).

Effect of Differing Causal Outcomes on Imitation The

pattern of evidence in the “BC” condition is more complex

than in the “ABC” condition. This may have confused chil-

dren, leading them to produce a variety of random actions,

including BC. The “C” condition controls for this possibility.

In this condition the sequences of actions were identical to

those in the “BC” condition, but the outcome was always

positive. As we show later, our Bayesian ideal observer

model conﬁrms that this provided statistical evidence for the

hypothesis that C alone was sufﬁcient to produce the effect.

In all three conditions, imitation of just the ﬁnal C action

in isolation was coded as a “single”. As in the “ABC” and

“BC” conditions, only the subsequence BC was coded as a

double in the “C” condition. Also consistent with the

“ABC” and “BC” conditions, in the “C” condition all ﬁve

demonstrated successful sequences (ABC, ADC, DBC, AEC

and EBC) were coded as triplets.

The “C” condition is as complex as the “BC” condition.

However in the “C” condition the ﬁnal action C produced

by itself reﬂects a likely causal hypothesis. If children selec-

tively imitate subsequences based on the data, then children

in the “C” condition should produce C more frequently than

children in the “BC” condition, and children in the “BC” con-

dition should produce BC more frequently than children in

the “C” condition. Our results support this hypothesis. Chil-

dren in the “BC” and “C” conditions differed signiﬁcantly

in the overall types of sequences they produced, p < 0.001

(two-sided Fisher’s exact test), and the number of children

producing doubles and singles in the two conditions also var-

ied signiﬁcantly, p < 0.001, (two-sided Fisher’s exact test).
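The two pairwise comparisons reported so far can be checked against the counts in Table 5 with a small pure-Python version of the two-sided Fisher's exact test for 2×2 tables. The paper does not say which software was used or report exact p-values, so the function below is an illustrative sketch; it sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Triplets vs. doubles in the "ABC" and "BC" conditions (Table 5).
p_abc_vs_bc = fisher_exact_2x2(20, 1, 10, 7)
# Doubles vs. singles in the "BC" and "C" conditions (Table 5).
p_bc_vs_c = fisher_exact_2x2(7, 0, 0, 8)
print(round(p_abc_vs_bc, 4), round(p_bc_vs_c, 6))
```

Both values fall below the thresholds reported in the text (p < 0.05 and p < 0.001 respectively). The comparisons over all sequence types use tables larger than 2×2 and would need the generalized form of the test.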

Performance of “Other” Actions Across all three con-

ditions, children did not just obligately imitate one of the

successful sequences or subsequences they observed – they

also produced new combinations of actions. Overall, the

types of “other” sequences produced did not qualitatively

differ across conditions, and appear to be a mix of ex-

ploratory behavior and genuine errors. There was a trend

towards children in the “BC” and “C” conditions performing

more of these “Other” sequences than children in the “ABC”

condition, p = 0.10 (two-sided Fisher’s exact test). This difference becomes statistically significant when two children

who imitated unsuccessful triplets are excluded from the

analysis, p < 0.05 (two-sided Fisher’s exact test). This result

is compatible with ﬁndings that children tend to increase

their exploratory behavior when the correct causal structure

is more ambiguous (Schulz & Bonawitz, 2007; Schulz et al.,

2008). Finally, four children performed completely novel

actions they had never seen demonstrated. All of these

children were in the “BC” or “C” conditions, consistent with

these conditions eliciting more exploratory actions.

Model Results

Supporting our experimental results, our model makes dis-

tinct predictions in each of the three experimental conditions,

showing that the data lead to differential causal inferences.

Parameter values of p = 0.1 and β = 1.0 were chosen be-

cause they produced a qualitatively good match to children’s

performances, as shown in Figure 2. The relatively high

value for β suggests that children prefer longer (complete)

causal sequences, perhaps representing a pre-existing belief

that adults usually don’t perform extraneous actions. The

relatively low value for p suggests that children employ a

causal Occam’s razor, assuming that simpler hypotheses,

which require fewer causes to explain the data, are more

likely. Overall, these results suggest that children’s imitative

choices conform closely to normative predictions.
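To make the condition-wise predictions concrete, the model can be run on the demonstration sequences of Table 4 with p = 0.1 and β = 1.0. The script below is a sketch under the same assumptions as the model description (non-empty sets of terminal subsequences as hypotheses, Deterministic-OR likelihood); the aggregation into triplet, double and single follows the coding scheme in the Results, and all function names are hypothetical. Its output is not claimed to match the values plotted in Figure 2.

```python
import math
from itertools import combinations

def action_probabilities(data, p, beta):
    """data: list of (actions, effect) pairs. Returns {sequence: p(a|d)}."""
    causes = []
    for actions, _ in data:
        for i in range(len(actions)):
            if actions[i:] not in causes:
                causes.append(actions[i:])
    p_inc = {c: 1.0 / (1.0 + ((1.0 - p) / p) * math.exp(-beta * (len(c) - 2)))
             for c in causes}
    scores = {c: 0.0 for c in causes}
    for size in range(1, len(causes) + 1):
        for h in combinations(causes, size):
            consistent = all(any(actions.endswith(c) for c in h) == effect
                             for actions, effect in data)
            if not consistent:
                continue
            weight = 1.0
            for c in causes:
                weight *= p_inc[c] if c in h else 1.0 - p_inc[c]
            for c in h:
                scores[c] += weight
    total = sum(scores.values())
    return {c: scores[c] / total for c in causes}

# Table 4. Repeating each demonstration twice does not change a 0/1 likelihood.
CONDITIONS = {
    "ABC": [("ABC", True), ("DEC", False), ("ABC", True), ("EDC", False), ("ABC", True)],
    "BC": [("ABC", True), ("ADC", False), ("DBC", True), ("AEC", False), ("EBC", True)],
    "C": [("ABC", True), ("ADC", True), ("DBC", True), ("AEC", True), ("EBC", True)],
}

def summarize(data, p=0.1, beta=1.0):
    """Collapse p(a|d) into the sequence types used to code children's responses."""
    probs = action_probabilities(data, p, beta)
    successful = {actions for actions, effect in data if effect}
    return {
        "triplet": sum(probs[s] for s in successful),
        "double": probs.get("BC", 0.0),
        "single": probs.get("C", 0.0),
    }

summary = {name: summarize(data) for name, data in CONDITIONS.items()}
for name, row in summary.items():
    print(name, {kind: round(value, 2) for kind, value in row.items()})
```

In this sketch the triplet is favored more strongly in the “ABC” condition than in the “BC” condition, the double BC gains probability in the “BC” condition, and the single C receives probability only in the “C” condition, which is the qualitative pattern the experiment was designed to test.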

Finally, while children performed similarly to our model’s

predictions, there were some differences in performance

as well. Children produced more triplets than our model

predicted, especially in the “ABC” condition. One reason

for this discrepancy may be that children are able to use

information about the knowledge state and intentional stance

of the demonstrator that our current model cannot take

into account. Models that can incorporate intentional and

pedagogical information, in addition to statistical evidence

are an important area of future work (Goodman, Baker, &

Tenenbaum, 2009; Bonawitz et al., 2009). We are currently

developing such a model, and exploring the role of peda-

gogical cues in children’s imitation (Buchsbaum, Gopnik,

Grifﬁths, & Shafto, submitted).

General Discussion

In this paper, we examined whether children are sensitive

to statistical evidence in choosing the actions they imitate.

We demonstrated that children can use statistical evidence to

decide whether to imitate a complete action sequence, or to

selectively imitate only a subsequence. In particular, children

in the “ABC” condition imitated the complete sequence

ABC more often than children in the “BC” condition, while

children in the “BC” condition imitated the subsequence BC

more often than children in the “ABC” condition. Children’s

performance in the “C” condition demonstrated that the

differential imitation in the “ABC” and “BC” conditions

could not be explained as a result of task complexity.

The design of this experiment also eliminated other simple

explanations for these results. There was the same absolute number of BC demonstrations followed by effects in all three conditions, but children produced doubles almost exclusively in the “BC” condition (see Table 5). Similarly, the absolute number of positive

triplet demonstrations was the same in the “ABC” condition

and the “BC” condition, and was smaller than in the “C”

condition, but children produced more triplets in the ﬁrst

condition than in the other two conditions. Finally, the actual

sequence of actions was the same in the “BC” and “C”

conditions but children behaved differently in the two cases.

Children appeared to selectively imitate by considering the

conditional probability of the various events and outcomes,

and formulating a set of causal hypotheses based on that data.

They then produced responses that matched the probability

distribution of the hypotheses, at least qualitatively.

[Figure 2 plots omitted. Left panel: “Probability of Model Choosing Sequence Type”; right panel: “Proportion of Children Choosing Sequence Type”. Each panel shows values from 0 to 1 for the sequence types Triplet, Double and Single, with one series per condition (“ABC”, “BC”, “C”).]

Figure 2: Left: Predictions of our Bayesian model. Right: Children’s actual performance in Experiments 1 and 2.

It is also worth noting the information-processing complexity of this task. Children saw thirty similar actions and

ten outcomes in each condition, and yet they appeared to

track and use this information in deciding which actions

to produce. This is consistent with other studies in which

children and adults show surprising if implicit capacities to

track statistical regularities.

These results extend earlier ﬁndings that show children

take causal and intentional information into account appro-

priately in their imitation. They show that children also take

into account statistical information about the conditional

probability of events and do so in an at least roughly norma-

tive way. The studies also suggest a rational mechanism for

the phenomenon of “overimitation”. In particular, the “triplet”

responses could be thought of as a kind of overimitation,

reproducing parts of a causal sequence that are not actually

demonstrably necessary for the effect. These results suggest

that this behavior varies depending on the statistics of the data

and the probability of various hypotheses concerning them.

Other factors may also inﬂuence the child’s judgment of

various causal hypotheses. For example, knowing that the

adult is knowledgeable about the causal system, and is taking

a “pedagogical stance” towards the evidence, may lead the

child to different causal conclusions (Bonawitz et al., 2009).

We are currently investigating the effect of pedagogical

cues on imitation of causal action sequences (Buchsbaum

et al., submitted). Similarly, seeing a repeated sequence of

actions with no obvious physical causal outcome may lead

children to suspect that the actions are intended to have a

social or psychological rather than physical effect. Both

these processes might lead to greater “overimitation” which

would nonetheless be rational.

In general, however, this study shows that children are

sensitive to statistical information in determining which

sequences of actions to imitate. Along with other studies,

it supports the idea that Bayesian procedures of statistical

learning, procedures that allow the construction of causal

models from statistical patterns, may play a signiﬁcant role

in many important kinds of early learning.

Acknowledgments. We thank Pat Shafto and Cari Kaufman for dis-

cussions on the model design, and Kimmy Yung, Mia Krstic, and

Elon Ullman for help with data collection and coding. This mate-

rial is based upon work supported by a National Science Foundation

Graduate Research Fellowship, the McDonnell Foundation Causal

Learning Initiative and Grant FA9550-07-1-0351 from the Air Force

Ofﬁce of Scientiﬁc Research.

References

Baldwin, D., Andersson, A., Saffran, J., & Meyer, M. (2008).

Segmenting dynamic human action via statistical structure.

Cognition, 106, 1382-1407.

Bonawitz, E. B., Shafto, P., Gweon, H., Chang, I., Katz, S., & Schulz,

L. (2009). The double-edged sword of pedagogy: Modeling the

effect of pedagogy on preschoolers’ exploratory play. Proceedings

of the 31st Annual Conference of the Cognitive Science Society.

Buchsbaum, D., Gopnik, A., Grifﬁths, T. L., & Shafto, P. (submit-

ted). Children’s imitation of causal action sequences is inﬂuenced

by statistical and pedagogical evidence.

Buchsbaum, D., Grifﬁths, T. L., Gopnik, A., & Baldwin, D.

(2009). Learning from actions and their consequences: Inferring

causal variables from continuous sequences of human action.

Proceedings of the 31st Annual Conference of the Cognitive

Science Society.

Cheng, P. (1997). From covariation to causation: A causal power

theory. Psychological Review, 104, 367-405.

Fiser, J., & Aslin, R. N. (2002). Statistical learning of higher-order

temporal structures from visual shape sequences. Journal of

Experimental Psychology: Learning, Memory and Cognition, 28,

458-467.

Gergely, G., Bekkering, H., & Kiraly, I. (2002). Rational imitation

in preverbal infants. Nature, 415, 755.

Goodman, N. D., Baker, C. L., & Tenenbaum, J. B. (2009). Cause

and intent: Social reasoning in causal learning. Proceedings of

the 31st Annual Conference of the Cognitive Science Society.

Gopnik, A., Glymour, C., Sobel, D., Schulz, L., Kushnir, T., &

Danks, D. (2004). A theory of causal learning in children: Causal

maps and Bayes nets. Psychological Review, 111, 1-31.

Gopnik, A., & Schulz, L. (2007). Causal learning: Psychology,

philosophy, computation. New York: Oxford University Press.

Grifﬁths, T. L., & Tenenbaum, J. B. (2005). Structure and strength

in causal induction. Cognitive Psychology, 51, 354-384.

Kirkham, N. Z., Slemmer, J. A., & Johnson, S. P. (2002). Visual

statistical learning in infancy: evidence of a domain general

learning mechanism. Cognition, 83, B35-B42.

Lyons, D. E., Young, A. G., & Keil, F. C. (2007). The hidden

structure of overimitation. Proceedings of the National Academy

of Sciences, 104, 19751-19756.

Meltzoff, A. N. (1995). Understanding the intentions of oth-

ers: Re-enactment of intended acts by 18-month-old children.

Developmental Psychology, 31, 838-850.

Pearl, J. (1996). Structural and probabilistic causality. In

D. R. Shanks, K. J. Holyoak, & D. L. Medin (Eds.), The psychol-

ogy of learning and motivation: Causal learning (Vol. 34). San

Diego: Academic Press.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical

learning by 8-month-old infants. Science, 274, 1926-1928.

Schulz, L. E., & Bonawitz, E. B. (2007). Serious fun: Preschoolers

engage in more exploratory play when evidence is confounded.

Developmental Psychology, 43(4), 1045-1050.

Schulz, L. E., Hooppell, C., & Jenkins, A. C. (2008). Judicious imi-

tation: Children differentially imitate deterministically and proba-

bilistically effective actions. Child Development, 79(2), 395-410.

Whiten, A., Custance, D. M., Gomez, J. C., Teixidor, P., & Bard,

K. A. (1996). Imitative learning of artiﬁcial fruit processing

in children (Homo sapiens) and chimpanzees (Pan troglodytes).

Journal of Comparative Psychology, 110(1), 3-14.

Williamson, R., Meltzoff, A. N., & Markman, E. (2008). Prior ex-

periences and perceived efﬁcacy inﬂuence 3-year-olds’ imitation.

Developmental Psychology, 44, 275-285.

Xu, F., & Tenenbaum, J. B. (2007). Word learning as Bayesian

inference. Psychological Review, 114(2).

Zacks, J. M., Braver, T. S., Sheridan, M. A., Donaldson, D. I.,

Snyder, A. Z., Ollinger, J. M., et al. (2001, June). Human brain

activity time-locked to perceptual event boundaries. Nature

Neuroscience, 4(6), 651-655.