Music Education, Academic Achievement, and Executive Functions
Steven J. Holochwost
WolfBrown, Cambridge, Massachusetts
Cathi B. Propper
University of North Carolina at Chapel Hill
Dennie Palmer Wolf
WolfBrown, Cambridge, Massachusetts
Michael T. Willoughby
RTI International, Research Triangle Park, North Carolina
Kelly R. Fisher
Johns Hopkins University
Jacek Kolacz and Vanessa V. Volpe
University of North Carolina at Chapel Hill
Sara R. Jaffee
University of Pennsylvania
This study examined whether music education was associated with improved performance on
measures of academic achievement and executive functions. Participants were 265 school-age
children (Grades 1 through 8, 58% female, and 86% African American) who were selected by lottery
to participate in an out-of-school program offering individual- and large-ensemble training on
orchestral instruments. Measures of academic achievement (standardized test scores and grades in
English language arts and math) were taken from participants’ academic records, whereas executive
functions (EFs) were assessed through students’ performance on a computerized battery of common
EF tasks. Results indicated that, relative to controls, students in the music education program scored
higher on standardized tests, t(217) = 2.74, p = .007; earned better grades in English language arts,
t(163) = 3.58, p < .001, and math, t(163) = 2.56, p = .011; and exhibited superior performance on
select tasks of EFs and short-term memory. Further analyses revealed that although the largest
differences in performance were observed between students in the control group and those who had
received the music program for 2 to 3 years, conditional effects were also observed on 3 EF tasks
for students who had been in the program for 1 year. These findings are discussed in light of current
educational policy, with a particular emphasis on the implications for future research designed to
understand the pathways connecting music education and EFs.
Keywords: music education, achievement gap, standardized tests, executive functions
Supplemental materials: http://dx.doi.org/10.1037/aca0000112.supp
Steven J. Holochwost, WolfBrown, Cambridge, Massachusetts; Cathi B. Propper, Center for Developmental Science, University of North Carolina at Chapel Hill; Dennie Palmer Wolf, WolfBrown; Michael T. Willoughby, RTI International, Research Triangle Park, North Carolina; Kelly R. Fisher, Science of Learning Institute, Johns Hopkins University; Jacek Kolacz and Vanessa V. Volpe, Department of Psychology, University of North Carolina at Chapel Hill; Sara R. Jaffee, Department of Psychology, University of Pennsylvania.
Jacek Kolacz is now at the Traumatic Stress Research Center, Kinsey Institute, Indiana University. Vanessa V. Volpe is now at the Department of Psychology, Ursinus College.
We would like to thank the children who participated in this study and the teachers and leadership of the program.
Correspondence concerning this article should be addressed to Steven J. Holochwost, WolfBrown, 8A Francis Avenue, Cambridge, MA 02138.
E-mail: steven@wolfbrown.com
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
Psychology of Aesthetics, Creativity, and the Arts © 2017 American Psychological Association
2017, Vol. 11, No. 2, 147–166 1931-3896/17/$12.00 http://dx.doi.org/10.1037/aca0000112
From antiquity to the present day, proponents of music education have argued that instruction in music yields ancillary benefits to children’s development (cf. Pont, 2004). These arguments have recently become more urgent, as practitioners and policymakers have seized upon music education (and arts education more generally) to foster a broad array of essential skills in children, and in particular, children at risk. Children at risk (those from households with relatively few material resources) are less likely to have access to private music instruction (Duncan & Murnane, 2015; Southgate & Roscigno, 2009) and are more likely to attend schools where music teachers’ positions have been eliminated (Parsad & Spiegelman, 2012). The question of who has access to music education has thus assumed a moral dimension germane to the current debate regarding the inequality of educational opportunity in the United States (Putnam, 2015).
The issue of access has also gained relevance as the policy pendulum has, at least in some quarters, begun to swing toward a more holistic perspective on education after 15 years of a narrower focus on standardized metrics of academic achievement (cf. Robinson & Aronica, 2015). Aspects of children’s broader development, such as prosocial behavior (Kirschner & Tomasello, 2010; Schellenberg, Corrigall, Dys, & Malti, 2015) and nonacademic skills like persistence (Scott, 1992), have been linked to instruction in music, and thus proponents of music education have found themselves well positioned to advocate for music instruction on the grounds that it benefits the whole child. Two aspects of development have received particular attention: executive functions (EFs), the foundational cognitive capacities that allow individuals to set and pursue goals; and the closely related construct of short-term memory (STM). Research indicating that the “specific features” of music education (Schellenberg, 2006, p. 466) are associated with improved capabilities in these areas has readily been embraced by those advocating for music education.
However, as we review below, the body of evidence linking music education to improved academic achievement, EFs, and memory is large and complex. Much of the evidence is correlational and controls for alternative explanations of the relation between music and these domains to varying degrees. This raises concerns about internal validity, and in particular, about whether the experience of music education causes ancillary benefits or merely co-occurs with them (Schellenberg & Weiss, 2013). Although a relatively small number of experimental studies do allow for causal inference, they are often subject to concerns about external validity: For example, assigning children to brief, standardized programs of music instruction does not resemble how music is typically taught in the field, where instruction is offered for longer but variable periods of time. Moreover, for reasons of student attendance, parent permissions, and staff turnover, these more controlled studies are typically most feasible in schools that serve students who are more ethnically homogeneous and affluent than those at the center of the debate around music education and inequality of opportunity, raising additional questions about the generalizability of results.
In the current study, we sought to address these limitations through an experimental evaluation of the effects of a music education program offered to a diverse sample of children attending a school in a high-risk area of a large U.S. city. Although children were assigned by lottery to one of a limited number of spots in the program, the number of years in which they participated varied. Random assignment allowed us to evaluate whether participation in the program caused differences in students’ academic achievement, EFs, and STM, while the natural variability of the program’s dosage allowed us to investigate whether there was heterochronicity in these effects (i.e., whether they differed by the number of years of participation) without interfering with program implementation. Moreover, because some students were new to the program in its most recent year, we were able to examine whether program effects were contingent upon students’ academic achievement, EFs, and STM prior to program entry.
Music Education and Academic Achievement
Correlational Research
Numerous studies have demonstrated an association between music education and higher levels of academic achievement. For example, meta-analyses of data from the College Board revealed significantly higher SAT math (Vaughn & Winner, 2000) and verbal (Butzlaff, 2000) scores among students who took at least one music course in high school. The correlation between music education and academic achievement has been found to extend to course grades in some cases, but not others. Although participation in music education has been linked to higher academic grades among Canadian high-school students (Cabanac, Perlovsky, Bonniot-Cabanac, & Cabanac, 2013; Gouzouasis, Guhn, & Kishor, 2007), studies with both German (Bastian, 2000) and Swiss (Weber, Spychiger, & Patry, 1993) elementary schoolchildren (ages 9 to 12) found no differences in course grades between students enrolled in music classes and their peers.
These inconsistent results may be explained, in part, by the fact that these correlational studies did not control for relevant covariates. This is problematic given the known associations between academic achievement and factors such as gender and ethnicity (Kinney, 2008). As others have noted (Fitzpatrick, 2006; Schellenberg & Weiss, 2013), socioeconomic status is a particularly salient potential confounding variable in the relationship between music education and academic achievement. It is related to both academic achievement and access to music education, leading some to speculate that the observed association between music education and academic achievement may be an epiphenomenon explained entirely by the relationship between affluence and music education (Winner & Cooper, 2000).
Quasi-Experimental Research
The authors of a number of quasi-experimental studies have sought to address this issue by controlling for socioeconomic status in their analyses. Work with large, nationally representative data sets of students in the United States has revealed a relationship between in-school music instruction and higher standardized test scores among students in elementary (Southgate & Roscigno, 2009) and high school (Miksza, 2010) that is robust to controls for socioeconomic status and ethnicity. Studies conducted with smaller samples of Canadian (Schellenberg, 2006), Portuguese (Santos-Luiz, Mónico, Almeida, & Coimbra, 2015), and Swiss (Wetter, Koerner, & Schwaninger, 2009) elementary schoolchildren yielded similar results: Music instruction was associated with higher course grades regardless of whether that instruction had been offered in (e.g., Santos-Luiz et al., 2015) or out of school (e.g., Wetter et al., 2009), and this association held after including socioeconomic status as a covariate.
These studies cannot, however, rule out the possibility that selection bias accounts for the association between music education and academic achievement. As Schellenberg noted, “high-functioning children are more likely than other children to take music lessons, and to perform well on virtually any test they take” (Schellenberg, 2011, p. 285), and indeed there is evidence to support this account. In two different samples of students attending elementary school in an urban school district in the United States, both Fitzpatrick (2006) and
Kinney (2008) found that students enrolled in music classes exhibited
higher standardized test scores not only while enrolled in those
classes, but also in years prior to enrollment. Elpus (2013) reported
similar findings for SAT scores of high-school students in a large,
nationally representative sample.
Music Education and IQ
Further support for Schellenberg’s argument may be provided by studies reporting a positive correlation between music education and overall cognitive function as indexed by IQ. Studies with school-aged Canadian (Corrigall, Schellenberg, & Misura, 2013; Schellenberg, 2011; Schellenberg & Mankarious, 2012) and German (Hille, Gust, Bitz, & Kammer, 2011; Roden, Grube, Bongard, & Kreutz, 2014) children ranging in age from 7 to 12 years have consistently reported an association between music education and IQ, and each of these studies save Hille et al. (2011) controlled for families’ income. If this association can be explained by Schellenberg’s (2011) argument (i.e., higher-functioning children choose to pursue musical instruction), then so too might the observed relationship between music education and academic achievement.
However, contrary to this argument, it is also possible that musical training caused these students to become more intelligent. Consistent with this account, a small number of experimental studies have indicated that musical training results in modest increases in cognitive function as indexed by IQ. Studies with Canadian (Schellenberg, 2004), Iranian (Kaviani, Mirbaha, Pournaseh, & Sagan, 2014), and Israeli (Portowitz, Lichtenstein, Egorova, & Brand, 2009) children ranging in age from 5 to 9 years have demonstrated that students randomly assigned to in-school (Kaviani et al., 2014; Portowitz et al., 2009) or out-of-school (Schellenberg, 2004) musical instruction ranging from 12 weeks to 2 years showed increases in IQ corresponding to effect sizes of 0.35 (Schellenberg, 2004) to 0.40 (Kaviani et al., 2014). Despite these results, it would be premature to conclude that music education causes improvement in cognitive function, as other studies have produced null findings. For example, Moreno and colleagues found that 8-year-old Portuguese children did not exhibit higher IQ following a 6-month course of instruction (Moreno et al., 2009). And although 10-year-old French-Canadian children did exhibit significantly higher IQ scores after 1 and 2 years of music lessons, these differences were not observed after 3 years (Costa-Giomi, 1999).
Experimental Research
In summary, although there is ample correlational evidence of a relationship between music education and academic achievement, this evidence is subject to limitations imposed by third-factor explanations (socioeconomic status chief among them) and selection bias. And though findings from some experimental studies indicate that music education leads to gains in cognitive function (as indexed by IQ), other studies have not provided evidence for this effect. In any event, even evidence for a causal link between music education and cognitive function could not speak directly to whether music education would yield similar benefits for academic achievement. To date, only three experimental studies of which we are aware have addressed this question, and they have produced contradictory results. Gardiner and his colleagues found that 5- to 7-year-old children who received a year of musical or visual arts instruction earned higher standardized test scores in math than their peers (Gardiner, Fox, Knowles, & Jeffrey, 1996), and Schellenberg (2004) found that 6-year-old children randomly assigned to a year of out-of-school music instruction exhibited higher scores on a standardized test of academic achievement. However, Costa-Giomi (2004) found no differences in test scores or grades among 12-year-old children after three years of piano instruction. Clearly there is a need for additional experimental research that uses random assignment to investigate whether music education results in higher levels of academic achievement and, if so, whether these effects vary as a function of the duration of instruction.
Music Education and EFs
Like measures of IQ, measures of EFs seek to assess a broad array of cognitive functions that underlie performance in multiple domains, including academic achievement. EFs refer to a set of cognitive processes that are essential both to setting goals and to organizing behavior in the pursuit of those goals (Diamond, 2013). Although the specific processes said to comprise EFs vary (cf. Willoughby, Holochwost, Blanton, & Blair, 2014), most definitions include working memory, inhibitory control, and the ability to flexibly shift attention (also referred to as set shifting; Miyake et al., 2000; Ursache, Blair, & Raver, 2012). Working memory includes a central executive that actively manipulates information and two short-term storage buffers where this information is held (the phonological loop and visuospatial sketchpad; Baddeley, Logie, Bressi, Della Sala, & Spinnler, 1986). Inhibitory control is defined as the ability to inhibit a prepotent or dominant behavioral response, whereas attentional shifting refers to the ability to willfully orient attention to a new stimulus (Diamond, 2013).
As others have noted, the “specific features” of music education may foster EFs more readily than other activities (Schellenberg, 2006, p. 466). Practicing music draws heavily on inhibitory and attentional resources (Jäncke, 2009). This is certainly the case when aspects of the music being played change (e.g., tempi, key; Zuk et al., 2014), but even when these are stable, playing music (particularly in an ensemble setting) requires moment-to-moment self-monitoring (e.g., correcting intonation), inhibitory control (resisting the urge to play over others; Jentzsch, Mkrtchian, & Kansal, 2014), and shifts in attentional focus (listening to oneself and to one’s stand partner, reading the music, glancing at the conductor, and so forth; Loehr, Kourtis, Vesper, Sebanz, & Knoblich, 2013). As Moreno et al. (2011) noted, the intense training of these abilities through music may translate to improved abilities outside of that context due to the frequent and prolonged recruitment of brain areas that underlie essential cognitive processes across contexts (citing Jung & Haier, 2007). These areas include the prefrontal cortex, which both functional imaging (Zuk et al., 2014) and electrophysiological (Fujioka et al., 2006; Moreno et al., 2011) methods have revealed to be more active during tasks of EFs among musically trained children than among their peers, and which has been shown to develop more rapidly among musically trained children (Hudziak et al., 2014).
Correlational Research
Despite this compelling conceptual account, much of the evidence linking music education to enhanced EFs is correlational in nature and drawn from research conducted with adults. Studies conducted with older adults have demonstrated that musicians outperform their nonmusician peers on tasks of both auditory and visual inhibitory control and attention (Amer, Kalender, Hasher, Trehub, & Wong, 2013) and planning (Hanna-Pladdy & Gajewski, 2012). Research with younger adults has produced similar results. For example, younger adult musicians outperformed their peers on a Stroop test of auditory inhibitory control (Bialystok & DePape, 2009), and Slevc and colleagues found that musical ability was associated with performance on tests of auditory and visual working memory (Slevc, Davey, Buschkuehl, & Jaeggi, 2016).
It would be premature, however, to conclude that these results
offer unequivocal correlational evidence of an association between
music education and EFs. In each study cited above there were
other tasks on which musicians did not outperform their peers, and
in many cases differences in performance that were observed on a
task in one study were not observed for that same task in another
study. This uneven pattern of results by task, both within and
between studies, is evident in research investigating the correlation
between music education and EFs in children as well.
In one such study, Zuk and colleagues (2014) found that school-age children (ages 9 to 12 years) in the United States who had received at least two years of music training outperformed a group of musically untrained children on tests of processing speed, verbal fluency, and set-shifting (a trail-making task), but not on measures of working memory (backward digit span) or inhibitory control (color-word Stroop). In contrast, studies of German students found that musical training was associated with better performance on a measure of working memory (backward digit span) among children ages 7 to 8 (Roden et al., 2014), and that the number of months for which children had received music lessons predicted performance on a series of Stroop tasks assessing selective auditory attention and inhibitory control in a sample of 9- to 12-year-old children (Degé, Kubicek, & Schwarzer, 2011). However, a third study by Schellenberg (2011) found that although Canadian children (again, ages 9 to 12) who received music lessons performed better than their musically untrained peers on tasks assessing working memory, similar results were not observed for measures of phonological fluency, mental flexibility, inhibitory control, or planning.
As Degé and colleagues noted, these discrepancies may be attributable to methodological differences among these studies, including the measures used, the length and intensity of musical training children received, and the way musical training was coded in the analyses (e.g., continuously, as in Degé et al., 2011, or dichotomously, as in the case of Schellenberg, 2011). Moreover, many of the third-factor explanations at work in the relationship between music education and academic achievement apply to the correlation between musical training and EFs. These factors include students’ socioeconomic status and the varying extent to which studies controlled for socioeconomic status by matching participants (as in Zuk et al., 2014) or through statistical means (Schellenberg, 2011).
Experimental Research
Short of recruiting a prohibitively large sample that could permit controlling for all potentially relevant covariates, demonstrating a causal relationship between music education and improved EFs requires that participants be assigned at random to receive musical training. Two studies of which we are aware have used this approach. Bugos and her colleagues found that older adults (ages 60 to 85) who were randomly assigned to receive 6 months of piano instruction exhibited significantly faster performance on interference trials of the trail-making task, though no improvement in difference scores was observed (difference scores were calculated as reaction time [RT] for interference trials minus RT for noninterference trials; Bugos, Perlstein, McCrae, Brophy, & Bedenbaugh, 2007). Moreno and colleagues (2011) assigned preschool children to a 4-week course of instruction in either music or visual arts. Although there were no differences in performance on a task of inhibitory control (a go/no-go task) prior to the course of instruction, children assigned to music instruction were significantly more accurate in their responses following the course.
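The pattern Bugos et al. (2007) reported, faster interference trials with no change in difference scores, can arise if training speeds up both trial types by a similar amount. The Python sketch below illustrates that arithmetic with invented reaction times (these are not data from that study); `difference_score` is a hypothetical helper implementing the definition given above, interference RT minus noninterference RT.

```python
from statistics import mean


def difference_score(interference_rts, noninterference_rts):
    """Mean RT on interference trials minus mean RT on noninterference trials (ms)."""
    return mean(interference_rts) - mean(noninterference_rts)


# Invented reaction times (ms) for one hypothetical participant, before training.
pre_interference = [1200, 1180, 1220]
pre_noninterference = [900, 880, 920]

# Assumption for illustration: training speeds up BOTH trial types by 100 ms.
post_interference = [rt - 100 for rt in pre_interference]
post_noninterference = [rt - 100 for rt in pre_noninterference]

speedup = mean(pre_interference) - mean(post_interference)
pre_diff = difference_score(pre_interference, pre_noninterference)
post_diff = difference_score(post_interference, post_noninterference)

print(f"interference trials faster by {speedup} ms")
print(f"difference score before: {pre_diff} ms, after: {post_diff} ms")
```

If training had instead shortened only the interference trials, `post_diff` would fall below `pre_diff`; this is why a difference score, rather than raw interference RT, is a common way to separate general speeding from reduced interference.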
Music Education and STM
As noted above, the correlational evidence linking music education to working memory is mixed, with some studies reporting an association between musical instruction and performance on backward-span tasks (e.g., Roden et al., 2014) and others reporting no such relation (e.g., Zuk et al., 2014). A more modest proposition is that music education may enhance the capacity to encode and retrieve information held in STM. This may be due, in part, to the memorization of musical passages (Schellenberg, 2004), which involves the processing of information across multiple sensory modalities (Williamon & Valentine, 2002).
Given the fundamentally aural nature of music, the link between musical instruction and aspects of auditory memory has been the focus of numerous studies. In correlational studies with adults (Chan, Ho, & Cheung, 1998; Jakobson, Cuddy, & Kilgour, 2003; Jakobson, Lewycky, Kilgour, & Stoesz, 2008) and children (Ho, Cheung, & Chan, 2003), musicians were found to exhibit better performance on delayed-recall assessments of verbal memory than nonmusicians after controlling for relevant confounds. Studies that have focused more explicitly on verbal STM have yielded similar results. Among older adults (Fauvel et al., 2014), younger adults (Franklin et al., 2008; Hansen, Wallentin, & Vuust, 2013; Lee, Lu, & Ko, 2007), and children (Lee et al., 2007; Roden et al., 2014), musically trained individuals have been found to outperform their peers on auditory forward digit-span tasks.
However, the argument has also been made that students of musical traditions using notation may realize benefits to the more remote domain of visual memory. Reading notated music from the page requires the recognition and encoding of simple visual stimuli in a predetermined order (i.e., left to right; Jakobson et al., 2008). Correlational studies with older adults (Amer et al., 2013) and school-age children (Lee et al., 2007) have supported this account, reporting associations between musical training and better performance on forward spatial-span tasks of visual STM. In a quasi-experimental study that controlled for parental education and income, Bilhartz and her colleagues observed significantly larger rates of positive change in visual STM among preschoolers who
had received 30 weeks of musical training relative to their peers
(Bilhartz, Bruhn, & Olson, 1999).
Although some of these studies controlled for third-factor explanations (e.g., Bilhartz et al., 1999; Lee et al., 2007), experimental evidence establishing a causal link between music education and either verbal or visual STM is scarce. To date, only the study conducted by Bugos and her colleagues has suggested such a link: Older adults who were randomly assigned to receive piano instruction exhibited improvements in verbal STM (digit forward span) that approached significance (Bugos et al., 2007).
Current Study
As noted earlier in the introduction, the literature regarding the ancillary benefits of music education to academic achievement, EFs, and STM is complex and leaves gaps in our understanding. The largest and most consistent body of evidence comes from correlational studies that link musical training to higher levels of academic achievement. However, even this literature is not without contrary findings, and in any event these studies cannot speak to whether music education is the cause of these benefits, in part because they cannot address self-selection bias. Causal inference demands random assignment, but to date most experimental studies have focused on the capacity of music education to benefit overall cognitive function as indexed by IQ. Here too there are mixed findings, and as noted above, the capacity for music education to benefit IQ does not necessarily mean that parallel benefits would be observed in the areas of academic achievement, EFs, or STM. Moreover, most experimental studies have featured single-dose courses of music education that cannot address the effects of more sustained courses of music education that vary in duration (see Costa-Giomi, 1999, for an exception). Finally, given that many correlational and most experimental studies were performed with homogeneous samples of students of European descent, the extent to which their findings can be generalized to students attending school in large, urban districts in the United States is unclear.
It was therefore the purpose of this study to examine the effects of enrollment in a program of music education on academic achievement, EFs, and STM in a majority-minority sample of school-age children. The design of the program as implemented (see below) featured assignment to an oversubscribed music education program by lottery and natural variability in program dosage, allowing us to address three specific research questions while maintaining the ecological validity of the study setting. First, is there any effect of program enrollment on students’ academic achievement, EFs, or STM? We hypothesized that students enrolled in the program would exhibit higher levels of academic achievement and improved performance on measures of EFs and STM. Second, are there dosage effects for program enrollment, such that longer periods of enrollment in the program are associated with better performance on measures of academic achievement, EFs, and STM? We made the exploratory hypothesis that such effects would be observed, based in part on the results of prior studies demonstrating an association between the duration of musical training and academic achievement as indexed by standardized test scores and grades (Corrigall et al., 2013; Schellenberg, 2006). Third, for the subset of students new to the program in its most recent year, are the effects of program enrollment on year-end measures of academic achievement, EFs, and STM contingent upon levels of performance on these measures prior to program entry? We hypothesized that these conditional effects would be observed but made no prediction about their precise nature.
Method
Participants
The sample comprised 265 children attending a single parochial school located in an economically disadvantaged neighborhood of a large Northeastern city. Students were enrolled in Grades 1 through 8, with a mean age of 10.2 years (SD = 2.15 years) when the first round of tests of EFs and STM was administered in the fall of 2012. According to school administrative records, none of the students in the sample were classified as special education or limited English proficiency. The sample was majority African American (85.7%) but included students of Hispanic/Latino (9.1%), Asian (2.6%), and European (2.6%) descent. The parents or guardians of all students elected for their children to participate in an intensive afterschool music-education program (see the Procedure section for a description of the program), but because of constraints of space and resources, 135 students (50.9% of the sample) were selected at random via a lottery to participate in the program; these students were designated as the program group, whereas those not selected comprised the control group. Selection occurred at the beginning of the 2010, 2011, and 2012 academic years. In the fall of 2012, when data collection began, 50 students (37.0% of the program group) were entering their third year in the program, whereas 32 (23.7%) and 53 (39.3%) students were entering their second or first years, respectively. All students entering their third year in the program had been enrolled in the 2010–2011 and 2011–2012 academic years. All students entering their second year had been enrolled in 2011–2012, with one exception: Because of extenuating familial circumstances, one student had been enrolled in 2010–2011, was not enrolled in 2011–2012, and returned to the program in 2012–2013.
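The group sizes above can be cross-checked in a few lines of Python. This is a minimal sketch that only recomputes the reported percentages from the counts given in this section; the names `PROGRAM_BY_YEAR` and `pct` are illustrative, and the control-group size is derived by subtraction rather than stated in the text.

```python
# Counts reported in the Participants section (program group as of fall 2012).
TOTAL_SAMPLE = 265
PROGRAM_BY_YEAR = {"third year": 50, "second year": 32, "first year": 53}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


program_total = sum(PROGRAM_BY_YEAR.values())  # students selected by lottery
control_total = TOTAL_SAMPLE - program_total   # derived: students not selected

print(f"program: {program_total} ({pct(program_total, TOTAL_SAMPLE)}% of sample)")
print(f"control: {control_total}")
for label, n in PROGRAM_BY_YEAR.items():
    print(f"{label}: {n} ({pct(n, program_total)}% of program group)")
```

The year-level percentages are computed against the program group (135), not the full sample, which is why they sum to 100%.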
Table 1 reports the distribution of students by gender, ethnicity, age, and grade; no differences were observed between groups for any of these factors. Within each grade, students were divided into two classrooms, A and B. The distribution of students by classroom within grade did not differ by group (see Supplemental Table S1). The distribution of students by years enrolled in the program did not differ by gender or ethnicity but did differ by age, F(2, 135) = 36.2, p < .001, and grade, χ²(14) = 105.7, p < .001, such that students enrolled in the program for longer periods of time were significantly older than those enrolled for fewer years (see also Supplemental Table S2).
Procedure
Students in the program group received an intensive course of music education, inspired by El Sistema (a program of orchestral music instruction that originated in Venezuela), during the 2010–2011, 2011–2012, and/or 2012–2013 academic years (see the online supplementary material for additional details). Each year the program ran for 39 weeks; in 2012–2013, it began in mid-September 2012 and concluded in the first week of June 2013. Students enrolled in the program met for approximately 2 hr every day school was in session. Each meeting featured 40 min of small-group instruction on an orchestral instrument and 40 min of rehearsal in an ensemble composed of players with similar levels of experience and skill. Between the small-group and large-ensemble activities, students took a snack break lasting approximately 20 min; the remaining 20 min were required for students to travel between activities. Of the 135 students participating in the program during the 2012–2013 academic year, 14 (10.4% of those initially enrolled) left the program during the school year because of lack of interest or scheduling conflicts. Given that (a) no student participated in the program for fewer than 10 weeks, (b) students who ultimately left the program remained in it for an average of 23 weeks (SD = 6.87 weeks), and (c) the minimum program "dosage" required to yield observable effects is unknown, students who left the program were considered part of the program group for purposes of analysis. This intent-to-treat approach yields conservative estimates of program effects, because students who received only part of the program are counted as having received it.
This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
All students in Grades 2 through 8 (N = 234) in both the program and control groups were scheduled for EF and STM testing during 1 week in the fall of 2012 and 1 week in the spring of 2013. Students in first grade were not assessed, as pilot testing had revealed that they struggled with the instructions for some of the measures of EFs and STM. Fall 2012 testing occurred 2 weeks after the program began, whereas Spring 2013 testing occurred 1 week before the program ended (see Figure 1 for a summary of when measures were administered). In each of the fall and spring sessions, 18 students missed their scheduled testing date because of absence (only one student missed both dates), so that at most 216 students completed any EF or STM task. These students were not rescheduled, because doing so would have required waiting multiple weeks to accommodate the school schedule, resulting in unacceptable variability in the timing of testing relative to the start and end of the program. Absence on the scheduled testing date was coded separately for Fall 2012 and Spring 2013 as a binary variable (1 = absent) and regressed on gender, ethnicity, age, grade, classroom, and group (program vs. control). Absence in both the fall and the spring was unrelated to all of these factors.
On both the fall and spring testing days, a member of the program staff escorted students from their classroom to the school's computer laboratory. All students were familiar with the lab and its computers, as they use the lab regularly during the school year. After students were seated, a researcher explained that they would be playing a series of computer games. Students were told that some of the games might seem like tests, but that they were not like the tests students take in school: no one except the researcher would know how the student did, and he would not tell anyone. After answering any questions students had, the researcher guided them through nine computerized tasks of EFs and STM (described below), which required approximately 50 min to complete. When students had completed the tasks, the researcher thanked them and gave each a small prize (crayons and stickers).

Table 1
Distribution of Program- and Control-Group Students by Gender, Ethnicity, Age, and Grade
Variable | Overall (N = 265) | Program group (n = 135) | Control group (n = 130) | Comparison
 | n (%) or M (SD) | n (%) or M (SD) | n (%) or M (SD) | χ²(df) or t(df)
Gender | | | | χ²(1) = .781
  Female | 154 (58.1) | 82 (60.7) | 72 (55.4) |
  Male | 111 (41.9) | 53 (39.3) | 58 (44.6) |
Ethnicity | | | | χ²(3) = 1.61
  Asian American | 7 (2.6) | 3 (2.2) | 4 (3.1) |
  African American | 227 (85.7) | 116 (85.9) | 111 (85.4) |
  Hispanic/Latino | 24 (9.1) | 11 (8.1) | 13 (10.0) |
  European American | 7 (2.6) | 5 (3.7) | 2 (1.5) |
Age | 10.2 (2.15) | 10.2 (2.09) | 10.2 (2.22) | t(263) = −.175
Grade | | | | χ²(7) = 2.86
  1 | 31 (11.7) | 15 (11.1) | 16 (12.3) |
  2 | 13 (4.9) | 7 (5.2) | 6 (4.6) |
  3 | 40 (15.1) | 17 (12.6) | 23 (17.7) |
  4 | 57 (21.5) | 32 (23.7) | 25 (19.2) |
  5 | 18 (6.8) | 8 (5.9) | 10 (7.7) |
  6 | 31 (11.7) | 18 (13.3) | 13 (10.0) |
  7 | 38 (14.3) | 20 (14.8) | 18 (13.8) |
  8 | 25 (9.4) | 12 (8.9) | 13 (10.0) |
Note. For the program- and control-group columns, percentages are reported with respect to the number of students in each group.

[Figure 1: timeline from "Program Begins" (mid-Sep., 2012) to "Program Ends" (early Jun., 2013), with intermediate dates mid-Nov., 2012 and mid-Mar., 2013 and intervals of 8 wks. and 10 wks.; events marked: EF measures, pre-; Grades, T1; Test Scores; EF measures, post-; Grades, T3.]
Figure 1. Schematic summarizing the administration of measures relative to the beginning and end of the program. Note that preprogram EF (executive functions) measures were administered 2 weeks after the program began, whereas postprogram EF measures were administered 1 week before the program ended. T1 = first trimester report card; T3 = third trimester report card.
Measures
Academic achievement. Students in Grades 2 through 8 were assigned year-end and trimester grades in English language arts (ELA) and math by their primary teachers. For students in Grades 4 through 8, ELA and math grades were assigned on a numeric scale ranging from 60 to 99; for students in Grades 2 and 3, grades were assigned on a 6-point scale ranging from 0 (unsatisfactory) to 5 (outstanding). All students in Grades 1 through 7 took a series of standardized tests (the TerraNova, 3rd ed.; McGraw-Hill, 2008) in March 2013, approximately 10 weeks before the conclusion of the program. Students in Grades 2 through 7 took the reading, math, and language arts tests, whereas students in Grade 1 took only the reading and math tests. Scores were reported on a common scale ranging from 450 to 850. A composite measure of standardized test performance was calculated as the mean of the reading, math, and language arts scores for students in Grades 2 through 7 (α = .86) and as the mean of the reading and math scores for students in Grade 1 (α = .79).
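The composite rule can be written as a short function. This is a minimal sketch, assuming each student's TerraNova scale scores are stored in a dictionary under hypothetical keys (`reading`, `math`, `language_arts`); the example scores are invented, not study data.

```python
from statistics import mean

# Subtests that enter the composite, by grade (per the description above).
GRADE_1_SUBTESTS = ("reading", "math")
GRADES_2_TO_7_SUBTESTS = ("reading", "math", "language_arts")


def composite_score(grade: int, scores: dict[str, float]) -> float:
    """Mean of the subtest scale scores (450-850) that enter the composite."""
    if not 1 <= grade <= 7:
        raise ValueError("standardized tests were taken in Grades 1 through 7")
    subtests = GRADE_1_SUBTESTS if grade == 1 else GRADES_2_TO_7_SUBTESTS
    return mean(scores[name] for name in subtests)


# Hypothetical students (invented scores)
print(composite_score(4, {"reading": 640, "math": 620, "language_arts": 650}))
print(composite_score(1, {"reading": 600, "math": 590}))
```

Selecting the subtests by grade, rather than averaging whatever keys happen to be present, makes a missing language arts score for an older student raise a `KeyError` instead of silently producing a two-test composite.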
EFs. The computerized tasks of EFs and STM were taken from the Psychology Experiment Building Language, version 12 (Mueller & Piper, 2014). Three of these tasks focused on STM and are described below. Of the six EF tasks, four focused on inhibitory control or flexible attention, whereas the remaining two, the Tower of London (Shallice, 1982) and the Wisconsin Card Sorting Task (Berg, 1948), were more complex tasks intended to draw upon executive capacities across multiple domains. Tasks were administered in the order in which they are listed in Table 2. Inhibitory control and flexible attention were assessed using a go/no-go task (Bezdjian, Baker, Lozano, & Raine, 2009), a color-word Stroop task (Troyer, Leach, & Strauss, 2006), a flanker task (Erickson & Schultz, 1979), and the trail-making task as implemented by Atkinson and Ryan (2007). Students also completed a computerized version of the Tower of London task used by Phillips and his colleagues (Phillips, Wynn, Gilhooly, Della Sala, & Logie, 1999) and a version of the Wisconsin Card Sorting Task developed for the Psychology Experiment Building Language (see Fox, Mueller, Gray, Raber, & Piper, 2013). For additional details about the tasks, see the online supplementary materials.
To minimize shared variance due to task, we sought to select a single indicator for each measure of EFs, based on the indicators chosen for the same tasks in previous studies. However, for the flanker, go/no-go, and Stroop tasks, both the proportion of trials correct and a score based on RT were calculated. Although less parsimonious than a single indicator of performance, calculating both metrics provides information about both the accuracy and the efficiency of students' responses. Both are relevant because the students in this sample are neither young children (for whom accuracy indicators display sufficient variability to be a meaningful measure of performance; cf. Willoughby et al., 2012) nor adults (for whom RT indicators are preferred; cf. Bialystok & DePape, 2009). For the flanker task, a difference score was calculated as the median RT to correct incongruent trials (in milliseconds) minus the median RT to correct congruent trials, whereas for the go/no-go task, the difference score was calculated as the median RT to correct no-go trials (in which the child responded to a stimulus that appeared rarely) minus the mean RT to correct go trials (in which the child responded to a stimulus that appeared frequently).
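The flanker difference score can be sketched as follows, assuming a hypothetical trial log in which each trial records its condition, its RT in milliseconds, and whether the response was correct; the field names and RTs are invented for illustration. The go/no-go score described above is computed analogously from correct no-go and go trials.

```python
from statistics import median


def flanker_difference(trials: list[dict]) -> float:
    """Median RT (ms) on correct incongruent trials minus median RT on
    correct congruent trials; error trials are excluded from both medians."""
    incongruent = [t["rt"] for t in trials
                   if t["correct"] and t["condition"] == "incongruent"]
    congruent = [t["rt"] for t in trials
                 if t["correct"] and t["condition"] == "congruent"]
    return median(incongruent) - median(congruent)


# Hypothetical trial log for one student
trials = [
    {"condition": "incongruent", "rt": 520, "correct": True},
    {"condition": "incongruent", "rt": 560, "correct": True},
    {"condition": "incongruent", "rt": 600, "correct": True},
    {"condition": "incongruent", "rt": 450, "correct": False},  # excluded
    {"condition": "congruent", "rt": 480, "correct": True},
    {"condition": "congruent", "rt": 500, "correct": True},
    {"condition": "congruent", "rt": 530, "correct": True},
]
print(flanker_difference(trials))  # 60
```

A smaller difference indicates less slowing when the flankers conflict with the target, which is why shorter difference scores are read as better performance.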
A difference score was also calculated for the trail-making task as the median time required to complete the trials with interference minus the median time required to complete trials without interference (Lehto, Juujärvi, Kooistra, & Pulkkinen, 2003). The proportion of trials correct was not calculated, because this score would be directly analogous to the time required to complete the task (the task does not progress until the child correctly connects one dot to the next). Although it was possible to calculate a difference score for the Stroop task, the order in which stimuli were presented (a block of shapes, then a block of color-neutral words such as "went," followed by a block of color-words such as "red" printed in blue ink) meant that the block of trials in which students had to overcome the prepotent tendency to read the color-word rather than name the color was also the block in which they were most practiced in providing responses. This confounded practice effects with the Stroop effect, which argued against calculating a difference score across blocks (for additional discussion of this issue, see the online supplementary materials). Instead, the median RT for correct responses on the block of trials in which the Stroop effect was in place was used as the measure of efficiency (after Huizinga, Dolan, & van der Molen, 2006). For the Tower of London task, the dependent variable was the proportion of trials completed in the minimum number of moves (cf. Huizinga et al., 2006; Lehto et al., 2003), whereas for the card-sorting task, it was the proportion of trials on which a perseverative error was made (after Huizinga et al., 2006; Welsh, Pennington, & Groisser, 1991).
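Both of these dependent variables are simple proportions. The sketch below assumes hypothetical per-trial records in which the move count, the minimum possible move count, and a perseverative-error flag have already been produced by the task software; how perseverative errors are identified is not reimplemented here.

```python
def proportion_minimum_moves(trials: list[dict]) -> float:
    """Tower of London: share of trials solved in the minimum number of moves."""
    return sum(t["moves"] == t["min_moves"] for t in trials) / len(trials)


def proportion_perseverative(trials: list[dict]) -> float:
    """Card sorting: share of trials on which a perseverative error was made."""
    return sum(bool(t["perseverative_error"]) for t in trials) / len(trials)


# Hypothetical records for one student
tower = [
    {"moves": 3, "min_moves": 3},
    {"moves": 5, "min_moves": 4},
    {"moves": 2, "min_moves": 2},
    {"moves": 6, "min_moves": 5},
]
cards = [{"perseverative_error": e} for e in (False, True, False, False, False)]
print(proportion_minimum_moves(tower))  # 0.5
print(proportion_perseverative(cards))  # 0.2
```

Note that the two scores run in opposite directions: higher is better for the Tower of London, lower is better for card sorting.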
STM. Three tasks of visual STM were administered to stu-
dents: the digit-forward condition of the digit span task (McCarthy,
1972), the Corsi block task (Corsi, 1973), and a pictorial span task.
Note that the digit forward-span task was presented visually, with
numbers appearing on a computer screen in sequence, rather than
verbally. For each task, the dependent variable was the mean
number of items (digits, blocks, or picture-word pairs) correctly
recalled across trials. This corresponds most closely to the point
system used by Brocki and Bohlin (2004) as their indicator of
performance on the digit span task.
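The text does not specify how partially correct sequences were scored, so the sketch below assumes serial-position scoring (an item counts only when it is recalled in the position in which it was shown), which is one common convention. The trial sequences are hypothetical.

```python
from statistics import mean


def items_recalled(presented: list, response: list) -> int:
    """Items recalled in the correct serial position (assumed scoring rule)."""
    return sum(p == r for p, r in zip(presented, response))


def mean_items_recalled(trials: list[tuple[list, list]]) -> float:
    """Mean number of items correctly recalled across trials."""
    return mean(items_recalled(p, r) for p, r in trials)


# Hypothetical digit-span trials: (sequence shown, sequence entered)
trials = [
    ([3, 8, 1], [3, 8, 1]),        # 3 items correct
    ([5, 2, 9, 4], [5, 2, 4, 9]),  # first 2 items correct
]
print(mean_items_recalled(trials))  # 2.5
```

The same function applies to the Corsi and pictorial span tasks if blocks or picture-word pairs are encoded as comparable tokens.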
Demographics. Data regarding students’ gender, ethnicity,
date of birth, grade, and classroom were provided by the school.
Each student’s age at the time of testing was calculated using the
date of testing and their date of birth.
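Age at testing can be expressed in decimal years, the unit used for the age means in Tables 1 through 3. This sketch divides elapsed days by the mean Gregorian year length; that divisor is an assumption, since the exact convention used is not reported, and the dates are hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.2425  # mean length of a Gregorian calendar year


def age_at_testing(birth: date, tested: date) -> float:
    """Age in decimal years on the testing date."""
    if tested < birth:
        raise ValueError("testing date precedes date of birth")
    return (tested - birth).days / DAYS_PER_YEAR


# Hypothetical student
age = age_at_testing(date(2002, 3, 15), date(2013, 5, 20))
print(round(age, 2))  # 11.18
```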
Data Analysis
Our first research question asked whether there was any effect of the program on students' academic achievement, EFs, or STM. To address this question, year-end academic achievement and Spring 2013 performance on each computerized task were modeled as a function of program enrollment for all children in the sample using the following equation, in which the ith child was nested within the jth classroom:
Table 2
Bivariate Correlations and Descriptive Statistics for Year-End Measures of Academic Achievement and Executive Functions
Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
1. ELA grades (2 & 3) | –
2. ELA grades (4 to 8) | – | –
3. Math grades (2 & 3) | .80** | – | –
4. Math grades (4 to 8) | – | .80** | – | –
5. Reading scores | .60** | .34** | .51** | .48** | –
6. Math scores | .58** | .46** | .56** | .67** | .81** | –
7. Language arts scores | .77** | .37** | .68** | .46** | .77** | .65** | –
8. Digit span | .30* | .19* | .20 | .31** | .45** | .52** | .41** | –
9. Flanker, correct | .39** | .25** | .45** | .26** | .40** | .37** | .42** | .38** | –
10. Flanker, diff | .13 | −.05 | .28 | −.06 | .06 | −.03 | .02 | .03 | −.16* | –
11. Go/no-go, correct | .42** | .03 | .50** | .11 | .37** | .31** | .29** | .35** | .42** | −.12 | –
12. Go/no-go, diff | .14 | −.05 | .17 | −.15 | −.17* | −.18* | −.12 | −.12 | .03 | −.14 | −.19** | –
13. Corsi | .40** | .22** | .27 | .34** | .38** | .27** | .35** | .43** | .30** | .04 | .25** | .03 | –
14. Memory span | .48** | .26** | .49** | .31** | .37** | .41** | .33** | .35** | .34** | −.09 | .35** | −.11 | .32** | –
15. Stroop, correct | .18 | .01 | .18 | .04 | −.01 | −.03 | .00 | .15* | .02 | −.06 | .04 | .01 | .01 | .06 | –
16. Stroop, RT | −.09 | −.18* | −.09 | −.25** | −.33** | −.38** | −.21** | −.29** | −.28** | −.02 | −.13 | .02 | −.29** | −.27** | .27** | –
17. Trail-making | .02 | −.16* | .02 | −.24** | −.16* | −.26** | −.18* | −.32** | −.29** | −.02 | −.16* | −.10 | −.32** | −.19** | .13 | .30** | –
18. Tower of London | .18 | .18* | .13 | .25** | .15* | .23** | .25** | .15* | .09 | −.14* | .17* | −.07 | −.22** | −.23** | −.08 | −.19** | −.15* | –
19. Card-sorting | −.21 | −.09 | −.32* | −.12 | −.23* | −.32** | −.28** | −.21* | −.30** | .09 | −.18** | .06 | .22** | .19** | .05 | .20** | .07 | −.18** | –
20. Age | −.20 | −.15 | −.12 | −.02 | .68** | .61** | .46** | .36** | .29** | .00 | .22** | −.17 | .32** | .09 | −.05 | −.23** | −.23** | .11 | −.20** | –
N | 52 | 176 | 52 | 176 | 232 | 232 | 188 | 216 | 205 | 204 | 209 | 201 | 211 | 213 | 213 | 204 | 205 | 215 | 215 | 265
M | 3.62 | 85.6 | 3.71 | 82.9 | 641.9 | 623.1 | 653.5 | 4.71 | .808 | 38.4 | .920 | 112.0 | 4.90 | 5.33 | .857 | 1,246 | 10,105 | .470 | .160 | 10.2
SD | 1.19 | 9.31 | 1.45 | 9.60 | 40.3 | 53.4 | 35.9 | .801 | .161 | 37.4 | .095 | 66.0 | 1.40 | 1.34 | .132 | 525 | 9,420 | .188 | .088 | 2.15
Note. For pairwise correlations (excepting those involving English language arts [ELA] or math grades for a subset of participants), N = [169, 215]. Correct refers to proportion of trials correct; diff = difference score.
* p < .05. ** p < .01.
$$\text{year-end or spring performance}_{ij} = \text{intercept} + \text{gender}_{ij} + \text{grade}_{ij} + \text{program}_{ij} + \text{error}_{ij} \qquad (1)$$
Program was a four-level categorical variable corresponding to the number of years a student had been enrolled in the program. The specific contrast between students who were not in the program (i.e., those for whom program = 0) and those who had been in the program for 1, 2, or 3 years (program = 1, 2, or 3) was used to test the hypothesis that students in the program for any length of time would exhibit higher levels of academic achievement, EFs, or STM than their peers who were not in the program.
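The "any effect" contrast can be illustrated with the model-implied means reported in Table 4. Weighting the 1-, 2-, and 3-year levels equally is an assumption about how the contrast was coded, but it reproduces the reported estimates of 10.7 points for the composite test score and −309.7 ms for Stroop RT. The mixed model estimates the contrast and its standard error jointly; this sketch recovers only the point estimate from rounded means.

```python
from statistics import mean


def any_effect(means_by_years: list[float]) -> float:
    """Equal-weight contrast: average of the 1-, 2-, and 3-year means
    minus the 0-year (control) mean."""
    control, *enrolled = means_by_years
    return mean(enrolled) - control


# Model-implied means for 0, 1, 2, and 3 years in the program (Table 4)
test_scores = [625.6, 632.4, 637.4, 639.1]
stroop_rt = [1490.8, 1264.8, 1100.9, 1177.6]
print(round(any_effect(test_scores), 1))  # 10.7
print(round(any_effect(stroop_rt), 1))    # -309.7
```

Because the levels are weighted equally rather than by group size, the contrast compares the control mean with the average enrolled level, not with the average enrolled student.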
Our second research question examined whether there were dosage effects for the program by comparing each successive increment in enrollment: 1 versus 0 years, 2 versus 1 year, and so on. Omnibus tests of the variable program (i.e., Type III tests of fixed effects) indicated whether significant differences were observed between any two levels of the variable, and a series of Tukey-adjusted contrasts was used to test whether the mean estimates differed significantly for any two levels of program.
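With four levels of program there are six pairwise contrasts, which is the family the Tukey procedure adjusts over. The sketch below only enumerates the raw differences between model-implied means (Stroop RT values from Table 4); it does not apply the Tukey adjustment, which additionally requires the standard errors and the studentized range distribution.

```python
from itertools import combinations


def pairwise_differences(means_by_years: list[float]) -> dict[tuple[int, int], float]:
    """Raw differences (later level minus earlier level) for every pair of
    enrollment levels; no multiplicity adjustment is applied here."""
    return {
        (a, b): means_by_years[b] - means_by_years[a]
        for a, b in combinations(range(len(means_by_years)), 2)
    }


# Model-implied Stroop RT means (ms) for 0-3 years in the program (Table 4)
stroop_rt = [1490.8, 1264.8, 1100.9, 1177.6]
for (a, b), diff in pairwise_differences(stroop_rt).items():
    print(f"{b} vs {a} years: {diff:+.1f} ms")
```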
Our third and final research question asked whether the effects of enrollment on year-end measures of academic achievement, EFs, and STM would be contingent upon levels of performance on these measures before program entry. To test the hypothesis that such conditional effects would be observed, the equation used to test the first hypothesis was modified to include three terms: fall performance (first-trimester grades, or performance on the measures of EFs and STM in the Fall 2012 administration of the tasks); enrollment (enrollment in the program during the 2012–2013 academic year); and the interaction between these two variables. The modified equation was thus:
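$$\text{third-trimester or spring performance}_{ij} = \text{intercept} + \text{gender}_{ij} + \text{grade}_{ij} + \text{fall performance}_{ij} + \text{enrollment}_{ij} + (\text{fall performance}_{ij} \times \text{enrollment}_{ij}) + \text{error}_{ij} \qquad (2)$$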
First-trimester grades and Fall 2012 assessments were preprogram measures only for the subset of participants who were new to the program in the 2012–2013 academic year; therefore, these models were tested using only these participants and the controls.
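Under Equation 2, the effect of enrollment is not a single number: the predicted enrolled-minus-control difference at a given fall score equals the enrollment coefficient plus the interaction coefficient times that score. The sketch below evaluates this at the 25th and 75th percentiles of a set of fall scores; the coefficients and fall RTs are hypothetical values chosen for illustration, not estimates from this study.

```python
from statistics import quantiles


def enrollment_effect_at(fall_value: float, b_enrollment: float,
                         b_interaction: float) -> float:
    """Difference in predicted spring performance (enrolled minus not
    enrolled) at a given level of fall performance, holding the other
    covariates in Equation 2 constant."""
    return b_enrollment + b_interaction * fall_value


# Hypothetical coefficients and fall scores (not estimates from this study)
B_ENROLLMENT = -100.0
B_INTERACTION = -0.5
fall_rts = [980, 1100, 1250, 1310, 1420, 1600, 1750, 1900]
q1, _, q3 = quantiles(fall_rts, n=4)
for label, value in (("25th percentile", q1), ("75th percentile", q3)):
    effect = enrollment_effect_at(value, B_ENROLLMENT, B_INTERACTION)
    print(f"{label} ({value:.0f} ms): {effect:+.1f} ms")
```

Centering fall performance before fitting would change the value and meaning of the enrollment coefficient, but not the conditional effect computed at any given raw fall score.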
All models were estimated in SAS using PROC MIXED with full information maximum likelihood (Alison, 2009), given that data on the dependent variables were missing at random: missingness on each measure was unrelated to gender, ethnicity, age, grade, classroom, program enrollment, and, in the case of the EF and STM tasks, Fall 2012 performance. Degrees of freedom were calculated using Kenward and Roger's method, which is recommended for small samples with missing data (Littell, Stroup, & Freund, 2002). Gender was included in all models as a covariate because preliminary analyses indicated that it was related to multiple dependent variables (see below). All models specified that children were nested within classrooms, because a series of random-effects analyses of variance indicated that significant portions of the variance in standardized test scores and in multiple tasks of EFs and STM were attributable to students' classroom (see Supplemental Table S3).
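One common way to express the share of variance attributable to classrooms is the intraclass correlation ICC(1) from a one-way random-effects analysis of variance. The sketch below uses the mean-squares (method-of-moments) formula and assumes equal classroom sizes; the SAS analyses described above may have used a different estimator, so this illustrates the quantity rather than reproducing Supplemental Table S3. The scores are hypothetical.

```python
from statistics import mean


def icc1(groups: list[list[float]]) -> float:
    """ICC(1) = (MSB - MSW) / (MSB + (k - 1) * MSW) for g groups of equal size k."""
    g, k = len(groups), len(groups[0])
    if g < 2 or k < 2 or any(len(grp) != k for grp in groups):
        raise ValueError("need at least 2 groups of equal size k >= 2")
    grand = mean(x for grp in groups for x in grp)
    group_means = [mean(grp) for grp in groups]
    ss_between = k * sum((m - grand) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2
                    for grp, m in zip(groups, group_means) for x in grp)
    ms_between = ss_between / (g - 1)
    ms_within = ss_within / (g * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for two classrooms of three students each
print(round(icc1([[1, 2, 3], [4, 5, 6]]), 3))  # 0.806
```

Values near 0 indicate that classroom membership explains little variance; the larger the ICC, the more important it is to model students as nested within classrooms.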
Results
Preliminary Analyses
Table 2 reports bivariate correlations and descriptive statistics for year-end measures of academic achievement and Spring 2013 tasks of EFs and STM for the full sample. Note that these statistics are disaggregated by grade band for ELA and math grades, because the scale on which grades are assigned changes in fourth grade, and that these figures exclude students with missing data due to outliers or other factors (see the online supplementary material). Two points are notable about the pattern of correlations presented in Table 2. First, standardized test scores in reading, math, and language arts were intercorrelated, rs = .65 to .81, ns = 188 to 232, ps < .01, supporting the calculation of the composite test score. Second, although age was not associated with grades, it was associated with standardized test scores, rs = .46 to .68, ns = 188 to 232, ps < .01, and with multiple measures of EFs and STM: the digit span task, the proportion of trials correct on the flanker and go/no-go tasks, the Corsi task, RT on the Stroop task, trail-making difference scores, and the card-sorting task, |r|s = .20 to .49, ns = 210 to 215, ps = .01 to .05.
Table 3 reports descriptive statistics and bivariate correlations among tasks of EFs and STM for the subset of students who were new to the program in the 2012–2013 academic year, as well as for the students in the control group. Fall 2012 and Spring 2013 performance on each measure of EFs and STM was correlated, rs = .33 to .76, ns = 111 to 135, ps < .01, with the exceptions of the flanker difference score, the Tower of London, and the card-sorting task. ELA and math grades were also correlated from the first to the third trimester (not included in Table 3) among both younger students (Grades 2 and 3), ELA: r(52) = .87, p < .01; math: r(52) = .72, p < .01, and older students (Grades 4–8), ELA: r(173) = .77, p < .01; math: r(173) = .71, p < .01.
Exploratory analyses revealed that girls answered a higher proportion of incongruent trials correctly in the Spring 2013 administration of the flanker task, a difference that approached significance, t(203) = 1.82, p = .070; earned significantly higher year-end ELA grades, t(226) = 2.00, p = .047; and earned higher scores on standardized tests of both reading, t(230) = 2.37, p = .019, and language arts, t(186) = 3.22, p = .001. There were no significant differences in performance on any measure of academic achievement or any task of EFs as a function of ethnicity.
Was There Any Effect of the Program?
To examine whether there was any effect of the program on students' academic achievement, EFs, or STM, year-end academic achievement and performance on the Spring 2013 administration of each computerized task were modeled as a function of years in the program using Equation 1. Table 4 presents estimated means and effects for these models (parameter estimates are available in the supplementary material; see Supplemental Tables S4 and S5). Enrollment in the program was associated with composite standardized test scores that were 10.7 (SE = 3.90) points higher, on average, than those obtained by students who were not enrolled, t(217) = 2.74, p = .007. The corresponding effect size of 0.24 was calculated by dividing the effect estimate by the standard deviation of the composite score. Older students (those in Grades 4 through
Table 3
Bivariate Correlations and Descriptive Statistics for Fall 2012 (F) and Spring 2013 (S) Measures of Executive Functions: New Students and Controls
Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
1. Digit span (F) | –
2. Digit span (S) | .53** | –
3. Flanker, correct (F) | .31** | .32** | –
4. Flanker, diff (F) | .00 | −.11 | .03 | –
5. Flanker, correct (S) | .23* | .41** | .56** | .04 | –
6. Flanker, diff (S) | .15 | .01 | −.03 | .16 | −.10 | –
7. Go/no-go, correct (F) | .16 | .44** | .33** | .16 | .23* | .05 | –
8. Go/no-go, diff (F) | −.11 | −.14 | −.16 | −.19 | −.02 | −.19* | −.15 | –
9. Go/no-go, correct (S) | .00 | .37** | .24** | −.03 | .43** | −.11 | .35** | .01 | –
10. Go/no-go, diff (S) | −.15 | −.11 | .05 | −.13 | .06 | −.06 | −.11 | .25** | −.21* | –
11. Corsi (F) | .26** | .20** | .23** | .03 | .30** | .09 | .17 | −.13 | .20* | .10 | –
12. Corsi (S) | .21** | .42** | .24** | −.07 | .29** | .11 | .28** | −.28** | .23** | .08 | .38** | –
13. Memory span (F) | .44** | .45** | .41** | .04 | .41** | .00 | .33** | −.05 | .31** | −.01 | .46** | .27** | –
14. Memory span (S) | .21** | .27** | .28** | −.13 | .35** | −.07 | .10 | .06 | .29** | −.03 | .16 | .22* | .42** | –
15. Stroop, correct (F) | .03 | .01 | .11 | .01 | .22* | .06 | .02 | −.17 | .29** | .11 | .17 | .05 | .17* | .11 | –
16. Stroop, RT (F) | −.15 | −.27** | −.21* | −.22** | −.17 | −.03 | −.16 | .02 | .05 | .04 | −.20* | −.11 | −.29** | −.10 | .19* | –
17. Stroop, correct (S) | −.07 | .16 | .05 | .05 | .08 | −.12 | .08 | −.13 | .00 | .08 | .05 | .07 | −.03 | .00 | .37** | .12 | –
18. Stroop, RT (S) | −.11 | −.23** | −.27** | −.22* | −.22** | −.06 | .01 | −.03 | −.04 | .06 | −.25** | −.19* | −.25** | −.27** | .08 | .45** | .27** | –
19. Trail-making (F) | −.29** | −.16 | −.29** | −.31** | −.26** | −.04 | −.13 | −.04 | −.19* | −.06 | −.23** | −.18* | −.44** | −.32** | −.04 | .17* | .20* | .32** | –
20. Trail-making (S) | −.29** | −.29** | −.25** | −.12 | −.25** | .07 | −.09 | −.03 | −.18* | −.15 | −.28** | −.34** | −.43** | −.21* | −.01 | .28** | .11 | .23** | .46** | –
21. Tower of London (F) | .10 | .17 | .10 | .11 | .19* | −.06 | .23** | −.15 | .13 | .01 | .13 | .02 | .25** | .22* | .01 | −.07 | .05 | −.01 | −.15 | −.02 | –
22. Tower of London (S) | .18* | .24** | .12 | .05 | .13 | −.14 | .06 | −.07 | .21* | −.09 | .17 | .21* | .28** | .21* | .08 | −.17 | −.14 | −.25** | −.22** | −.23** | .16 | –
23. Card-sorting (F) | −.05 | −.14 | −.11 | −.04 | .04 | −.01 | .00 | −.06 | .15 | −.06 | −.09 | −.13 | −.01 | .19 | .04 | .04 | .13 | .05 | .10 | .21* | −.01 | −.07 | –
24. Card-sorting (S) | −.07 | −.20* | −.15 | −.12 | −.23** | .01 | −.07 | .03 | −.13 | .14 | −.08 | −.07 | −.13 | −.12 | −.01 | −.09 | .03 | .09 | .11 | −.08 | −.12 | −.19 | .02 | –
25. Age (F) | .43** | .31** | .22* | .05 | .22** | .04 | .21** | −.26** | .22* | −.07 | .40** | .31** | .31** | −.01 | −.17* | −.17* | −.03 | −.17* | −.20* | −.15 | −.10 | .24** | −.01 | −.12 | –
N | 137 | 135 | 133 | 122 | 127 | 127 | 130 | 127 | 131 | 129 | 138 | 133 | 138 | 135 | 138 | 137 | 134 | 127 | 137 | 134 | 138 | 135 | 123 | 135 | 168
M | 4.38 | 4.61 | .659 | 39.1 | .789 | 42.2 | .863 | 119.2 | .907 | 108.6 | 4.86 | 5.17 | 4.48 | 4.74 | .865 | 1,769 | .862 | 1,356 | 17,046 | 10,826 | .348 | .476 | .231 | .174 | 10.0
SD | .571 | .784 | .197 | 59.8 | .169 | 39.7 | .151 | 78.6 | .108 | 67.9 | 1.46 | 1.38 | 1.34 | 1.34 | .135 | 709.8 | .131 | 542.6 | 11,676 | 9,275 | .170 | .192 | .092 | .094 | 2.02
Note. For pairwise correlations, N = [111, 138]; F = fall 2012 administration of tasks; S = spring 2013 administration of tasks; diff = difference score.
* p < .05. ** p < .01.
8) enrolled in the program also earned year-end ELA grades that were 2.45 points higher (SE = .685), t(163) = 3.58, p < .001, d = 0.30, and year-end math grades that were 3.90 points higher (SE = 1.52), t(163) = 2.56, p = .011, d = 0.42, on average, than their peers, differences that correspond to approximately one quarter and one third of a letter grade (e.g., B vs. A), respectively. Enrolled students were more accurate than their peers on the flanker (Est. = .056, SE = .021), t(194) = 2.63, p = .009, d = 0.35, and go/no-go tasks (Est. = .038, SE = .013), t(198) = 2.95, p = .004, d = 0.40, and were also more efficient on both the flanker and Stroop tasks. Students who were enrolled in the program exhibited smaller difference scores on the flanker task (Est. = −19.7, SE = 5.27), t(192) = −3.73, p < .001, d = 0.50, and faster RTs on the Stroop task (Est. = −309.7, SE = 68.4), t(193) = −4.53, p < .0001, d = 0.57. Enrolled students also made fewer perseverative errors on the card-sorting task (Est. = −.012, SE = .005), t(206) = −2.41, p = .017, d = 0.18, and exhibited better performance on the pictorial span task of visual STM (Est. = .201, SE = 8.13), t(204) = 2.56, p = .011, d = 0.25.
Were There Dosage Effects for the Program?
The effect estimates included in Table 4 indicate that there were significant differences in outcomes by level of program enrollment. The omnibus test for the variable program was significant for the composite standardized test scores, F(3, 217) = 2.72, p = .045. Among students in Grades 4 through 8, effects were also observed for year-end grades in ELA, F(3, 165) = 3.20, p = .025, and math, F(3, 165) = 3.96, p = .009, though these effects were not found among students in Grades 2 and 3. The Tukey-adjusted tests of the differences among the model-implied mean estimates indicate that students in the program for 3 years earned significantly higher standardized test scores and both ELA and math grades than their peers who had not been in the program (see Figure 2a).
Significant effects were also observed for the proportion of trials correct on the flanker, F(3, 194) = 2.75, p = .044, and go/no-go tasks, F(3, 198) = 2.96, p = .034; for the flanker difference scores, F(3, 192) = 5.04, p = .002; and for the Stroop, F(3, 193) = 7.66, p < .001, and card-sorting tasks, F(3, 203) = 2.66, p = .049. For the proportion of flanker trials correct and the card-sorting task, the pattern of differences among the mean estimates was the same as that observed for the standardized test scores: students in the program for 3 years performed significantly better on the tasks than their peers who had not been in the program. For the Stroop task, the Tukey-adjusted contrasts revealed that students in the program for either 2 or 3 years performed significantly better than students who had not been in the program (see Figure 2b), whereas for the flanker difference scores, only students who had been in the program for 1 year had significantly smaller scores than students who had not been in the program.
The omnibus test for program was also significant for the pictorial span measure of visual STM, F(3, 202) = 2.80, p = .041. The pattern of differences among the mean estimates corresponded to that observed for the standardized test scores and for the flanker and card-sorting tasks: compared with their peers who had not been enrolled in the program, students enrolled for 3 years performed significantly better on the pictorial span measure.
Were the Effects of the Program Contingent Upon Levels of Preprogram Performance?
For students new to the program in the 2012–2013 school year, first-trimester grades and Fall 2012 tests of EFs and STM were truly preprogram measures. Therefore, we used this subset of participants (and the controls) to test our hypothesis that effects of enrollment would
Table 4
Means and Effect Estimates for Mixed Models of Year-End Measures of Academic Achievement and Executive Functions
Measure | Model-implied mean estimates, Est. (SE): 0 years | 1 year | 2 years | 3 years | Any effect: t (df) | p | Dosage effects: F (df) | p
Test scores | 625.6 (3.37) | 632.4 (5.07) | 637.4 (5.64) | 639.1 (5.19)ᵃ | 2.74 (217) | .007 | 2.72 (3, 217) | .045
ELA grades (2 & 3) | 3.60 (.250) | 3.23 (.603) | 3.95 (.319) | 3.74 (.522) | .87 (46) | .389 | .47 (3, 46) | .702
ELA grades (4 to 8) | 84.0 (1.01) | 86.3 (1.79) | 85.1 (2.15) | 87.9 (1.34)ᵃ | 3.58 (163) | <.001 | 3.20 (3, 165) | .025
Math grades (2 & 3) | 4.03 (.276) | 3.99 (.948) | 3.80 (.490) | 4.08 (.600) | 1.81 (46) | .109 | 1.45 (3, 46) | .347
Math grades (4 to 8) | 81.1 (1.00) | 84.3 (1.91) | 85.6 (2.31) | 85.0 (1.38)ᵃ | 2.56 (163) | .011 | 3.96 (3, 165) | .009
Digit span | 4.57 (.076) | 4.72 (.152) | 4.70 (.139) | 4.82 (.114) | −.15 (205) | .882 | 1.41 (3, 205) | .240
Flanker, correct | 76.9 (1.58) | 83.8 (3.18) | 80.3 (2.84) | 83.5 (1.17)ᵃ | 2.63 (194) | .009 | 2.75 (3, 194) | .044
Flanker, difference | 46.6 (3.89) | 17.9 (7.96)ᶜ | 28.6 (7.11) | 34.3 (5.76) | −3.73 (192) | <.001 | 5.04 (3, 192) | .002
Go/no-go, correct | 90.3 (.94) | 94.7 (1.92) | 93.7 (1.72) | 93.8 (1.42) | 2.95 (198) | .004 | 2.96 (3, 198) | .034
Go/no-go, difference | 105.5 (6.81) | 85.8 (13.4) | 127.8 (13.0) | 113.1 (10.3) | 3.37 (190) | .718 | 1.78 (3, 190) | .152
Corsi | 5.15 (.131) | 5.25 (.263) | 5.62 (.242) | 5.38 (.195) | .47 (202) | .640 | 1.16 (3, 202) | .325
Memory span | 4.68 (.143) | 4.81 (.284) | 4.92 (.263) | 5.34 (.217)ᵃ | 2.56 (204) | .011 | 2.80 (3, 202) | .041
Stroop, correct | 86.1 (1.37) | 89.4 (2.72) | 83.4 (2.47) | 86.3 (2.07) | .11 (202) | .911 | .84 (3, 202) | .473
Stroop, RT | 1490.8 (52.1) | 1264.8 (100.4) | 1100.9 (91.9)ᵇ | 1177.6 (75.8)ᵃ | −4.53 (193) | <.001 | 7.66 (3, 193) | <.0001
Trail-making | 11,333 (983.1) | 14,004 (1938.3) | 11,803 (1739.8) | 8935.3 (1420.7) | .19 (194) | .849 | 1.67 (3, 194) | .174
Tower of London | .474 (.021) | .490 (.039) | .481 (.036) | .441 (.030) | −.61 (202) | .544 | .50 (3, 202) | .683
Card-sorting | .171 (.008) | .178 (.018) | .155 (.016) | .133 (.013)ᵃ | −2.41 (206) | .017 | 2.66 (3, 203) | .049
Note. Boldface type indicates that the estimate for the variable program was significant. All mixed-model estimates for program used no years in the program as the reference group. Results for tests of any effects report the specific contrast between students in the program for any number of years and those not in the program; dosage effects report the results of the Type III test of fixed effects for the variable program.
ᵃ The Tukey-adjusted comparison between the model-implied estimates for 0 and 3 years in the program was significant.
ᵇ The comparison between 0 and 2 years in the program was significant.
ᶜ The comparison between 0 and 1 year in the program was significant.
be contingent upon preprogram levels of performance on academic achievement, EFs, and STM. A main effect of enrollment was found for difference scores on the flanker task, t(97.6) = −2.93, p = .004, such that enrolled students (Est. = 18.0, SE = 10.1) exhibited significantly smaller difference scores than their peers (Est. = 50.0, SE = 4.14), corresponding to a large effect size (d = 0.81). Significant interactions between fall performance and enrollment were observed for difference scores on the go/no-go task (B = −.611, p < .001), RT on the Stroop task (B = −.486, p = .002), and perseverative errors on the card-sorting task (B = .595, p = .040), whereas a marginally significant interaction was observed for the proportion of trials correct on the go/no-go task (B = .265, p = .057). A marginally significant interaction was also observed for the pictorial span task of STM (B = .343, p = .064; see also Supplemental Tables S6 and S7).
As can be seen in Table 5, in the case of the Stroop task, students who performed poorly (in the 25th percentile) during the fall 2012 administration of the task and who were enrolled in the program exhibited RTs during the spring 2013 administration that were 615.6 msec shorter than those of their nonenrolled peers, corresponding to an effect size of 1.13 for these students (see Figure 3a). The program exerted the opposite pattern of effects for the go/no-go and card-sorting tasks. For the go/no-go task, the effect was large for students who performed well in the fall (d = 0.84), but no effect was observed among students who performed poorly. For the card-sorting task, the effect of the program was negligible for students who performed poorly in the fall (d = 0.10) but was substantial for students who had performed well (d = 0.30), with enrolled students committing perseverative errors on 2.6% fewer trials than their nonenrolled peers (see Figure 3b).
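The estimates in Table 5 come from probing an interaction between enrollment and fall performance at the 25th and 75th percentiles of the fall scores. The sketch below illustrates that general "pick-a-point" logic for a single-level linear model with an enrollment × pretest product term. It is a simplification of the mixed models used in the study, and the function names, fall scores, and coefficients are hypothetical, not the study's estimates. For RT measures, a lower value indicates better performance.

```python
# Hypothetical "pick-a-point" sketch: probe an enrollment x pretest
# interaction at the 25th and 75th percentiles of fall (pretest) scores.
# Single-level linear model; coefficients are illustrative, not the study's.
from statistics import quantiles


def predicted_outcome(b0, b_enroll, b_pre, b_inter, enrolled, pre):
    """Model-implied spring score:
    y = b0 + b_enroll*enrolled + b_pre*pre + b_inter*enrolled*pre."""
    return b0 + b_enroll * enrolled + b_pre * pre + b_inter * enrolled * pre


def conditional_effects(pretest_scores, coefs):
    """For the 25th and 75th percentiles of the pretest scores, return
    (enrolled estimate, control estimate, enrolled minus control)."""
    p25, _, p75 = quantiles(pretest_scores, n=4)
    results = {}
    for label, pre in (("p25", p25), ("p75", p75)):
        enrolled = predicted_outcome(*coefs, enrolled=1, pre=pre)
        control = predicted_outcome(*coefs, enrolled=0, pre=pre)
        results[label] = (enrolled, control, enrolled - control)
    return results


# Hypothetical fall Stroop RTs (msec) and coefficients (b0, b_enroll, b_pre, b_inter).
fall_rts = [900, 1100, 1300, 1500, 1700, 1900, 2100]
coefs = (400.0, 300.0, 0.7, -0.5)
for label, (enr, ctl, diff) in conditional_effects(fall_rts, coefs).items():
    print(f"{label}: enrolled {enr:.1f}, control {ctl:.1f}, difference {diff:.1f}")
```

Under these made-up coefficients, the enrolled-minus-control difference is about −250 msec at the 25th percentile of fall RT and about −650 msec at the 75th. Because a higher RT means poorer performance, the larger benefit for slower children resembles the remedial pattern described for the Stroop task.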
Figure 2. [Two panels plotting model-implied estimates by Program Enrollment (0 years, 1 year, 2 years, 3 years); panel a y-axis: Composite Test Score; panel b y-axis: Stroop Reaction Time (msec).] (a) Model-implied mean estimates of composite standardized test scores for each year of program enrollment. Error bars represent two times the SE of the estimate. (b) Model-implied mean estimates of reaction time to correct responses on interference trials of the Stroop task. Error bars represent two times the SE of the estimate.
Discussion
This study examined the effects of enrollment in an ensemble-based program of music education on students’ academic achievement, EFs, and STM in a majority-minority sample of school-age children. Consistent with our hypothesis, we found that students
enrolled in the program exhibited higher levels of academic
achievement and better performance on select measures of EFs and
STM. Although we did not observe dosage effects for program
enrollment, the number of years’ instruction required to observe
differences between students in the program and those not in the
program varied by measure. Finally, we observed that among students who were new to the program in its most recent year, some effects of enrollment on EFs were contingent upon performance prior to the start of the program.
The primary contribution of this study is that these results were obtained using a design that balances concerns of internal and external validity. Random assignment to the program allows for inferences about causation, and the low rate of attrition from the program and the conservative method of estimating the program’s effects (i.e., intent to treat) allowed for a valid investigation into the heterochronicity of effects (Schellenberg, 2004, 2006). Because we studied the program as implemented among a sample of majority-minority children attending school in an area of elevated sociodemographic risk, the generalizability of the findings extends to music education as it occurs in the field and as it is offered to underserved students. The results reported here thus build upon findings from previous, primarily correlational studies demonstrating that music education is associated with higher academic achievement and better performance on measures of EFs and STM.
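The intent-to-treat logic mentioned above can be illustrated with a small sketch. It assumes a simplified setting (one outcome per student and a plain difference in means rather than the mixed models used in the study), and the roster, class, and function names are hypothetical. Students are compared by the group the lottery assigned them to, so a child who left the program still counts as treated. That is why the estimate is conservative.

```python
# Hypothetical intent-to-treat (ITT) sketch. Students are compared by the
# group the lottery assigned them to, whether or not they stayed enrolled.
# Records are invented; a simple mean difference stands in for the study's
# mixed models.
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    assigned: bool   # won the lottery (assigned to the program)
    completed: bool  # stayed in the program for the whole year
    score: float     # end-of-year outcome


def itt_difference(students):
    """Assigned minus not-assigned mean outcome, ignoring attrition."""
    treated = [s.score for s in students if s.assigned]
    control = [s.score for s in students if not s.assigned]
    return mean(treated) - mean(control)


def per_protocol_difference(students):
    """Drops assigned students who left; can overstate the effect if
    leavers differ systematically from those who stay."""
    treated = [s.score for s in students if s.assigned and s.completed]
    control = [s.score for s in students if not s.assigned]
    return mean(treated) - mean(control)


roster = [
    Student(True, True, 80.0),
    Student(True, True, 90.0),
    Student(True, False, 60.0),   # assigned but left the program
    Student(False, False, 70.0),
    Student(False, False, 75.0),
]
print(f"ITT: {itt_difference(roster):.2f}")                    # 4.17
print(f"Per protocol: {per_protocol_difference(roster):.2f}")  # 12.50
```

Keeping the student who left in the treated group pulls the estimate down, which is the sense in which an intent-to-treat estimate errs on the side of understating a program's effect.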
Was There Any Effect of the Program?
Enrollment in the program for any number of years was found to exert a positive effect on academic achievement as measured by standardized tests in reading, math, and language arts and by course grades in ELA and math. Effects for course grades were observed only among students in Grades 4 to 8, which may be attributable to reduced statistical power (the subsample of younger students was considerably smaller than that for older students) or to the compressed range of the scale used to grade younger students. Nevertheless, these results are broadly consistent with those yielded by correlational studies (Gouzouasis et al., 2007; Vaughn & Winner, 2000), including those that controlled for socioeconomic status (e.g., Corrigall et al., 2013; Miksza, 2010; Schellenberg, 2006; Southgate & Roscigno, 2009; Wetter et al., 2009). However, they extend these results by using random assignment to address the issue of selection bias (Schellenberg, 2011), and in so doing build upon the small body of experimental work indicating that music education may lead to enhanced cognitive function (Kaviani et al., 2014; Moreno et al., 2011; Portowitz et al., 2009; Schellenberg, 2004) and improved performance on tests of basic academic skills (Degé & Schwarzer, 2011; Forgeard, Winner, Norton, & Schlaug, 2009; Graziano, Peterson, & Shaw, 1999; Moreno et al., 2009). Considered in the context of this literature, the present results suggest that music education may foster the cognitive capacities that underlie academic achievement or, at the very least, improve the ability to take tests that measure that achievement.
Program enrollment for any number of years also led to improved accuracy and efficiency on a flanker test of flexible attention, improved accuracy on a go/no-go task, and improved performance on a Stroop task and a card-sorting task that draws upon multiple components of EFs. Thus, consistent with all previous studies examining music and EFs among adults (e.g., Amer et al., 2013) and children (e.g., Zuk et al., 2014), musical training was associated with improved performance on some, but not all, measures of EFs. In some cases, we observed improved performance on the same tasks used in previous studies, such as the go/no-go (Moreno et al., 2011) or Stroop tasks (Degé et al., 2011). However, our results build on these previous findings in two important ways. First, most of these studies (with the notable exception of Moreno et al., 2011) did not use random assignment and therefore could not speak to causation. Second, contrary to previous assertions that the effects of music education on EFs are limited to “low-level tests of attention” (Schellenberg & Mankarious, 2012), we observed significant (albeit modest) effects of music education on a complex card-sorting task, considered by some to be the “gold standard” of EF tasks (Delis, Kaplan, & Kramer, 2001, p. 2).
We also observed an effect of program enrollment on performance on a pictorial measure of visual STM. This finding extends results reported in previous correlational (Lee et al., 2007) and quasi-experimental (Bilhartz et al., 1999) studies in that it establishes, for the first time, a causal link between music education and visual STM. It should be noted, however, that a significant relationship between program enrollment and performance was not observed for two other tests of visual STM. We offer one possible explanation for this pattern of findings below.
Table 5
Mean Estimates for Mixed Models of Executive Functions: New Students and Controls
Measure               Low preprogram performance         High preprogram performance
                      New students      Controls         New students      Controls
                      Est. (SE)         Est. (SE)        Est. (SE)         Est. (SE)
Go/no-go, difference  111.7 (16.1)      92.3 (18.7)      82.0 (14.6)       138.7 (10.0)
Stroop, RT            1162.1 (149.2)    1777.7 (72.5)    1224.6 (141.7)    1150.7 (87.8)
Card-sorting          .174 (.025)       .172 (.015)      .146 (.031)       .165 (.011)
Note. For Stroop and card-sorting tasks, lower scores indicate better performance: faster reaction times and fewer errors, respectively. Low preprogram performance was defined as scoring in the 25th percentile on the fall 2012 administration of the measure; high preprogram performance was defined as scoring in the 75th percentile.
For measures of academic achievement, the size of the effect for enrollment ranged from 0.24 for the composite standardized test score to 0.30 and 0.42 for course grades in English language arts and math, respectively. For the EF tasks, effects ranged from 0.18 for the card-sorting task to 0.57 for RT on the Stroop task, whereas for the pictorial measure of STM the effect was 0.25. This range encompasses the size of the effects reported in previous experimental studies of the effects of music education on IQ (0.35 in Schellenberg, 2004; 0.40 in Kaviani et al., 2014). According to the guidelines established by Cohen (1988), these effect sizes range from small to medium. However, as Cohen himself noted, these guidelines were never intended to be universally applied to all findings from data collected in all circumstances, and therefore Tallmadge’s (1977) benchmark of 0.25 may be more relevant here (though the extent to which this benchmark can be universally applied has also been questioned). In one of the few studies that reported effect sizes for an educational intervention (a prekindergarten program) on EFs, Weiland and Yoshikawa (2013) reported effects ranging from 0.20 to 0.27. Thus, with the exception of the effect of enrollment on the card-sorting task, all the effects of the music education program approximated or exceeded both Tallmadge’s benchmark and the size of the effects reported by Weiland and Yoshikawa (2013).
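The comparison with published benchmarks is simple arithmetic, and a short script can reproduce it. The effect sizes and benchmarks are the values reported in the paragraph. Treating 0.20, the lower end of Weiland and Yoshikawa's range, as the cutoff for "approximated" is an assumption made only for this illustration, and the dictionary keys are informal labels.

```python
# Effect sizes (d) for program enrollment as reported in the text, compared
# with Tallmadge's (1977) benchmark and the lower end of Weiland and
# Yoshikawa's (2013) range. Using 0.20 as the "approximates" cutoff is an
# illustrative assumption.
EFFECTS = {
    "composite test score": 0.24,
    "ELA grades": 0.30,
    "math grades": 0.42,
    "card-sorting": 0.18,
    "Stroop RT": 0.57,
    "pictorial STM": 0.25,
}

TALLMADGE_BENCHMARK = 0.25
WEILAND_YOSHIKAWA_RANGE = (0.20, 0.27)


def compare_to_benchmarks(effects):
    """Return (effects at or above 0.25, effects below 0.20)."""
    low_end, _ = WEILAND_YOSHIKAWA_RANGE
    meets = [name for name, d in effects.items() if d >= TALLMADGE_BENCHMARK]
    below = [name for name, d in effects.items() if d < low_end]
    return meets, below


meets, below = compare_to_benchmarks(EFFECTS)
print("At or above 0.25:", meets)
print("Below 0.20:", below)
```

Only the card-sorting effect falls below the lower end of the range. The composite test score (0.24) is the one effect that approximates, rather than meets, the 0.25 benchmark.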
Were There Dosage Effects for the Program?
Our analyses revealed significant differences in standardized test scores, year-end grades in ELA and math, select tasks of EFs, and the pictorial STM task by level of program enrollment. Comparisons among the mean estimates indicated that for both test scores and grades, significant differences were observed between students who were not enrolled in the program and those who had been enrolled for 3 years. For the EF tasks, the effects of program enrollment were heterochronous: significant differences were observed between unenrolled and third-year students for the proportion of trials correct on the flanker task and for the card-sorting task. Differences for the Stroop task were observed between unenrolled students and both second- and third-year program students, whereas disparities for the flanker difference scores were significant only between unenrolled and first-year students. Differences for the pictorial STM task paralleled those for test scores and grades, with significant differences only between students who were not enrolled in the program and those enrolled for 3 years.
Figure 3. [Two panels comparing New Students and Controls at Low and High Pre-Program Performance; panel a y-axis: Post-Program Stroop Reaction Time (msec); panel b y-axis: Perseverative Errors (proportion of trials).] (a) Model-implied mean estimates of reaction time to correct responses on interference trials of the Stroop task among students new to the program and controls at low and high levels of preprogram performance. Error bars represent two times the SE of the estimate. (b) Model-implied mean estimates of the proportion of card-sorting task trials featuring perseverative errors among students new to the program and controls at low and high levels of preprogram performance. Error bars represent two times the SE of the estimate.
It is therefore the case that, contrary to our hypothesis, dosage effects (defined as significant differences in outcomes for students enrolled in the program for different numbers of years) were not observed. Instead, significant differences were observed between students who had not been enrolled in the program and those who had been enrolled for 1, 2, or 3 years. That said, for outcomes such as standardized test scores, performance continued to improve at the trend level with additional years in the program, suggesting the possibility of a dosage effect that we may have been underpowered to detect (Schellenberg, 2006).
One implication of these findings is that assessments of music education programs, even programs featuring a very intensive treatment in which students receive instruction every day that school is in session, may not yield significant findings after a single school year. Even after 3 years of intensive training, effects were not observed across all measures of EFs and STM used in this study. This could be due to limitations of measurement; as others have noted (Degé et al., 2011), differences in how EFs and memory are measured may explain discrepant findings across studies. However, subtle differences in measurement may also explain inconsistencies within studies. Consider our results for the Stroop task, in which we reported an effect of musical instruction on performance measured as the median RT for interference trials. With performance thus defined, our results are consistent not only with those reported by Bialystok and DePape (2009) and Slevc et al. (2016), who also indexed performance in terms of RT, but also with our findings for the flanker and go/no-go tasks. However, had it been feasible to calculate a difference score for the Stroop task, we might have observed inconsistencies not only with the previous studies cited (neither of which reported a relationship between musical instruction and difference scores) but with results for our other tasks as well.
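How much the choice of index matters can be seen in a small scoring sketch. It assumes a trial format of condition, RT, and accuracy; scoring only correct trials; and defining the difference score as the median interference RT minus the median congruent RT. These are illustrative assumptions rather than the study's documented scoring rules, and the helper names and RTs are hypothetical.

```python
# Hypothetical scoring sketch for an interference task such as the Stroop.
# Trial format, the use of correct trials only, and medians are
# illustrative assumptions; the study's exact scoring may differ.
from statistics import median


def make_trials(congruent_rts, interference_rts):
    """Build a trial list (all marked correct) from two lists of RTs (msec)."""
    trials = [{"condition": "congruent", "rt": rt, "correct": True} for rt in congruent_rts]
    trials += [{"condition": "interference", "rt": rt, "correct": True} for rt in interference_rts]
    return trials


def median_rt(trials, condition):
    """Median RT over correct trials of one condition."""
    return median(t["rt"] for t in trials if t["condition"] == condition and t["correct"])


def stroop_scores(trials):
    """Return (median interference RT, difference score), where the
    difference score is interference minus congruent median RT."""
    interference = median_rt(trials, "interference")
    return interference, interference - median_rt(trials, "congruent")


# Two hypothetical children.
child_a = make_trials([600, 620, 640], [1000, 1040, 1080])
child_b = make_trials([900, 950, 1000], [1100, 1150, 1200])
print(stroop_scores(child_a))  # (1040, 420)
print(stroop_scores(child_b))  # (1150, 200)
```

Child A is faster on interference trials (better on the median-RT index) but shows the larger difference score (worse on the difference-score index), so the two scoring choices rank these children in opposite orders.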
However, there is another explanation for our pattern of results, rooted in how the measurement of EFs and STM interacts with heterochronicity in the development of these abilities. Even if we had access to the underlying latent constructs that define these abilities, gaining insight into how different forms of environmental input (including music education) influence their development would be an involved undertaking, given that this development is likely to be nonlinear and its openness to environmental input may fluctuate over time. But of course we only have measures that, with varying degrees of fidelity, index certain aspects of these underlying constructs. As such, we cannot conclude that music education does not influence EFs simply because we did not observe an effect of training on the trail-making task. It may take many more than 3 years of instruction to yield benefits to all aspects of EFs and STM, or at least benefits that our measures can detect. Certainly this point is consistent with correlational studies that find a relationship between cognitive function and being a musician, but define musicians as individuals who have studied music for 6 (Santos-Luiz et al., 2015), 10 (Hanna-Pladdy & Mackay, 2011), or 20 years (Amer et al., 2013).
Were the Effects of the Program Contingent Upon Levels of Preprogram Performance?
Performance on three of the tasks for which effects of cumulative enrollment were observed (the go/no-go, Stroop, and card-sorting tasks) was also found to vary by enrollment and by performance on these tasks prior to beginning the program. For the Stroop task, musical training exerted a remedial or compensatory effect, such that students who performed poorly on the task prior to the program performed much better following training than their nonenrolled peers. In contrast, musical training exerted an amplifying effect for the go/no-go and card-sorting tasks, with students who performed well prior to the program realizing the greatest benefits from musical training. We may speculate that this difference stems in part from the nature of the tasks and how performance on those tasks was measured: RT on the Stroop task is a measure of a relatively simple aspect of inhibitory control, whereas the difference scores for the go/no-go task and, in particular, perseverative errors on the card-sorting task assess more complex executive processes. Thus students who come to music education with relatively poor basic inhibitory abilities may realize particular benefit from engagement in an activity that demands they regularly exercise these capacities, whereas those already adept at the more complex task of integrating the subroutines of EFs may become even more skilled when presented with repeated opportunities to do so.
For students who were new to the program and performed poorly on the preprogram Stroop task, enrollment conferred a benefit to postprogram performance equivalent to an effect size of 1.13; for students who performed well on the preprogram go/no-go and card-sorting tasks, enrollment resulted in improvements equivalent to effect sizes of 0.84 and 0.30, respectively. As was the case for the effects of any enrollment, the magnitude of these effects exceeds both Tallmadge’s benchmark and the size of the effects reported by Weiland and Yoshikawa (2013). This is noteworthy given that educational interventions often fail to yield benefits to untrained domains (Olesen, Westerberg, & Klingberg, 2004) and that the explicit purpose and pedagogical focus of the program under examination was not to improve test scores, grades, or EFs, but rather to teach children music. As others have noted (Amer et al., 2013; Schellenberg, 2004), this may indicate that music education has the rare capacity to yield far transfer effects (Barnett & Ceci, 2002), or benefits to extramusical cognitive domains.
Limitations and Future Directions
Although this study yielded novel findings regarding the relationships between music education, academic achievement, EFs, and STM, those findings are subject to methodological limitations, particularly with respect to measurement. Chief among these is the fact that although measures of EFs and STM were taken concurrent with the start and end of the program year, the first available measures of academic achievement and behavior were taken from students’ first-trimester report cards. These were issued in mid-November, at which point the program had been in operation for approximately 8 weeks. This may in part explain why effects of enrollment were observed on EFs for new students but not on academic achievement: given how little we know about the timing of the program’s effects, it is possible that even a short period of instruction elevated students’ first-trimester performance, thereby reducing the likelihood of observing significant change in performance from the first to the third trimester.
It should also be noted that while we included measures of STM (e.g., a forward digit-span task), we did not include measures of working memory that required the manipulation of information held in STM by the central executive (e.g., a backward digit-span task). The decision to focus on STM was based on the argument that musical practice requires the maintenance of information in STM (Schellenberg, 2004) and on previous research, referenced above, that supports this argument (e.g., Amer et al., 2013; Fauvel et al., 2014; Franklin et al., 2008; Hansen et al., 2013; Lee et al., 2007). The decision to focus on visual STM was motivated both by the relative dearth of literature addressing the relationship between music education and visual (rather than verbal) memory and by the fact that establishing a causal link between musical instruction and visual memory would constitute stronger evidence of far transfer. That said, a future study that systematically examined the relationship between music education and multiple forms of memory (working and short-term, visual and verbal) would add much to our understanding of the ancillary benefits of music education.
Another limitation of our study was that the same EF and STM tasks were administered, in the same order, during both the pre- and postprogram assessments (given the problems associated with counterbalancing the order of nine tasks in a relatively small sample). However, as Moreno et al. (2011) observed, this issue is minimized by random assignment, in that practice and order effects should apply equally to both the treatment and control groups. We also did not control for students’ socioeconomic status or IQ, but again the rationale for doing so (that more affluent or intelligent students have access to music education; Schellenberg, 2004) applies to nonexperimental designs. Here students had an equal chance of being assigned to the treatment condition, regardless of socioeconomic status or IQ.
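The difficulty of full counterbalancing comes down to simple arithmetic. The number of distinct orders in which nine tasks can be administered is 9!, far more than the number of children in the sample. The sketch below uses only the task count and the sample size of 265 reported for the study. It is an illustration, not part of the study's analyses.

```python
# Arithmetic behind the counterbalancing constraint: with nine tasks, the
# number of possible administration orders is 9!. N = 265 is the study's
# total sample size; everything else is plain arithmetic.
from math import factorial

N_TASKS = 9
N_PARTICIPANTS = 265

orders = factorial(N_TASKS)
print(orders)                                              # 362880
print(f"{orders / N_PARTICIPANTS:.1f} orders per child")   # 1369.4 orders per child
```

With roughly 1,369 possible orders for every participant, no single order could be assigned to more than a handful of children, which is why a fixed order was used.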
Beyond these limitations lies the broader question of whether the results observed can be attributed to the influence of the musical content of the program. It is possible that an after-school program that did not include musical training but nevertheless featured “school-like” activities (teacher-directed activities requiring students to put forth sustained effort; Ceci & Williams, 1997) might benefit children’s academic achievement and EFs directly (cf. Schellenberg, 2004) or that the program’s socioemotional benefits (e.g., to self-esteem; Costa-Giomi, 2004) could influence these outcomes indirectly. However, to date, experimental studies including a control group of students assigned to nonmusical activities (drama, Schellenberg, 2004; visual arts, Moreno et al., 2011) have yielded null results for these students, supporting the assertion that the benefits of these programs are attributable to specific features of music education (Schellenberg, 2006), such as progressive difficulty and multimodal sensory integration (Bugos et al., 2007).
Assuming this is the case, the question remains: how, or through what processes, does music education achieve these benefits? One possibility is that music education confers benefits to conative skills such as achievement motivation or self-esteem. There is some evidence that music education is correlated with motivation (Santos-Luiz et al., 2015), though these findings are subject to third-factor explanations and self-selection bias. Although all three measures of STM included in our study were build-up tasks (in which a correct answer on a trial with n stimuli results in n + 1 stimuli on the next trial), this process occurred most slowly in the pictorial task, for which significant program effects were observed. This raises the possibility that the superior performance exhibited by enrolled students on this task was driven, in part, by enhanced levels of mastery motivation or persistence. Future experimental research should explore this possibility, addressing whether music education causes higher levels of motivation while paying particular attention to the possibility that students exhibiting the highest levels of motivation prior to musical education exhibit the most rapid gains in motivation due to reciprocal, self-promoting interactions between students’ dispositional motivation and subsequent training.
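The build-up procedure described above can be sketched as a simple loop. The starting length, the maximum length, the stop-at-first-error rule, and the `trials_per_level` parameter (one hypothetical way a task could build up more slowly) are assumptions for illustration, not the study's administration rules. The `respond` function stands in for a child's answer.

```python
# Hypothetical build-up span procedure: a correct answer on a trial with
# n stimuli advances the list length to n + 1. Start length, max length,
# stop-at-first-error, and trials_per_level are illustrative assumptions.


def run_span_task(respond, start_len=2, trials_per_level=1, max_len=9):
    """Return (span, trials_given).

    respond(n) -> bool says whether a list of n stimuli is recalled
    correctly. Length goes up by one after `trials_per_level` consecutive
    correct trials; the task stops at the first error or once max_len has
    been passed. span is the longest fully passed length (0 if none).
    """
    length, span, trials, streak = start_len, 0, 0, 0
    while length <= max_len:
        trials += 1
        if not respond(length):
            break
        streak += 1
        if streak == trials_per_level:
            span, length, streak = length, length + 1, 0
    return span, trials


def child(n):
    """Hypothetical child who can hold up to five items."""
    return n <= 5


print(run_span_task(child))                      # (5, 5)
print(run_span_task(child, trials_per_level=2))  # (5, 9)
```

Both settings arrive at the same span, but the slower build-up requires nearly twice as many trials. This illustrates how a more gradual task could place heavier demands on persistence without changing the maximum span a child reaches.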
Another possibility is that music education benefits academic outcomes through its effects on EFs (Hannon & Trainor, 2007; Schellenberg & Peretz, 2008), though the empirical investigations into this possibility have yielded mixed results (cf. Degé et al., 2011; Schellenberg, 2011). Laying aside the issue of mediation, what processes may account for music’s effects on EFs? The relationship between musical training and prefrontal cortical function (Fujioka et al., 2006; Zuk et al., 2014) and development
(Hudziak et