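Instructional Complexity and the Science to Constrain It
Kenneth R. Koedinger1,*, Julie L. Booth2, David Klahr1
1Carnegie Mellon University, Pittsburgh, PA 15213, USA. 2Temple University, Philadelphia, PA 19122, USA. *Corresponding author.

School-researcher partnerships and large in vivo experiments help focus on useful, effective instruction.
SCIENCE VOL 342 22 NOVEMBER 2013 935
Published by AAAS
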
Science and technology have had enormous impact on many areas of human endeavor but surprisingly little effect on education. Many large-scale field trials of science-based innovations in education have yielded scant evidence of improvement in student learning (1, 2), although a few have reliable positive outcomes (3, 4). Education involves many important issues, such as cultural questions of values, but we focus on instructional decision-making in the context of determined instructional goals and suggest ways to manage instructional complexity.
Ambiguities and Contexts in Instruction
Many debates about instructional methods suffer from a tendency to apply compelling labels to vaguely described procedures rather than to provide operational definitions of instructional practices (5, 6). Even when practices are reasonably well defined, there is not a consistent evidential base for deciding which approach is optimal for learning. Empirical investigations of instructional methods, including controlled laboratory experiments in cognitive and educational psychology, often fail to yield consensus. For instance, controversy exists regarding the benefits of immediate (7) versus delayed feedback (8), or the use of concrete (9) versus abstract materials (10).
Further complicating the picture is that results often vary across content or populations. For example, instruction that is effective for simple skills has been found to be ineffective for more complex skills (11), and techniques such as prompting students to provide explanations (12) may not be universally effective (13). The effectiveness of different approaches is often contingent on student population or level of prior achievement or aptitude. Some approaches, for example, may be particularly effective for low-achieving students (14, 15). Although specific instructional decisions may be useful at the level of the individual student (e.g., will this student learn better right now if I give her feedback or if I let her grapple with the material for a while?), the search for general methods that optimize the effectiveness, efficiency, and level of student engagement is more challenging.
Complexity of Instructional Design
Of the many factors that affect learning in real-world contexts, we describe three of particular importance: instructional technique, dosage, and timing. Independently combining choices on one dimension with choices on other dimensions produces a vast space of reasonable options, as shown in the figure.
Instructional techniques. Many lists of learning principles suggest instructional techniques and point to supporting research (12, 16). Each list has between 3 and 25 principles. In-depth synthesis of nine such sources yielded an estimate of 30 independent instructional techniques (see the table and table S1).
Dosage and implementation. Many instructional distinctions have multiple values or are continuous (e.g., the ratio of examples to questions or problems given in an assignment, or the spacing of time between related activities). These dimensions are mostly compatible with each other: almost all can be combined with any other.
Intervention timing. The optimal technique may not be the same early in learning as it is later. Consider how novice students benefit from studying many worked examples in place of solving many problems, whereas shifting to pure problem-solving practice becomes more effective as students develop expertise (17). Many researchers have suggested that effective instruction should provide more structure or support early in learning or for more difficult or complex ideas, and fade that assistance as the learner advances (18, 19).
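To make fading concrete, the sketch below builds a practice sequence that starts with worked examples and shifts toward problems. It assumes, purely for illustration, fixed-size blocks in which the share of worked examples drops linearly from all examples in the first block to none in the last; `faded_schedule` is a hypothetical helper, not a procedure taken from the cited studies.

```python
def faded_schedule(n_blocks: int, block_size: int) -> list[str]:
    """Build a practice sequence that fades from worked examples to problems.

    Illustrative assumption: the share of worked examples falls linearly from
    1.0 in the first block to 0.0 in the last block.
    """
    schedule: list[str] = []
    for block in range(n_blocks):
        share = 1.0 if n_blocks == 1 else 1 - block / (n_blocks - 1)
        n_examples = round(block_size * share)
        # Within a block, worked examples come before the problems they support.
        schedule += ["example"] * n_examples + ["problem"] * (block_size - n_examples)
    return schedule


print(faded_schedule(3, 2))
```

Here `faded_schedule(3, 2)` yields two examples, then one example and one problem, then two problems. Other fading rules, such as removing worked-out steps within a single example, would fit the same idea.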
If we consider just 15 of the 30 instructional techniques we identified, three alternative dosage levels, and the possibility of different dosage choices for early and late instruction, we compute 3^(15 × 2) = 3^30, or more than 205 trillion, options. Some combinations may not be possible or may not make sense in a particular content area, yet other factors add further complexity: Many techniques have more than three possible dosage levels, there may be more than two time points where the instructional optimum changes, and different knowledge needs in different domains often require a different optimal combination. For example, it may be optimal to adjust spacing of practice continually for each student on each knowledge component (20). As another example, when the target knowledge is simple facts, requiring recall and use of knowledge produces more robust learning, but for complex problem-solving skills, studying a substantial number of worked examples is better (1).

[Figure: "What instruction is best?" Recoverable panel labels: spacing of practice (focused practice, gradually widen, distributed practice); worked examples and problems (study examples, 50/50 mix, test on problems); concreteness (mix, abstract); timing of feedback (immediate, delayed, no feedback); grouping of topics (block topics in chapters, mix); who explains (ask for explanations, explaining, mix).]
Instructional design choices. Different choices along different instructional dimensions can be combined to produce a vast set of instructional options. The path with thicker arrows illustrates one set of choices within a space of trillions of such options.

The vast size of this space reveals that simple two-sided debates about improving learning, in the scientific literature as well as in the public forum, obscure the complexity that a productive science of instruction must address.
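The option count quoted above follows from treating each technique as an independent choice among dosage levels at each time point, so the size is the number of levels raised to (techniques × time points). A minimal sketch, assuming every combination is allowed (which, as noted above, is not always true); the function name is illustrative.

```python
def design_space_size(n_techniques: int, n_dosage_levels: int, n_time_points: int) -> int:
    """Count designs when every technique independently takes one dosage level
    at every time point (assumes all combinations are permitted)."""
    return n_dosage_levels ** (n_techniques * n_time_points)


options = design_space_size(n_techniques=15, n_dosage_levels=3, n_time_points=2)
print(f"{options:,}")  # 205,891,132,094,649
```

Adding a third time point raises the exponent from 30 to 45, which is why small additions to any one dimension enlarge the space so quickly.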
Taming Instructional Complexity
We make five recommendations to advance instructional theory and to maximize its relevance to educational practice.
1. Searching in the function space. Following the Knowledge-Learning-Instruction framework (21), we suggest three layers of functions of instruction: To yield better assessment outcomes that reflect broad and lasting improvements in learner performance (i), instruction must change learners' knowledge base or intellectual capacity (ii), and it must require that learners' minds execute appropriate learning processes (iii).
We specify different functions to be achieved at each layer. The most distal, but observable, functions of instruction are assessment outcomes: long-term retention, transfer to new contexts, or desire for future learning. More proximal, but unobservable, functions are those that change different kinds of knowledge: facts, procedural skills, principles, learning skills, or learning beliefs and dispositions. The most immediate and unobservable functions support learning processes or mechanisms: memory and fluency building, induction and refinement, or understanding and sense-making (21, 22).
Functions at each layer suggest more focused questions that reduce the instructional design space (23): Which instructional choices best support memory to increase long-term retention of facts? Which are best for inducing general skills that produce transfer of learning to new situations? Which are best for sense-making processes that produce learning skills and higher learner self-efficacy toward better future learning? We can associate different subsets of the instructional design dimensions with individual learning functions. For example, spacing enhances memory, worked examples enhance induction, and self-explanation enhances sense-making (see the table). The success of this approach of separating causal functions of instruction depends on partial decomposability (24) and some independence of the effects of instructional variables: Designs optimal for one function (e.g., memory) should not be detrimental to another (e.g., induction). To illustrate, consider that facts require memory but not induction; thus, a designer can focus just on the subset of instructional techniques that facilitate memory.
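Under the partial-decomposability assumption, choosing techniques by learning function shrinks the search. The sketch below tags only the three techniques the paragraph names and uses a hypothetical knowledge-to-function mapping; `techniques_for` and both dictionaries are illustrative names, and the size formula is the independent-combination count (three dosage levels, two time points) used for the full design space.

```python
# Illustrative technique-to-function tags, limited to the three examples named
# in the text; a full assignment would come from the table and table S1.
FUNCTION_OF_TECHNIQUE = {
    "spacing": "memory",
    "worked_examples": "induction",
    "self_explanation": "sense_making",
}

# Hypothetical mapping from knowledge type to the learning functions it needs.
# The text states only that facts require memory but not induction.
FUNCTIONS_NEEDED = {
    "fact": {"memory"},
}


def techniques_for(knowledge_type: str) -> list[str]:
    """Techniques whose tagged function is one the knowledge type needs."""
    needed = FUNCTIONS_NEEDED[knowledge_type]
    return sorted(t for t, f in FUNCTION_OF_TECHNIQUE.items() if f in needed)


def design_space_size(n_techniques: int, n_dosage_levels: int = 3, n_time_points: int = 2) -> int:
    """Independent-combination count: levels ** (techniques * time points)."""
    return n_dosage_levels ** (n_techniques * n_time_points)


full = design_space_size(len(FUNCTION_OF_TECHNIQUE))      # search over all three techniques
reduced = design_space_size(len(techniques_for("fact")))  # memory techniques only
print(techniques_for("fact"), full, reduced)  # ['spacing'] 729 9
```

The reduction is only as trustworthy as the decomposability assumption: if a memory-oriented choice harmed induction, the smaller search could miss the best design.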
Theoretical work can offer insight into when an instructional choice is dependent on a learning function. Computational models that learn like human students show, for instance, that interleaving problems of different kinds improves learning of when to use a principle or procedure (25), whereas blocking similar problem types ("one subgoal at a time") improves learning of how to execute it (26).
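The two practice orders contrasted above can be written down directly. This sketch assumes an equal number of problems per type and a fixed round-robin order for interleaving; interleaved schedules used in practice may instead be shuffled. The function names are illustrative.

```python
def blocked(problem_types: list[str], per_type: int) -> list[str]:
    """Finish all problems of one type before the next (e.g., A A A B B B)."""
    return [t for t in problem_types for _ in range(per_type)]


def interleaved(problem_types: list[str], per_type: int) -> list[str]:
    """Cycle through the types in a fixed round-robin order (e.g., A B A B A B)."""
    return [t for _ in range(per_type) for t in problem_types]


print(blocked(["A", "B"], 3))      # ['A', 'A', 'A', 'B', 'B', 'B']
print(interleaved(["A", "B"], 3))  # ['A', 'B', 'A', 'B', 'A', 'B']
```

Both orders contain exactly the same problems; only the sequence differs, which is what lets a study attribute any learning difference to ordering alone.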
Table. Instructional design principles. These address three different functions of instruction: memory, induction, and sense-making (see table S1).
Functions: Sense-making/Understanding; Induction/Refinement; Memory/Fluency
Principle: description of typical effect (the condition before ">" typically outperforms the condition after it)
Spacing: Space practice across time > mass practice all at once
Scaffolding: Sequence instruction toward higher goals > no sequencing
Exam expectations: Students expect to be tested > no testing expected
Testing: Quiz for retrieval practice > study same material
Segmenting: Present lesson in learner-paced segments > as a continuous unit
Feedback: Provide feedback during learning > no feedback provided
Pretraining: Practice key prior skills before lesson > jump in
Worked example: Worked examples + problem-solving practice > practice alone
Concreteness fading: Concrete to abstract representations > starting with abstract
Guided attention: Words include cues about organization > no organization cues
Linking: Integrate instructional components > no integration
Goldilocks: Instruct at intermediate difficulty level > too hard or too easy
Activate preconceptions: Cue student's prior knowledge > no prior knowledge cues
Feedback timing: Immediate feedback on errors > delayed feedback
Interleaving: Intermix practice on different skills > block practice all at once
Application: Practice applying new knowledge > no application
Variability: Practice with varied instances > similar instances
Comparison: Compare multiple instances > only one instance
Multimedia: Graphics + verbal descriptions > verbal descriptions alone
Modality principle: Verbal descriptions presented in audio > in written form
Redundancy: Verbal descriptions in audio > both audio & written
Spatial contiguity: Present description next to image element described > separated
Temporal contiguity: Present audio & image element at the same time > separated
Coherence: Extraneous words, pictures, sounds excluded > included
Anchored learning: Real-world problems > abstract problems
Metacognition: Metacognition supported > no support for metacognition
Explanation: Prompt for self-explanation > give explanation > no prompt
Questioning: Time for reflection & questioning > instruction alone
Cognitive dissonance: Present incorrect or alternate perspectives > only correct
Interest: Instruction relevant to student interests > not relevant

2. Experimental tests of instructional function decomposability. Optimal instructional choices may be function-specific, given that results across studies of instructional techniques vary with the nature of the knowledge goals. For example, if the instructional goal is long-term retention (an outcome function) of a fact (a knowledge function), then better memory processes (a learning function) are required; more testing than study will optimize these functions. If the instructional goal is transfer (a different outcome function) of a general skill (a different knowledge function), then better induction processes (a different learning function) are required; more worked-example study will optimize these functions. The ideal experiment to test this hypothesis is a two-factor study that varies the knowledge content (fact learning versus general skill) and the instructional strategy (example study versus testing). More experiments are needed that differentiate how different instructional techniques enhance different learning functions.
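A minimal sketch of the two-factor study proposed in recommendation 2, assuming a fully between-subjects 2 × 2 design. `assign` deals shuffled participants into the four cells, and `interaction_contrast` computes the crossover the hypothesis predicts: testing should beat example study for facts, and example study should beat testing for general skills, so the contrast should come out positive. All names are hypothetical.

```python
import random
from itertools import product
from statistics import mean

KNOWLEDGE = ("fact", "general_skill")
STRATEGY = ("example_study", "testing")
CELLS = list(product(KNOWLEDGE, STRATEGY))  # the four conditions of the 2 x 2 design


def assign(participant_ids, seed=0):
    """Shuffle participants, then deal them into the four cells in turn,
    so cell sizes differ by at most one."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(ids)}


def interaction_contrast(scores):
    """scores maps each (knowledge, strategy) cell to a list of outcome scores.

    Returns the testing-minus-example-study gap for facts minus the same gap
    for general skills; the hypothesis in the text predicts a positive value.
    """
    m = {cell: mean(values) for cell, values in scores.items()}
    fact_gap = m[("fact", "testing")] - m[("fact", "example_study")]
    skill_gap = m[("general_skill", "testing")] - m[("general_skill", "example_study")]
    return fact_gap - skill_gap
```

Because retention tests for facts and transfer tests for skills differ, scores would likely need to be put on a common scale (for example, standardized within each knowledge condition) before the contrast is meaningful, and a real analysis would add a significance test.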
3. Massive online multifactor studies. Massive online experiments involve thousands of participants and vary many factors at once. Such studies (27, 28) can accelerate the accumulation of data that can drive instructional theory development. The point is to test hypotheses that identify, in the context of a particular instructional function, what instructional dimensions can or cannot be treated independently.
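One common way to vary many factors at once is to randomize each factor independently for every participant, which yields a full factorial design without enumerating cells. The sketch below assumes simple (not balanced) randomization and hypothetical factor names drawn from dimensions discussed in the text; it estimates each factor's main effect by averaging over the other factors.

```python
import random
from statistics import mean


def randomize_factors(participant_ids, factors, seed=0):
    """Give each participant an independently drawn level of every factor.

    factors maps a factor name to its list of levels. Because draws are
    independent across factors, every participant contributes data to the
    comparison for every factor.
    """
    rng = random.Random(seed)
    return {
        pid: {name: rng.choice(levels) for name, levels in factors.items()}
        for pid in participant_ids
    }


def main_effect(assignments, outcomes, factor, level_a, level_b):
    """Mean outcome at level_a minus mean outcome at level_b for one factor,
    averaged over whatever levels the other factors happened to take."""
    at_a = [outcomes[p] for p, cond in assignments.items() if cond[factor] == level_a]
    at_b = [outcomes[p] for p, cond in assignments.items() if cond[factor] == level_b]
    return mean(at_a) - mean(at_b)


# Hypothetical factors based on dimensions named in the text.
FACTORS = {
    "feedback_timing": ["immediate", "delayed"],
    "practice_schedule": ["massed", "spaced"],
    "worked_examples": ["none", "half"],
}
assignments = randomize_factors(range(1000), FACTORS)
```

With simple randomization, group sizes vary by chance. Testing whether two dimensions can be treated independently would also require estimating their interaction, not just each main effect.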
Past studies have emphasized near-term effects of variations in user-interface features (27, 28). Designing massive online studies that vary multiple instructional techniques is feasible, but convenient access to long-term outcome variables is an unsolved problem. Proximal variables measuring student engagement and local performance are easy to collect (e.g., how long a game or online course is used; proportion correct within it). But measures of students' local performance and their judgments of learning are sometimes unrelated to, or even negatively correlated with, desired long-term learning outcomes (29).
4. Learning data infrastructure. Massive instructional experiments are essentially going on all the time in schools and colleges. Because collecting data on such activities is expensive, variations in instructional techniques are rarely tracked and associated with student outcomes. Yet technology increasingly provides low-cost instruments for collecting data to evaluate the learning experience. Investment is needed in infrastructure to facilitate large-scale data collection, access, and use, particularly in urban and low-income school districts. Two current efforts are LearnLab's huge educational technology data repository (30) and the Gates Foundation's Shared Learning Infrastructure (31).
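The core of such infrastructure is logging which instructional variation each student received alongside that student's outcomes. A minimal sketch with an illustrative record type; `LearningEvent` and its field names are assumptions, not the schema of LearnLab's repository or any other existing system.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LearningEvent:
    """One logged record; field names are illustrative, not an existing schema."""
    student_id: str
    course_id: str
    technique: str   # instructional variation in effect, e.g., "spaced" or "massed"
    outcome: float   # a later outcome linked to the same student


def mean_outcome_by_technique(events):
    """Average the linked outcome separately for each logged technique."""
    by_technique = defaultdict(list)
    for event in events:
        by_technique[event.technique].append(event.outcome)
    return {technique: mean(values) for technique, values in by_technique.items()}
```

Unless the technique was randomly assigned, as in an in vivo experiment, differences in these averages may reflect who received each technique rather than the technique itself.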
5. School-researcher partnerships. Ongoing collaborative problem-solving partnerships are needed to facilitate interaction between researchers, practitioners, and school administrators. When school cooperation is well managed and most or all of an experiment is computer-based, large well-controlled "in vivo" experiments can be run in courses with substantially less effort than an analogous lab study.
A lab-derived principle may not scale to real courses because nonmanipulated variables may change from the lab to a real course, which may change learning results. In in vivo experiments, these background conditions are not arbitrarily chosen by the researchers but instead are determined by the existing context. Thus, they enable detection of generalization limits more quickly before moving to long, expensive randomized field trials.
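Detecting generalization limits amounts to estimating an effect separately in each setting and seeing where it holds. A sketch, assuming each record carries a context label (such as a course), a condition of "treatment" or "control", and a score; `effect_by_context` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def effect_by_context(records):
    """records: iterable of (context, condition, score) tuples, where condition
    is "treatment" or "control".

    Returns the treatment-minus-control mean difference separately for each
    context, so an effect that holds in one setting but not another shows up
    as diverging estimates.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for context, condition, score in records:
        cells[context][condition].append(score)
    return {
        context: mean(groups["treatment"]) - mean(groups["control"])
        for context, groups in cells.items()
        if groups["treatment"] and groups["control"]  # skip contexts missing a condition
    }
```

Diverging estimates only hint at a limit; confirming one would require enough students per context to rule out chance differences.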
School-researcher partnerships are useful not only for facilitating experimentation in real learning contexts but also for designing and implementing new studies that address practitioner needs (32, 33).
In addition to school administrators and practitioners, partnerships must include critical research perspectives, including domain specialists (e.g., biologists and physicists); learning scientists (e.g., psychologists and human-computer interface experts); and education researchers (e.g., physics and math educators). It is important to forge compromises between the control desired by researchers and the flexibility demanded by real-world classrooms. Practitioners and education researchers may involve more domain specialists and psychologists in design-based research, in which iterative changes are made to instruction in a closely observed, natural learning environment in order to examine effects of multiple factors within the classroom (34).
Our recommendations would require reexamination of assumptions about the types of research that are useful. We see promise in sustained science-practice infrastructure funding programs, creation of new learning science programs at universities, and emergence of new fields and professional organizations (35, 36). These and other efforts are needed to bring the full potential of science and technology to bear on optimizing educational
References and Notes
1. M. Dynarski et al., Effectiveness of Reading and Mathematics Software Products: Findings from the First Student Cohort [Report provided to Congress by the National Center for Education Evaluation, Institute of Education Sciences (IES), Washington, DC, 2007].
2. Coalition for Evidence-Based Policy, Randomized Controlled Trials Commissioned by the IES since 2002: How Many Found Positive Versus Weak or No Effects; http:// IES-Commissioned-RCTs-positive-vs-weak-or-null-find-
3. J. Roschelle et al., Am. Educ. Res. J. 47, 833–878 (2010).
4. J. F. Pane, B. A. Griffin, D. F. McCaffrey, R. Karam, Effectiveness of Cognitive Tutor Algebra I at Scale (Working paper, Rand Corp., Alexandria, VA, 2013);
5. D. Klahr, J. Li, J. Sci. Educ. Technol. 14, 217–238 (2005).
6. D. Klahr, Proc. Natl. Acad. Sci. U.S.A. 110 (suppl. 3), 14075–14080 (2013).
7. A. T. Corbett, J. R. Anderson, Proceedings of ACM CHI 2001 (ACM Press, New York, 2001), pp. 245–252.
8. R. A. Schmidt, R. A. Bjork, Psychol. Sci. 3, 207–217
9. A. Paivio, J. Verbal Learn. Verbal Behav. 4, 32–38 (1965).
10. J. A. Kaminski, V. M. Sloutsky, A. F. Heckler, Science 320, 454–455 (2008).
11. G. Wulf, C. H. Shea, Psychon. Bull. Rev. 9, 185–211
12. H. Pashler et al., Organizing Instruction and Study to Improve Student Learning (National Center for Education Research 2007-2004, U.S. Department of Education, Washington, DC, 2007).
13. R. Wylie, K. R. Koedinger, T. Mitamura, Proceedings of the 31st Annual Conference of the Cognitive Science Society (CSS, Wheat Ridge, CO, 2009), pp. 1300–1305.
14. R. E. Goska, P. L. Ackerman, J. Educ. Psychol. 88, 249–259
15. S. Kalyuga, Educ. Psychol. Rev. 19, 387–399 (2007).
16. J. Dunlosky, K. A. Rawson, E. J. Marsh, M. J. Nathan, D. T. Willingham, Psychol. Sci. Public Interest 14, 4–58 (2013).
17. S. Kalyuga, P. Ayres, P. Chandler, J. Sweller, Educ. Psychol. 38, 23–31 (2003).
18. R. L. Goldstone, J. Y. Son, J. Learn. Sci. 14, 69–110 (2005).
19. P. A. Woźniak, E. J. Gorzelańczyk, Acta Neurobiol. Exp. (Warsz.) 54, 59–62 (1994).
20. P. I. Pavlik, J. R. Anderson, J. Exp. Psychol. Appl. 14, 101–117 (2008).
21. K. R. Koedinger, A. T. Corbett, C. Perfetti, Cogn. Sci. 36, 757–798 (2012).
22. L. Resnick, C. Asterhan, S. Clarke, Eds., Socializing Intelligence through Academic Talk and Dialogue (American Educational Research Association, Washington, DC, 2013).
23. G. Bradshaw, in Cognitive Models of Science, R. Giere and H. Feigl, Eds. (University of Minnesota Press, Minneapolis, 1992), pp. 239–250.
24. H. A. Simon, Sciences of the Artificial (MIT Press, Cambridge, MA, 1969).
25. N. Li, W. Cohen, K. Koedinger, Lect. Notes Comput. Sci. 7315, 185–194 (2012).
26. K. VanLehn, Artif. Intell. 31, 1 (1987).
27. D. Lomas, K. Patel, J. Forlizzi, K. R. Koedinger, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM, New York, 2013), pp. 89–98.
28. E. Andersen, Y. Liu, R. Snider, R. Szeto, Z. Popović, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM, New York, 2011), pp.
29. R. A. Bjork, J. Dunlosky, N. Kornell, Annu. Rev. Psychol. 64, 417–444 (2013).
30. LearnLab, Pittsburgh Science of Learning Center; www.
31. InBloom,
32. Strategic Education Research Partnership, www.serpinsti-
33. IES, U.S. Department of Education, Researcher-Practitioner Partnerships in Education Research, Catalog of Federal Domestic Assistance 84.305H;
34. S. A. Barab, in Handbook of the Learning Sciences, K. Sawyer, Ed. (Cambridge Univ. Press, Cambridge, 2006), pp.
35. Society for Research on Educational Effectiveness, www.
36. International Educational Data Mining Society,
Acknowledgments: The authors receive support from NSF grant SBE-0836012 and Department of Education grants R305A100074 and R305A100404. The views expressed are those of the authors and do not represent those of the funders. Thanks to V. Aleven, A. Fisher, N. Newcombe, S. Donovan, and T. Shipley for comments.
Supplementary Materials
Published by AAAS
... Improving education involves identifying and promoting instructional practices that have causal benefits for student outcomes (US Department of Education, 2016National Science Foundation & Institute for Education Sciences, 2013). To test whether an instructional practice exerts a causal influence on an outcome measure, the most straightforward and compelling research method is to conduct an experiment (Whitehurst, 2003;National Research Council, 2002), and in particular, to embed this experiment in an education setting, yielding causal inferences that are authentic to the contexts where they matter in practice (Motz et al., 2018;Koedinger et al., 2013). An experiment satisfies the strong requirements of causal inference by providing evidence that a change in behavior is attributable to a change in treatment, in a specific direction (ruling out reverse causality), and by minimizing (through random assignment) the possibility that it is explainable through other causal mechanisms. ...
... Experimental psychologists have been filling this gap to some extent, conducting experimental studies on human learning, albeit primarily under controlled laboratory conditions, and then advocating for their applicability in education settings (Roediger & Pyc, 2012;Pashler, et al., 2007;Benassi et al., 2014). Their advocacy, however, has been restrained; these same experimental psychologists, and others as well, affirm that research is needed to validate claims in practice (Daniel, 2012;Motz et al., 2018;Koedinger et al., 2013). For example, in Dunlosky et al.'s (2013) extensive review of learning strategies from cognitive and educational psychology, the evidence from education settings is marked as 'insufficient' for 8 out of the 10 strategies under investigation. ...
Full-text available
For researchers seeking to improve education, a common goal is to identify teaching practices that have causal benefits in classroom settings. To test whether an instructional practice exerts a causal influence on an outcome measure, the most straightforward and compelling method is to conduct an experiment. While experimentation is common in laboratory studies of learning, experimentation is increasingly rare in classroom settings, and to date, researchers have argued it is prohibitively expensive and difficult to conduct experiments on education in situ. To address this challenge, we present Terracotta (Tool for Education Research with RAndomized COnTrolled TriAls), an open-source web application that integrates with a learning management system to provide a comprehensive experimental research platform within an online class site. Terracotta automates randomization, informed consent, experimental manipulation of different versions of learning activities, and export of de-identified research data. Here we describe these features, and the results of a live classroom demonstration study using Terracotta.
... Evidence from a meta-analysis of guided inquiry contexts (Furtak et al., 2012) concludes that a combination of guidance and exploration in inquiry learning environments is beneficial. Nevertheless, there are many open questions about what combinations of methods work best (cf., Koedinger et al., 2013). Within science education, there have been calls for more "adequate answers . . . ...
... As many traditional museum exhibits, maker spaces and constructivist theories suggest (Jeffery-Clay, 1998), might students learn as well as or better, without explicit guidance when in the context of more open-ended hands-on construction? Or perhaps better results can be found by selecting from and automating some of the many forms of instructional guidance and learning support that have been developed, explored, and tested (e.g., Clark & Mayer, 2016;Hattie, 2012;Koedinger et al., 2013;Schwartz et al., 2016)? We find it useful to distinguish between more explicit forms of guidance that use verbal instructions, prompts, or feedback from more implicit forms of guidance that use the structure and sequence of tasks to aid learning. ...
Full-text available
Background Museum exhibits encourage exploration with physical materials typically with minimal signage or guidance. Ideally children get interactive support as they explore, but it is not always feasible to have knowledgeable staff regularly present. Technology-based interactive support can provide guidance to help learners achieve scientific understanding for how and why things work and engineering skills for designing and constructing useful artifacts and for solving important problems. We have developed an innovative AI-based technology, Intelligent Science Exhibits that provide interactive guidance to visitors of an inquiry-based science exhibit. Methods We used this technology to investigate alternative views of appropriate levels of guidance in exhibits. We contrasted visitor engagement and learning from interaction with an Intelligent Science Exhibit to a matched conventional exhibit. Findings We found evidence that the Intelligent Science Exhibit produces substantially better learning for both scientific and engineering outcomes, equivalent levels of self-reported enjoyment, and higher levels of engagement as measured by the length of time voluntarily spent at the exhibit. Contribution These findings show potential for transforming hands-on museum exhibits with intelligent science exhibits and more generally indicate how providing children with feedback on their predictions and scientific explanations enhances their learning and engagement.
... The studies by Samani and Pan (2021) and by Sana and Yan (2022) followed this principle, using quizzes or homework assignments that referred to previously learned content and combining interleaving with retrieval practice, which is a different desirable difficulty that benefits learning independently of interleaving (see Roelle et al., 2022). Hence, teaching scientific concepts requires the combination of various learning phases (Koedinger et al., 2012(Koedinger et al., , 2013Oser & Baeriswyl, 2001) to master complexity and secure the acquisition of adequate principle-based cognitive skills to achieve lasting learning. ...
Full-text available
Inductive learning, that is, abstracting conceptual knowledge, rules, or principles from exemplars, plays a major role in educational settings, from literacy acquisition to mathematics and science learning. Interleaving exemplars of different categories rather than presenting blocks might be a simple but powerful way to improve inductive learning by supporting discriminative contrast. Although a consistent advantage of interleaving has been demonstrated for visual materials, relatively few studies have examined educationally relevant materials, such as mathematical tasks, science problems, and verbal materials, and their results are mixed. We discuss how interleaving could be made fruitful for school learning of mathematics, science, and literacy acquisition. We conclude that interleaving should be tailored to the specific learning content and combined with supportive instructional measures that assist students in comparing exemplars for discriminating features. Finally, we sketch research gaps that revolve around the use of interleaved learning in the classroom.
... Determining the optimal instructional choices without a rigorous testing framework is largely intractable (Koedinger, Booth, & Klahr, 2013). We suggest testing the possible hierarchies of skills as a set of A-B tests, similar to instructional methods testing in massive online courses . ...
Full-text available
Background and Context Lopez and Lister first presented evidence for a skill hierarchy of code reading, tracing, and writing for introductory programming students. Further support for this hierarchy could help computer science educators sequence course content to best build student programming skill. Objective This study aims to replicate a slightly simplified hierarchy of skills in CS1 using a larger body of students (600+ vs. 38) in a non-major introductory Python course with computer-based exams. We also explore the validity of other possible hierarchies. Method We collected student score data on 4 kinds of exam questions. Structural equation modeling was used to derive the hierarchy for each exam. Findings We find multiple best-fitting structural models. The original hierarchy does not appear among the “best” candidates, but similar models do. We also determined that our methods provide us with correlations between skills and do not answer a more fundamental question: what is the ideal teaching order for these skills? Implications This modeling work is valuable for understanding the possible correlations between fundamental code-related skills. However, analyzing student performance on these skills at a moment in time is not sufficient to determine teaching order. We present possible study designs for exploring this more actionable research question.
... To efficiently and effectively address students' low level of participation in online collaborative learning at a large scale, researchers have adopted learning design principles with learning analytics. Learning design focuses on the design and development of reusable learning activities through the creation and application of a repertoire of pedagogical tools (eg, taxonomy, frameworks) (Koedinger et al., 2013). Investigations include but are not limited to alignment evaluation (Zheng et al., 2020), participation level prediction (Er et al., 2019), learning pattern identification (Holmes et al., 2019), and real-time reports with teacher dashboards (Martinez-Maldonado, 2019). ...
A discussion forum is a valuable tool for supporting student learning in online contexts. However, interactions in online discussion forums are sparse, which leads to further problems such as low engagement and dropout. Recent educational studies have examined the affordances of conversational agents (CAs) powered by artificial intelligence (AI) for automatically supporting student participation in discussion forums. However, few studies have paid attention to the safety of CAs. This study aimed to address the safety challenges of CAs built with educational big data to support learning. Specifically, we proposed a safety-aware CA model, benchmarked against two state-of-the-art (SOTA) models, to support high school students' learning on an online algebra learning platform. We applied automatic text analysis to evaluate the safety and socio-emotional support levels of CA-generated and human-generated texts. The CA models were trained and evaluated on a large dataset consisting of all discussion post-reply pairs (n = 2,097,139) written by 71,918 online math learners from 2015 to 2021. Results show that although SOTA models can generate supportive texts, their safety is compromised. Meanwhile, our proposed model effectively enhances the safety of generated texts while providing comparable support.
Practitioner notes
What is already known about this topic
Online discussion forums have been plagued by a lack of interaction among students, due to factors such as the expectation of receiving no response and the perception that topics are irrelevant, which lower students' motivation to participate. AI-based conversational agents can automatically support students' interactions in online discussion forums at a large scale, and their generated responses can be human-like, contextually coherent and socio-emotionally supportive. Unsafe discourse exchanges between students and conversational agents can be harmful, because identity attacks, aggravation and bullying behaviours embedded in the discourse can disrupt students' knowledge inquiry and negatively influence their motivation and engagement. However, few educational studies have paid attention to the safety of conversational agents.
What this paper adds
This study proposes and synthesizes strategies for building AI-based conversational agents that automatically support online discussions with safe and supportive discourse. It reveals the relationship between discourse safety and social support, showing that supportive discourse can also be unsafe. It enriches the literature on educational conversational agents by synthesizing a conceptual framework of discourse safety and social support, and by proposing concrete algorithmic strategies for improving the safety of conversational agents.
Implications for practice and/or policy
Researchers and practitioners can adopt strategies from this study, such as generation control, open-source models and public API services, to evaluate students' discourse safety for early intervention or to modify existing conversational agents to be safety-aware. Practitioners can use the proposed conversational agent to automatically support students both safely and socio-emotionally at a large scale. Practitioners should be cautious when examining social support with automatic analysis, as not all supportive texts are safe. Even when unsafe texts provide emotional support, that does not make them appropriate in a learning environment.
The Coronavirus Disease 2019 (COVID-19) pandemic has heightened expectations for technology-enhanced interactions with personalized educational materials. Adjusting the content of educational materials to a learner's geographical location is a customization feature of personalized education and is used to develop the learner's interest in the content. The educational content of interest in this report is bioinformatics, a field whose knowledge spans the biological sciences and applied mathematics. The Human Heredity and Health in Africa (H3Africa) Initiative is a suitable source of data and peer-reviewed scholarly articles that are geographically relevant and focus on authentic problem solving in the human health domain. We developed a computerized platform of interactive visual representations of curated bioinformatics datasets from H3Africa projects, which also supports the customization, individualization and adaptation features of personalized education. We obtained evidence of a positive effect size and acceptable usability for a visual analytics resource designed for retrieval-based learning of facts about the functional impacts of genomic sequence variants. We conclude that technology-enhanced personalized bioinformatics educational interventions have implications for (1) the meaningful learning of bioinformatics; (2) stimulating additional student interest in bioinformatics; and (3) improving the accessibility of bioinformatics education to non-bioinformaticians.
Articles in this special issue on “Diverse Lenses on Improving Online Learning Theory, Research, and Practice” begin to address the gap between (1) research on psychological constructs that are too abstract to guide many instructional decisions and (2) empirically derived guidance that is quite concrete but limited in explanatory value and generalizability. Needed now is a multi-level framework for online learning that offers specific guidance for practitioners’ instructional decisions while also supporting a conceptual organization of accumulated research findings that fosters new insights and research questions. In this commentary, I describe a framework that would encompass multiple kinds of learning; different learning goals; discipline-specific ways of knowing and demonstrating knowledge; key technology features; and learner differences.
Hybrid systems combining artificial and human intelligence hold great promise for training human skills. In this paper, I introduce the concept of Hybrid Human-AI Regulation and illustrate it with a first prototype of a Hybrid Human-AI Regulation (HHAIR) system. HHAIR supports self-regulated learning (SRL) in the context of adaptive learning technologies (ALTs), with the aim of developing learners' SRL skills. This prototype targets young learners (10–14 years), for whom SRL skills are critical in today's society. Many of these learners use ALTs to learn mathematics and languages every day in school. ALTs optimize learning based on learners' performance data, but even the most sophisticated ALTs fail to support SRL. In fact, most ALTs take over (offload) regulation from learners. In contrast, HHAIR treats regulation as a collaborative task shared by the learner and the AI, with control gradually transferred from AI regulation to self-regulation. Learners will increasingly regulate their own learning as they progress through different degrees of hybrid regulation. In this way, HHAIR supports optimized learning as well as the transfer and development of SRL skills for lifelong (future) learning. The HHAIR concept is novel in proposing a hybrid intelligence approach to training human SRL skills with AI. This paper outlines theoretical foundations from SRL theory, hybrid intelligence and learning analytics. A first prototype in the context of ALTs for young learners is described as an example of hybrid human-AI regulation, and future advancements are discussed. Thus, foundational theoretical, empirical and design work are combined to articulate the concept of Hybrid Human-AI Regulation, which features forward-adaptive support for SRL and the transfer of control over regulation between human and AI.
Four design principles for students' learning of historical concepts: A realist review study
To learn to think critically, students need domain-specific knowledge. In history as a school subject, historical concepts are an essential part of that knowledge. Unlike the use of meta-historical concepts, this aspect of historical thinking has received relatively little attention from subject-didactics researchers in recent years. This article aims to synthesize knowledge from cognitive developmental psychology with a view to applying it in history education. It argues that knowledge of historical concepts contributes to the development of critical thinking, and it proposes instructional design principles for teaching historical concepts. Realist review methodology was used to formulate four design principles. Their implications for history education are illustrated and discussed using examples.
Keywords: historical concepts, cognitive development, instructional methods, curriculum structure
Wouter Smets (lecturer in the didactics of the human and social sciences subjects, Karel de Grote Hogeschool, Antwerpen)
Introduction
Teaching historical concepts as a stepping stone to critical thinking
Learning to think critically is not a generic competence. When people think critically, they use recognizable patterns or concepts from their prior knowledge to look for solutions to
Online games can serve as research instruments for exploring the effects of game design elements on motivation and learning. In our research, we manipulated the design of an online math game to investigate the effect of challenge on player motivation and learning. To test the "Inverted-U Hypothesis", which predicts that game engagement peaks at moderate challenge, we conducted two large-scale (10K and 70K subjects), multi-factor (2x3 and 2x9x8x4x25) online experiments. We found that, in almost all cases, subjects were more engaged and played longer when the game was easier, which appears to contradict the generality of the Inverted-U Hypothesis. Troublingly, we also found that the most engaging design conditions produced the slowest rates of learning. Based on our findings, we describe several design implications that may increase challenge-seeking in games, such as providing feedforward about the anticipated degree of challenge.
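The factorial notation in this abstract can be unpacked with a short calculation. The sketch below is illustrative only and rests on two assumptions not stated explicitly in the abstract: that every factor was fully crossed with every other, and that the two sample sizes pair with the two designs in the order listed (10K with 2x3, 70K with 2x9x8x4x25). The names `designs`, `subjects` and `cell_count` are hypothetical.

```python
from math import prod

# Factor levels as listed in the abstract (assumed fully crossed).
designs = {
    "experiment 1": [2, 3],
    "experiment 2": [2, 9, 8, 4, 25],
}
# Approximate sample sizes from the abstract; the pairing with designs is an assumption.
subjects = {"experiment 1": 10_000, "experiment 2": 70_000}


def cell_count(levels):
    """Number of distinct conditions in a fully crossed factorial design."""
    return prod(levels)


for name, levels in designs.items():
    cells = cell_count(levels)
    per_cell = subjects[name] / cells  # average subjects per condition
    print(f"{name}: {cells} conditions, ~{per_cell:.1f} subjects per condition")
```

Under those assumptions, the 2x9x8x4x25 design has 14,400 distinct conditions and averages fewer than five players per condition, whereas the 2x3 design has six conditions with well over a thousand players each. This is a general property of large factorial designs and says nothing about how the authors actually analyzed their data.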
This article examines the effectiveness of a technology-based algebra curriculum in a wide variety of middle schools and high schools in seven states. Participating schools were matched into similar pairs and randomly assigned to either continue with the current algebra curriculum for 2 years or to adopt Cognitive Tutor Algebra I (CTAI), which uses a personalized, mastery-learning, blended-learning approach. Schools assigned to implement CTAI did so under conditions similar to schools that independently adopt it. Analysis of posttest outcomes on an algebra proficiency exam finds no effects in the first year of implementation, but finds evidence in support of positive effects in the second year. The estimated effect is statistically significant for high schools but not for middle schools; in both cases, the magnitude is sufficient to improve the median student's performance by approximately eight percentile points.
Many students are being left behind by an educational system that some people believe is in crisis. Improving educational outcomes will require efforts on many fronts, but a central premise of this monograph is that one part of a solution involves helping students to better regulate their learning through the use of effective learning techniques. Fortunately, cognitive and educational psychologists have been developing and evaluating easy-to-use learning techniques that could help students achieve their learning goals. In this monograph, we discuss 10 learning techniques in detail and offer recommendations about their relative utility. We selected techniques that were expected to be relatively easy to use and hence could be adopted by many students. Also, some techniques (e.g., highlighting and rereading) were selected because students report relying heavily on them, which makes it especially important to examine how well they work. The techniques include elaborative interrogation, self-explanation, summarization, highlighting (or underlining), the keyword mnemonic, imagery use for text learning, rereading, practice testing, distributed practice, and interleaved practice. To offer recommendations about the relative utility of these techniques, we evaluated whether their benefits generalize across four categories of variables: learning conditions, student characteristics, materials, and criterion tasks. Learning conditions include aspects of the learning environment in which the technique is implemented, such as whether a student studies alone or with a group. Student characteristics include variables such as age, ability, and level of prior knowledge. Materials vary from simple concepts to mathematical problems to complicated science texts. Criterion tasks include different outcome measures that are relevant to student achievement, such as those tapping memory, problem solving, and comprehension. 
We attempted to provide thorough reviews for each technique, so this monograph is rather lengthy. However, we also wrote the monograph in a modular fashion, so it is easy to use. In particular, each review is divided into the following sections:
1. General description of the technique and why it should work
2. How general are the effects of this technique?
   2a. Learning conditions
   2b. Student characteristics
   2c. Materials
   2d. Criterion tasks
3. Effects in representative educational contexts
4. Issues for implementation
5. Overall assessment
The review for each technique can be read independently of the others, and particular variables of interest can be easily compared across techniques. To foreshadow our final recommendations, the techniques vary widely with respect to their generalizability and promise for improving student learning. Practice testing and distributed practice received high utility assessments because they benefit learners of different ages and abilities and have been shown to boost students’ performance across many criterion tasks and even in educational contexts. Elaborative interrogation, self-explanation, and interleaved practice received moderate utility assessments. The benefits of these techniques do generalize across some variables, yet despite their promise, they fell short of a high utility assessment because the evidence for their efficacy is limited. For instance, elaborative interrogation and self-explanation have not been adequately evaluated in educational contexts, and the benefits of interleaving have just begun to be systematically explored, so the ultimate effectiveness of these techniques is currently unknown. Nevertheless, the techniques that received moderate-utility ratings show enough promise for us to recommend their use in appropriate situations, which we describe in detail within the review of each technique.
Five techniques received a low utility assessment: summarization, highlighting, the keyword mnemonic, imagery use for text learning, and rereading. These techniques were rated as low utility for numerous reasons. Summarization and imagery use for text learning have been shown to help some students on some criterion tasks, yet the conditions under which these techniques produce benefits are limited, and much research is still needed to fully explore their overall effectiveness. The keyword mnemonic is difficult to implement in some contexts, and it appears to benefit students for a limited number of materials and for short retention intervals. Most students report relying on rereading and highlighting, yet these techniques do not consistently boost students’ performance, so other techniques should be used in their place (e.g., practice testing instead of rereading). Our hope is that this monograph will foster improvements in student learning, not only by showcasing which learning techniques are likely to have the most generalizable effects but also by encouraging researchers to continue investigating the most promising techniques. Accordingly, in our closing remarks, we discuss some issues for how these techniques could be implemented by teachers and students, and we highlight directions for future research.
Although the "science of science communication" usually refers to the flow of scientific knowledge from scientists to the public, scientists direct most of their communications not to the public, but instead to other scientists in their field. This paper presents a case study on this understudied type of communication: within a discipline, among its practitioners. I argue that many of the contentious disagreements that exist today in the field in which I conduct my research (early science education) derive from a lack of operational definitions, such that when competing claims are made for the efficacy of one type of science instruction vs. another, the arguments are hopelessly disjointed. The aim of the paper is not to resolve the current claims and counterclaims about the most effective pedagogies in science education, but rather to note that the assessment of one approach vs. the other is all too often defended on the basis of strongly held beliefs, rather than on the results of replicable experiments, designed around operational definitions of the teaching methods being investigated. A detailed example of operational definitions from my own research on elementary school science instruction is provided. In addition, the paper addresses the issue of how casual use of labels (both within the discipline and when communicating with the public) may inadvertently "undo" the benefits of operational definitions.
Participants in 2 experiments interacted with computer simulations designed to foster understanding of scientific principles governing complex adaptive systems. The quality of participants' transportable understanding was measured by the amount of transfer between 2 simulations governed by the same principle. The perceptual concreteness of the elements within the first simulation was manipulated. The elements either remained concrete throughout the simulation, remained idealized, or switched midway into the simulation from concrete to idealized or vice versa. Transfer was better when the appearance of the elements switched, consistent with theories predicting more general schemas when the schemas are multiply instantiated. The best transfer was observed when originally concrete elements became idealized. These results are interpreted in terms of tradeoffs between grounded, concrete construals of simulations and more abstract, transportable construals. Progressive idealization ("concreteness fading") allows originally grounded and interpretable principles to become less tied to specific contexts and hence more transferable.
Cognitive psychologists and educators have often debated the merits of concrete versus idealized materials for fostering scientific understanding. Should chemical molecules be represented by detailed, shaded, and realistically illuminated balls or by simple ball-and-stick figures? Should a medical illustration of a pancreas include a meticulous rendering of the islets of Langerhans or convey in a more stylized manner the organ's general form? Our informal interviews with mycologists at the Royal Kew Gardens (personal communication, Brian Spooner and David Pegler, May 1998) indicate a schism between authors of mushroom field guides.
The authors present three studies (two randomized controlled experiments and one embedded quasi-experiment) designed to evaluate the impact of replacement units targeting student learning of advanced middle school mathematics. The studies evaluated the SimCalc approach, which integrates an interactive representational technology, paper curriculum, and teacher professional development. Each study addressed both replicability of findings and robustness across Texas settings, with varied teacher characteristics (backgrounds, knowledge, attitudes) and student characteristics (demographics, levels of prior mathematics knowledge). Analyses revealed statistically significant main effects, with student-level effect sizes of .63, .50, and .56. These consistent gains support the conclusion that SimCalc is effective in enabling a wide variety of teachers in a diversity of settings to extend student learning to more advanced mathematics.
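The effect sizes reported here can be translated into the percentile language used in the Cognitive Tutor Algebra I abstract in this list. The sketch below is a hedged illustration, not the authors' analysis: it assumes the reported values are standardized mean differences (d) and that test scores are normally distributed with equal variance in the treatment and control groups. Under that model, a median control-group student shifted up by d standard deviations lands at percentile 100·Φ(d), so the gain is 100·(Φ(d) − 0.5). The function name `percentile_gain` is hypothetical.

```python
from statistics import NormalDist


def percentile_gain(d):
    """Percentile-point gain for a median (50th-percentile) student when a
    treatment shifts scores by d standard deviations, assuming normally
    distributed scores with equal variance in both groups."""
    return 100 * (NormalDist().cdf(d) - 0.5)


# Effect sizes reported for the three SimCalc studies.
for d in (0.63, 0.50, 0.56):
    print(f"d = {d:.2f}: median student moves up ~{percentile_gain(d):.0f} percentile points")
```

Under these assumptions, the three SimCalc effect sizes correspond to gains of roughly 19 to 24 percentile points for a median student. Run in reverse, the eight-percentile-point gain reported for Cognitive Tutor Algebra I corresponds to d ≈ 0.2, since `percentile_gain(0.2)` is about 7.9.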
We argue herein that typical training procedures are far from optimal. The goal of training in real-world settings is, or should be, to support two aspects of posttraining performance: (a) the level of performance in the long term and (b) the capability to transfer that training to related tasks and altered contexts. The implicit or explicit assumption of those persons responsible for training is that the procedures that enhance performance and speed improvement during training will necessarily achieve these two goals. However, a variety of experiments on motor and verbal learning indicate that this assumption is often incorrect. Manipulations that maximize performance during training can be detrimental in the long term; conversely, manipulations that degrade the speed of acquisition can support the long-term goals of training. The fact that there are parallel findings in the motor and verbal domains suggests that principles of considerable generality can be deduced to upgrade training procedures.
Knowing how to manage one's own learning has become increasingly important in recent years, as both the need and the opportunities for individuals to learn on their own outside of formal classroom settings have grown. During that same period, however, research on learning, memory, and metacognitive processes has provided evidence that people often have a faulty mental model of how they learn and remember, making them prone to both misassessing and mismanaging their own learning. After a discussion of what learners need to understand in order to become effective stewards of their own learning, we first review research on what people believe about how they learn and then review research on how people's ongoing assessments of their own learning are influenced by current performance and the subjective sense of fluency. We conclude with a discussion of societal assumptions and attitudes that can be counterproductive in terms of individuals becoming maximally effective learners. Expected final online publication date for the Annual Review of Psychology Volume 64 is November 30, 2012.
Can cognitive research generate usable knowledge for elementary science instruction? Can issues raised by classroom practice drive the agenda of laboratory cognitive research? Answering yes to both questions, we advocate building a reciprocal interface between basic and applied research. We discuss five studies of the teaching, learning, and transfer of the “Control of Variables Strategy” in elementary school science. Beginning with investigations motivated by basic theoretical questions, we situate subsequent inquiries within authentic educational debates—contrasting hands-on manipulation of physical and virtual materials, evaluating direct instruction and discovery learning, replicating training methods in classroom, and narrowing science achievement gaps. We urge research programs to integrate basic research in “pure” laboratories with field work in “messy” classrooms. Finally, we suggest that those engaged in discussions about implications and applications of educational research focus on clearly defined instructional methods and procedures, rather than vague labels and outmoded “-isms.”
The issues of skill specificity and transfer of training were examined from an aptitude–treatment interaction approach. The current investigations extended A. M. Sullivan's (1964) approach by using a procedural transfer task and training conditions that differed in amount of training task practice and the degree of training task similarity to the transfer task. Two experiments were conducted with 232 college students. Experiment 1 examined the effects of a length-of-training manipulation on reasoning ability and transfer task performance relationships, and on the amount of transfer. Experiment 2 evaluated the effects of 2 training tasks that differed in terms of similarity to the transfer task on ability-performance relationships and the amount of transfer. Results suggest that Sullivan's approach partially generalizes to the acquisition of procedural knowledge.