Bulletin of Mathematical Biology (2020) 82:106
Getting a Grip on Variability
Richard Lehrer¹ · Leona Schauble¹ · Panchompoo Wisittanawat¹
Received: 27 March 2020 / Accepted: 17 July 2020
© Society for Mathematical Biology 2020
Because science is a modeling enterprise, a key question for educators is: What kind of
repertoire can initiate students into the practice of generating, revising, and critiquing
models of the natural world? Based on our 20 years of work with teachers and students,
we nominate variability as a set of connected key ideas that bridge mathematics and
science and are fundamental for equipping youngsters for the posing and pursuit of
questions about science. Accordingly, we describe a sequence for helping young
students begin to reason productively about variability. Students first participate in
random processes, such as repeated measure of a person’s outstretched arms, that
generate variable outcomes. Importantly, these processes have readily discernable
sources of variability, so that relations between alterations in processes and changes in
the collection of outcomes can be easily established and interpreted by young students.
Following these initial steps, students invent and critique ways of visualizing and
measuring distributions of the outcomes of these processes. Visualization and measure
of variability are then employed as conceptual supports for modeling chance variation
in components of the processes. Ultimately, students reimagine samples and inference
in ways that support reasoning about variability in natural systems.
Keywords: Education · Modeling variability · Random processes · Sampling variability · Informal inference
Given the diversity of the sciences and the many goals and forms of scientific practice,
science education needs to be driven by a coherent vision of what is most worthwhile
for students to learn. Moreover, that vision not only should encompass ultimate goals,
but must also include an account of how instruction from the earliest grades can provide
a strong foundation for ideas that may not come to fruition until years later. We start
from the premise that science is inherently a modeling enterprise (Nersessian 2008;
National Research Council 2012; Windschitl et al. 2008), a premise that leads to the
question: What kind of conceptual repertoire best initiates students into the modeling
game (Hestenes 1982)? We have been pursuing this question for the past 20 years in
Corresponding author: Richard Lehrer
1 Vanderbilt University, Nashville, USA
partnership with participating school districts in three different states. Elsewhere we
describe how we introduce children to inventing and revising models as conceptual
means for understanding the functioning of natural systems (Lehrer and Schauble
2019). Here, we focus attention on a set of conceptual core ideas that have proven to
be especially powerful in children’s developing modeling repertoire. In particular, we
seek to describe and illustrate how variability and the associated idea of uncertainty
can equip even young students to more deeply understand the world.
Characterizing and accounting for the variability of a system are often consequen-
tial for theory development and model building, especially in the biological sciences.
For instance, evolutionary accounts of diversity require a firm grasp of distinctions
between directed and random variation in populations of organisms. Understanding
many ecosystem processes is accomplished by sampling in space and time, which,
in addition to practical matters of instrumentation, measures, and methods, is also
distinguished by variability, both within and between samples (Lehrer and Schauble
2017). Consequently, interpretations based on samples should account for sampling
variability. Understanding random sampling variability hinges on a hierarchical image
of sample in which any sample is simultaneously composed of outcomes that vary and
is also a member of an imagined collection that varies from sample to sample (Thomp-
son et al. 2007). Of course, these ideas are prominent in other scientific domains as
well and are key in both observational and experimental methods. The ubiquity and
prominence of variability in scientific inquiry suggest that students should get some
mental grip on the challenges of variability, such as a hierarchical image of sample,
during the course of schooling. Yet, developing a conceptual and practical grasp of
variability is often challenging, even for students at university level.
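The hierarchical image of sample described above can be made concrete with a short simulation. The sketch below is illustrative only, with a hypothetical process (a "true" value of 160 plus chance error) and arbitrary sample sizes; it shows that each sample is internally variable while the collection of sample means varies less from sample to sample.

```python
import random
import statistics

random.seed(2)

# Hypothetical long-run process: each outcome is a true value (160) plus chance error.
def one_outcome():
    return 160 + random.gauss(0, 3)

# Each sample is a collection of outcomes that vary...
samples = [[one_outcome() for _ in range(25)] for _ in range(200)]

# ...and each sample is also one member of an imagined collection of samples
# whose summary statistics (here, means) themselves vary.
within_sample_sd = statistics.stdev(samples[0])
means = [statistics.mean(s) for s in samples]
between_sample_sd = statistics.stdev(means)

# Sample-to-sample variability of the mean is smaller than
# outcome-to-outcome variability within a single sample.
print(within_sample_sd, between_sample_sd)
```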
How, then, is it possible to provide comparatively young students with an entrée to
some of the conceptual foundations needed to reason productively about variability?
We illustrate supports for learning that emphasize student participation in visualiz-
ing, measuring, and modeling variability. These practices are accessible to students
throughout schooling, and they can provide a good foundation for developing concep-
tions of variability and uncertainty. To illustrate what this kind of instruction looks
like and to explicate the ideas about variability that students can grasp, we describe
a sequence of instruction with grade 5/6 (ages 10–11) students and report the typical
forms of student reasoning that we observed. First, we outline theoretical orientations
that guided the design of instruction. Then, we explain how we staged student partici-
pation in practices of visualizing and measuring variability and describe what students
tended to learn about variability by participating in these practices. After doing so,
we zoom in on the main empirical focus of this report, demonstrating how students’
invention, revision, and contest of models of random variability helped them develop
more disciplined views of sample and inference. As we will describe, our inquiry cen-
ters on how students tended to reason about inference in light of uncertainty, including
their image of sample and their aesthetic preferences related to forms of modeling.
1 Instructional Design
Our overall goal was to help comparatively young students encounter some of the
epistemic issues involved in professional inquiry about and warrant of claims about
variability, especially under conditions of uncertainty (Pfannkuch and Wild 2000).
For instance, how do particular mathematical choices about ways to construct data
visualizations result in different shapes of the data? In light of sample-to-sample variation, what does a difference between samples suggest about a claim? The design
of instruction was informed by guiding principles (Bakker 2018) that influenced our
generation and selection of tasks, tools, and formats of classroom activity. However,
guiding principles are not scripts. As in any instructional endeavor, our choices were
also influenced by the accessibility of materials and tools in everyday classrooms,
teacher preferences, and related contingencies that often could not be predicted in
advance, and therefore, were generated during implementation.
Learning by Participating in Approximations of Professional Practice
The most fundamental principle that guides our instruction is that students should
receive opportunities to be inducted into forms of activity that STEM fields employ to
generate, contest, and revise knowledge (National Research Council 2012). Modeling
renders student thinking public, visible, and testable and, hence, subject to critique
and revision by other members of the community. Students also experience that even
when models are provisionally accepted, they remain open-ended; today’s solutions
become grist for tomorrow’s challenges. When students become engaged in the inven-
tion and critique of models, they experience a tension that also confronts scientific
professionals—a tension between individual and collective work (Ford 2015; Ford
and Forman 2006). That is, individuals innovate in anticipation of critique by the col-
lective. The intelligibility of the work that individuals conduct depends on their ability
to align their activity with the forms of activity taken as normative by other members
of the community who are trying to accomplish similar goals. Professional critique
relies on collective consensus, however imperfectly achieved. And critique itself is
not static, either—occasionally, what individuals invent will reset the norms for what
the community takes as valued.
This perspective on the development of knowledge demands a classroom ecology
that has particular characteristics. We do not expect students to copy what professionals
do, because they do not live in professional cultures. Nonetheless, there are key aspects
of disciplinary epistemology that we consider critical to carry over into classrooms.
First, students should have firsthand experience with the productive tension between
innovation and critique. Inventing models helps students directly experience some of
the challenges that practitioners confront. Critiquing, on the other hand, confronts
students with other perspectives on those challenges. The result might be revisions to
one’s model, one’s reasoning, or even the norms one accepts for appropriate forms
of activity. The models that the field considers conventional or canonical can come
to be understood by students as perspectives that others in the discipline eventually
developed as they confronted challenges like those the students experienced (Jones
et al. 2017).
Second, the duration and nature of participation must be sufficient so that students
grasp the relationships among the apparently discrete activities they are pursuing.
Another way of saying this is that students must come to see how the particulars of
their activities make sense cumulatively in light of their overarching goal of generat-
ing and critiquing knowledge. Often, science involves the conduct of extended chains
of reasoning that are constructed over time as one engages in related activities, such
as refining questions, designing measures, collecting data, displaying it, and drawing
interpretations. Yet, students do not automatically grasp these relations as activities are
encountered. They need sufficient time and support to come to see how otherwise dis-
tinct performances are blended to accomplish greater conceptual articulation (Lehrer
and English 2018). For example, as we will see shortly, students’ initial practices of
visualizing and measuring variability can later be employed as conceptual tools for
developing and testing models of variability.
These concerns guide the approach that our participating teachers adopt in their
instruction, and this approach to instruction will become visible as we begin to describe
the conditions that we collectively arranged as students were introduced to ideas about
variability. First, however, we briefly describe the sequence of concepts as students
encountered them, to provide a sense of how later ideas built on earlier experiences.
Participating in Variability-Generating Processes
Instruction about variability often leads with preselected samples or involves stu-
dents with samples of natural variability, such as the heights of students in their
classroom. Yet, statistical conceptions characterize samples as representatives of pop-
ulation processes. Treatments of probability also rely on an image of a long-term,
random process. For example, a relative frequency of a target event is considered to
inform the probability of its occurrence over repeated trials. Unfortunately, students
typically understand probabilities in terms of specific, single-occasion outcomes. For
instance, an estimate of the probability of rainfall on a particular day is interpreted
as meaning that it will or will not rain that day (Konold 1989). Our focus in instruc-
tion, therefore, is to help students understand that statistical approaches to sample and
chance rely on assumptions about long-term process (Saldanha and Thompson 2002,
2014). This is a challenging concept that typically eludes students, partly because it
is based on reading into a collection of cases to conceive of the collection as a sam-
ple. For this reason, instructional design needs to make stochastic process visible and
tangible to students, so that a sample comes to be regarded as one scoop taken from a
river of ongoing outcomes. To make this conception more concrete, students should
experience how a sample emerges from a repeated process.
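As an illustrative sketch (not part of the instructional sequence, and with arbitrary signal and noise values), a long-term stochastic process can be modeled as an endless stream of outcomes from which a sample is one "scoop":

```python
import random
from itertools import islice

random.seed(7)

# A hypothetical stochastic process as an endless stream of outcomes:
# each repetition yields a fixed signal plus chance noise.
def process(signal=100, noise=5):
    while True:
        yield signal + random.uniform(-noise, noise)

stream = process()

# A "sample" is one scoop of 20 outcomes taken from the ongoing stream;
# scooping again from the same process yields a different collection.
scoop1 = list(islice(stream, 20))
scoop2 = list(islice(stream, 20))
print(scoop1 != scoop2)
```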
We have learned that an effective way of providing students with entrée to the
generation of random outcomes is to begin by exposing them to processes that are
characterized by clear signal and readily interpreted sources of noise (error). Contexts
in which students can readily propose causes associated with signal and noise help
students conceive of an outcome as composed of the contribution of each (Konold
and Lehrer 2008; Petrosino et al. 2003). This helps students envision how distribution,
a collective property, can arise from the repetition of a process (Konold and Lehrer
op cit.). As we will describe shortly, we began instruction first by enacting processes
in which both signal and noise were readily interpretable. Experiences with these
processes, coupled with investigation of the behavior of random devices, served as a
bridge to natural variability, which students typically find more difficult to understand
(Lehrer and Schauble 2007). Having outlined this general framework, we next turn to
sixth graders and describe how they typically progress as they learn to use statistical
practices in modeling contexts.
2 Visualizing Variability
Each student is asked to measure the same attribute, such as the arm-span of the teacher,
first with a 15-cm ruler and then again, with a meter stick. In this process of repeated
measurement, the signal is the actual length of the teacher’s arm-span and readily
distinguishable sources of error include small slips such as gaps or overlaps that occur
as the student iterates the ruler and meter stick. Each student produces a measure
with both tools. (To maintain the assumption of independence of trials, we ensure
that students do not yet know the value found by other measurers as they conduct
their own measures). To the surprise of the students, the collection of measures that
emerges from different individual repetitions of the measurement process has a new
property—variability. Sources of error, such as those due to iteration, help students
account for the variability they observe, and the tendency of the measurements to
cluster roughly symmetrically about a center is readily attributed to a fixed portion of
the process—the “true” measure of the length. A switch in process—when the same
measurers use a meter stick—results in less variability, which can be readily attributed
to less error in iteration. A critical feature of this context is that changes in variability
are easily interpreted as causal.
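A minimal simulation can make this causal link concrete. All values here are hypothetical: a "true" arm-span of 160 cm and a small random slip per iteration of the tool. Because the 15-cm ruler must be iterated more often than the meter stick, its slips accumulate into more variability:

```python
import random
import statistics

random.seed(1)

TRUE_ARM_SPAN = 160.0  # cm (hypothetical teacher)

def measure(tool_length_cm):
    # Each iteration of the tool introduces a small gap or overlap (error).
    iterations = int(TRUE_ARM_SPAN // tool_length_cm) + 1
    slips = sum(random.uniform(-0.4, 0.4) for _ in range(iterations))
    return TRUE_ARM_SPAN + slips

ruler_measures = [measure(15) for _ in range(30)]    # 15-cm ruler: ~11 iterations
meter_measures = [measure(100) for _ in range(30)]   # meter stick: 2 iterations

# Both collections cluster near the true length (signal), but the ruler's
# extra iterations produce more accumulated error (noise).
print(statistics.stdev(ruler_measures), statistics.stdev(meter_measures))
```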
To make these ideas visible to students, we encourage them to work together in
small groups to invent displays of the class collection of measures of the teacher’s
arm-span (Petrosino et al. 2003). Directions are deliberately left ambiguous. Students
are told that their visualizations should show “some pattern or something that you
notice about the measurement,” and the sole constraint mentioned to children is that
the display should be constructed to communicate, for example, “…so that someone else just walking into the classroom could look at your display and see what you noticed about the measurements.”
By this point in their schooling, students typically have already experienced canon-
ical forms of display, such as bar charts and line plots, but we find that these are rarely
invoked. Instead, the inventions of small groups reliably reflect a range of orientations
to displaying the collection of measurements (Konold et al. 2015). Some students tend
to be oriented to cases, as reflected by displays that highlight individual cases, such
as those displayed in Fig. 1a, b. In Fig. 1a, students have displayed the collection as
an ordered list, and they took advantage of the plane to create a curve. In Fig. 1b,
each case value is ordered and represented as a length. Although the measurements
are ordered in magnitude, students who focus on case values often neglect to order the values they display.
Other students tend to classify data, generating intervals of values, as in Fig. 1c.
In Fig. 1c, the intervals are ordered by frequency, although this is tacit in the display.
Fig. 1 a Ascending list representation, b case value plot, c intervals of similar values, d scale of measure and frequency (color figure online)
Attention to frequency is often motivated by expecting that repetition of the same value
or nearby values indicates proximity to the signal value of the process. In Fig. 1c, the
intervals are uniform, but students often employ intervals of different sizes. Also in
Fig. 1c, adjacent intervals do not reflect the scale of the measure, so holes or intervals
without values are not visible. Many students tend to leave out these gaps in the data,
assuming that lack of data means there is no information to communicate.
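The contrast between frequency-ordered intervals and intervals laid out on the measurement scale can be sketched as follows; the data values are hypothetical:

```python
from collections import Counter

# Hypothetical collection of arm-span measures (cm).
measures = [157, 158, 158, 159, 159, 159, 160, 160, 165, 166]

# Ordering values by frequency (as in some invented displays)
# hides where the values sit on the measurement scale.
by_frequency = Counter(measures).most_common()

# Binning along the full measurement scale makes "holes" visible:
# intervals with no data still appear, with a count of zero.
lo, hi, width = 155, 170, 3
on_scale = {(b, b + width): sum(1 for m in measures if b <= m < b + width)
            for b in range(lo, hi, width)}

print(by_frequency)
print(on_scale)  # the (161, 164) bin is empty: a gap in the data
```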
Finally, some students tend to reason about the aggregate and employ the measurement scale along with frequency to display the data. These solutions occur less frequently (Lehrer et al. 2014). Other, more whimsical strategies also sometimes appear,
usually because students are deliberately trying to create artistically unique construc-
tions. However, we find that the overwhelming majority of students’ invented displays
reflect the range that we described, from case value to aggregate.
The student invention process is followed by whole-class critique. A student who
was not the inventor of one of the displays uses that display to infer what the inventor
seemed to notice about the collection of measured values. This is referred to as decid-
ing “what the display shows and hides” about the measurements. Conversations like
these provoke students to notice the aspects that are highlighted or backgrounded by
the display in question and invite discussion about how mathematical concepts such
as count (frequency), order, interval, and scale result in particular visualizations of the
data. Teachers seek to help students understand that the shape and other visual qualities
of the display are not inevitably inherent in the data, but rather, result from choices
that designers make, especially the mathematical concepts they employ to highlight
some characteristics while de-emphasizing others. These critique sessions also seem
to attune students to issues of personal choice and aesthetics. For example, one sixth-grade student related how she became “addicted to frequency” after initially puzzling
about how other students had created displays that looked very different from the
case-value display that her team had produced (Shinohara and Lehrer 2017). Teachers
support these critiques by encouraging students to take on perspectives other than
their own, by asking students to trace a set of values from one display into another
to emphasize how the shape of the subset changes with shifts in the conceptual tools
employed to visualize them, and by posing questions or highlighting student contri-
butions about what would happen to the shape of the data under different imagined
scenarios (e.g., if more measurements were closer to the center value, if there were
more extreme values). Teachers ask students to relate the shapes of the data displays
to the data-generating process: Why are some displays (nearly) symmetric? Why is
there a central tendency? What could be producing different values of measure, given
that the object being measured is a constant length? What could happen if the process
were repeated (for example, if the same object were measured by other students in the school)?
In summary, students are introduced to visualizing distribution by inventing rep-
resentations of the outcomes of a firsthand, repeated process: measuring the same
attribute of an object. Invented representations are compared to discern how the mathe-
matical concepts of order, frequency, interval, and measurement scale result in different
shapes for the same collection of values. As instruction proceeds, students’ ways of
generating visualizations of data stabilize. Nonetheless, as with statisticians, there are
always elements of personal preference when students decide what to highlight and
background in a data visualization.
3 Statistics as Measures of Distribution
Students are introduced to statistics as measures of distribution by again inventing and
critiquing, albeit with the emphasis now on developing methods to arrive at estimates
of the true length (or other attribute measured) and of the tendency of the measured
values to agree (variability). The former corresponds to statistics of center and the
latter to statistics of variance. During critique, students who did not create the method
try to employ it to obtain a measure, with an eye again on what the invented methods
attend to about the collection of values. Students typically rely on visualizations of the
data that either they or others have previously developed to guide their construction
of measures (Lehrer et al. 2011).
Invented measures of center usually include some of the canonical approaches, such
as the most frequently occurring value (mode) or the middle-ranked value (median).
The mean is not usually generated unless students have already learned about it.
Other, less conventional solutions, such as the mid-range, are sometimes proposed.
Student-invented methods for measure of variability also tend to bear some relation
to canonical measures. For instance, some student methods emphasize deviation from
a central value, such as the sample median. Others rely on counts of cases in regions
that define a center clump of data, rather like a conventional interquartile range, or
on the difference between extreme case values of the distribution, a kind of range
statistic. Most student solutions bear some thematic relation to these classes of con-
ventional solution. Unsurprisingly, no student has ever created an area model, as in
the conventional definition of sample variance with squared deviations.
During class critique of these inventions, values of clarity and robustness often
emerge. Students who are not authors struggle to follow the methods invented by
peers and to imagine transformations of the distribution that will prove problematic
for the invented measure (Lehrer and Kim 2009). For example, one student solution
employed the sum of deviations from the median, but this measure did not result in
values that reflected the changes in the variability that students observed when they
switched from using the 15-cm ruler to the meter stick. The authors of the measure
responded to this critique by revising their method so that it now employed the absolute
value of deviations. This new solution reflected their intuition that the direction of
difference from the median did not really matter. After this modification, a teacher
asked what would happen if there were an increase in the sample size (e.g., if there
were more measures) taken only with the meter stick. (The number of measures with
the ruler would remain the same.) The invented statistic of variability again failed to
conform to students’ expectation that there would be very different variabilities in the
distributions of values for the meter stick vs the ruler. Eventually, the class settled
on a “per” solution that involved dividing the sum of deviations by the number of
measures. Now the values of the invented statistic once again corresponded to the
state of variability that the displays seemed to communicate.
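The progression of the class's invented statistic can be reconstructed in a few lines. The data here are hypothetical; the final "per" measure is, in conventional terms, a mean absolute deviation from the median:

```python
import statistics

def sum_signed_dev(data):
    med = statistics.median(data)
    return sum(x - med for x in data)          # signed deviations largely cancel

def sum_abs_dev(data):
    med = statistics.median(data)
    return sum(abs(x - med) for x in data)     # direction of difference ignored

def per_measure(data):
    # The class's "per" solution: divide by the number of measures,
    # so the statistic no longer inflates with sample size.
    return sum_abs_dev(data) / len(data)

ruler = [155, 157, 158, 160, 160, 162, 165, 170]   # hypothetical, more variable
meter = [159, 160, 160, 160, 161, 161]             # hypothetical, less variable

# Doubling the meter-stick sample inflates the summed absolute deviations,
# but leaves the "per" measure unchanged.
doubled = meter * 2
print(per_measure(ruler), per_measure(meter), per_measure(doubled))
```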
Other student critiques that we commonly hear are sensitive to the context of
repeated measures. For instance, many students criticize the range statistic for its
reliance on the “two worst measurers.” Teachers again play an instrumental role by
inviting students to consider issues such as: what it is about a distribution that is indicated by a statistic, whether values of a statistic change in sensible ways with observed
changes in the shape of a distribution, and whether the statistic is robust in the sense
of generalizing to new imagined distributions.
Both invention and critique are critical to inducting students into the mathematical
practice of measuring characteristics of distributions. Although statistics of central
tendency and variability are usually taught directly, students often fail to understand
the foundation for these measures. Accordingly, statistics come to be regarded as
calculations to be performed routinely on batches of data. In contrast, the invention-
and-critique routine invites students to grapple with the characteristics of distributions
that are measured by statistics. Once students have done so, we introduce conventional
measures, such as the mean, interquartile range, and average deviation. But students’
experiences with trying to generate their own statistics help them regard these canon-
ical measures as solutions to the problem of describing properties of distribution as
quantities. These quantities are regarded as supplementing what is communicated by
the visualizations of the distribution.
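For readers who want to see the conventional measures side by side, a short sketch follows (the data are hypothetical; note that quartile conventions differ across tools, and "inclusive" is just one choice):

```python
import statistics

measures = [156, 157, 158, 159, 159, 160, 160, 160, 161, 163, 166, 170]

mean = statistics.mean(measures)

# Interquartile range: spread of the middle half of the data.
q1, q2, q3 = statistics.quantiles(measures, n=4, method="inclusive")
iqr = q3 - q1

# Average (mean absolute) deviation from the mean.
avg_dev = sum(abs(x - mean) for x in measures) / len(measures)

print(mean, iqr, avg_dev)
```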
To continue the development of visualization and measure of distribution, we intro-
duce a new signal and noise process, based on contexts of production (Konold and
Harradine 2014). For example, students attempt to manufacture “candies” of a standard
size out of clay. They employ different methods of fabrication (such as a machined
form for extrusion vs hand crafting) (Fig. 2a). Or, students might use different methods
for producing packages intended to hold a standard number of toothpicks. Manufac-
turing processes like these provide another example of a signal and noise process.
Here, measures of central tendency are interpreted as target values of the process
(e.g., number of toothpicks in each package), and measures of variability indicate the
consistency of the products generated (e.g., variability in the number or mass of tooth-
picks across packages). These shifts in context for interpreting statistics help students
understand that the same measures of distribution can be interpreted differently when
the long-term generating process changes. Grasping this commonality is an example
of mathematical generalization, in this case, extending the meaning of measures of distribution.
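Reinterpreting the same statistics in the production context can be sketched as follows, with hypothetical batch masses for two fabrication methods:

```python
import statistics

# Hypothetical masses (g) of candies from two fabrication methods.
extruded = [10.1, 9.9, 10.0, 10.0, 10.2, 9.8, 10.0, 10.1]
hand_made = [9.2, 11.0, 10.4, 8.8, 10.9, 9.5, 10.6, 9.6]

for name, batch in [("extruded", extruded), ("hand-made", hand_made)]:
    target = statistics.mean(batch)        # center: the process's target value
    consistency = statistics.stdev(batch)  # spread: consistency of the product
    print(f"{name}: target {target:.2f} g, spread {consistency:.2f} g")
```

Both methods hit roughly the same target value; they differ in consistency, which is exactly the reinterpretation of a variability statistic that the production context invites.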
Students use a digital tool, TinkerPlots (Konold 2007; Konold and Miller 2011),
to structure their data in ways that they have previously invented with paper and
pen. The software tool provides greater flexibility and efficiency, now that students
have had the opportunity to recognize some of the challenges of visualization that
TinkerPlots resolves. For example, the tool has a drag operation that renders the batch
of measurements continuously, so that gaps in the data are immediately visible. Recall
that the value of this form of rendering is not always appreciated by students in
their initial inventions of data visualizations. TinkerPlots can also be employed to
generate summary statistics of central tendency and variability. Now these statistics
are interpreted as measures of the target value of the process (e.g., the target value of
each package) and of the consistency of product (how much does the mass of each box
vary?). Figure 2b displays one of these comparisons between methods of production.
Teachers often initiate this context by introducing commercial standards that gov-
ern the production of consumables. For instance, state regulations require that the
average mass of a sample of products is the target mass displayed on each package.
No single package may deviate by more than five percent from this target value. The
notion of quality control further contextualizes practices of visualizing and measuring
variability.
Fig. 2 a Production of Playdough “Fruit Bars,” b TinkerPlots visualization of different methods of production (color figure online)
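The two-part commercial standard described above can be expressed as a simple check. The batches and the target mass below are hypothetical, and reading "the average is the target" as "the average must at least meet the target" is our assumption:

```python
def meets_standard(masses, target):
    """Check a sample of package masses against the two-part standard:
    the sample average must meet the labeled target (one reading of the rule),
    and no single package may deviate from the target by more than 5%."""
    average_ok = sum(masses) / len(masses) >= target
    each_ok = all(abs(m - target) <= 0.05 * target for m in masses)
    return average_ok and each_ok

# Hypothetical batches with a labeled target mass of 50 g.
print(meets_standard([50.2, 49.5, 50.8, 49.9], 50))  # consistent batch
print(meets_standard([53.0, 47.0, 50.5, 49.8], 50))  # 47.0 g is >5% under target
```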
4 Investigating Chance
Next, teachers introduce the role of chance in signal and noise processes. These are
not easy ideas for students to grasp. Their initial reasoning about probability is often
based on single outcomes rather than repetition of a long-term process. Students also
typically believe that random implies a complete absence of structure, rather than an
inability to predict the outcome of any particular trial (Metz 1998). When they are
deciding whether it is legitimate to accumulate outcomes, students often accept trials
that are governed by different probabilities as equivalent, reasoning that because the
outcomes in each trial are “random,” they can be treated as sufficiently alike (Horvath
and Lehrer 1998). For example, some students will happily accumulate the outcome
of a vigorous throw of a die with one in which the thrower attempts and occasionally
succeeds in adjusting the throw to achieve a predetermined outcome. They argue
that this is acceptable because the biased throw is not always successful. These prior
conceptions and others like them suggest that students also need experiences with
chance and probability that involve stochastic processes.
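Why accumulating such trials is illegitimate can be shown in a sketch: mixing outcomes from processes governed by different probabilities yields a relative frequency that reflects neither process. The probabilities below are hypothetical:

```python
import random

random.seed(3)

def fair_spin():
    return random.random() < 0.5   # fair process: P(red) = 0.5

def adjusted_spin():
    return random.random() < 0.8   # thrower's occasional nudge: P(red) = 0.8

fair = [fair_spin() for _ in range(2000)]
mixed = fair + [adjusted_spin() for _ in range(2000)]

# The accumulated relative frequency of the mixed trials no longer
# reflects either generating process.
print(sum(fair) / len(fair), sum(mixed) / len(mixed))
```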
Therefore, we involve students in investigating the behavior of simple random
devices to develop stronger intuitions about the meanings of chance as a prelude to
introducing chance as a model of the variability they have observed in the measure
and production contexts. Teachers initiate this process with simple, hand-held, two-
color (e.g., red, green) equal-area spinners. We ask each student to predict the result
of applying hand force to the needle of a spinner ten times. Students also are asked to
predict whether the direction of spin (clockwise or counterclockwise) will matter.
Typical student predictions include that exactly five of each color will result, that
nine of their favorite color will result, that direction of spin does matter, and that
the starting color will influence the final outcome. But as students experiment with
spinners, they notice that not all of their predictions are sustained. We aggregate batches
of ten spins, thus growing the sample, and ask students what they notice (Horvath and
Lehrer 1998). The aim is to help students see that although they cannot predict any
particular outcome with confidence, the longer-term pattern of outcomes is predictable
and reflects the structure of the spinner. Noticing that changing the direction of spin
does not affect the pattern of outcomes makes the concept of trial more visible and
invites more careful consideration of what constitutes a repetition of a chance process.
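For readers who want to see the long-run behavior outside the classroom, an equal-area two-color spinner can be sketched in a few lines of Python. This is a minimal sketch: the function and parameter names are ours, and we assume the needle is equally likely to stop anywhere on the circle.

```python
import random

def spin(n, p_red=0.5, seed=None):
    """Simulate n spins of a two-color spinner; p_red is the red region's share of the area."""
    rng = random.Random(seed)
    return ["red" if rng.random() < p_red else "green" for _ in range(n)]

# One batch of ten spins rarely comes out exactly five-and-five...
batch = spin(10, seed=1)
print(batch.count("red"))

# ...but aggregating many batches makes the long-run pattern (the signal) visible.
big = spin(10_000, seed=1)
print(big.count("red") / len(big))
```

Growing the sample in this way mirrors the classroom aggregation of batches of ten spins: no single outcome is predictable, but the relative frequency stabilizes near the spinner's structure.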
We do not recommend shortchanging the work with physical spinners in favor of
computational simulations. Many of children’s intuitions about chance are rooted in
ideas about their own effort, so it is important to provide time for some of these ideas to
lose their persuasiveness. But once students have conducted a series of investigations
with spinners, we introduce TinkerPlots’ random generators, which include a spinner
representation, as analogs of the processes that students have just experienced phys-
ically. We explain to students that these random generators work like their spinners,
but are faster and include built-in records of outcomes. We again explore two-region
spinners but now encourage students to design regions with different areas. Students
design “mystery” spinners with invisible partitions, and a partner tries to infer the
structure of the spinner from repetitions of its behavior. Students find counterparts of
their previous experiences with signal and noise in the structure of the spinner (signal)
and chance departures (noise). Figure 3 shows a mystery spinner and a display of the
outcomes of ten repetitions of its spins.
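The mystery-spinner game can likewise be sketched computationally. The hidden red share (0.25) and all names below are illustrative assumptions, not values from the classroom activity.

```python
import random

def mystery_spinner(p_red, n, seed=None):
    """Spin a spinner whose hidden red share of area is p_red; return the count of reds."""
    rng = random.Random(seed)
    return sum(rng.random() < p_red for _ in range(n))

# The partner sees only outcomes: the hidden structure (signal) emerges
# through chance departures (noise) as repetitions accumulate.
hidden = 0.25
estimates = {n: mystery_spinner(hidden, n, seed=7) / n for n in (10, 100, 1000)}
print(estimates)
```

With ten spins the estimate of the hidden partition is unreliable; with a thousand it closes in on the spinner's true structure.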
As students become familiar with the operation of spinners, we introduce the idea
of using a statistic, such as percent red, to summarize the outcomes of a sample,
as indicated in Fig. 3. Doing so sets the stage for investigating the properties of an
empirical sampling distribution. For instance, students investigate the distribution of
sample statistics over repeated experiments (e.g., 300 repetitions) and with different
sample sizes (e.g., 10, 100). A result of one such investigation for a 60–40 spinner
is displayed in Fig. 4a, b. Students redeploy their knowledge of statistics to indicate
characteristics of the sampling distribution. They compare the central tendency and
Fig. 3 Mystery spinner and outcomes of ten repetitions (color figure online)
variability of the two sampling distributions by finding the percentage of samples
that lies between two values of the statistic, percent red. Teachers again play a key
supporting role, asking students to relate a sample statistic to particular outcomes in
a sample (e.g., 60% red means that six of the ten outcomes were red), to consider
the meaning of particular cases (e.g., a dot in the display represents the percent red
obtained in one sample of size 10 or 100), and to explain why the center remains
invariant but the variability changes. Students typically relate the invariance of center
to the unchanging structure of the spinner—a signal. But accounting for shifts in
variability is more challenging. For example, one student explained that the chance
of a statistic of 20% red for a 60% red spinner was low for a small sample (10 spins),
but if one thought about it for a larger sample (size 100), then this low probability
would now have to be repeated an additional nine times to obtain a sample of size 100
consisting of 20% red outcomes. Although this explanation parses trials that constitute
the sample of 100 in ways that contravene conventional ways of thinking about this
probability, nonetheless it does provide purchase for the student on a challenging idea
(Lehrer 2017).
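The investigation summarized in Fig. 4 can be approximated with a short simulation. This is a sketch under our own choices of seed and names; TinkerPlots itself is not required.

```python
import random
import statistics

def percent_red_samples(p_red, sample_size, n_samples, seed=0):
    """Build an empirical sampling distribution of the statistic 'percent red'."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_samples):
        reds = sum(rng.random() < p_red for _ in range(sample_size))
        stats.append(100 * reds / sample_size)
    return stats

small = percent_red_samples(0.60, 10, 300)   # 300 samples of size 10
large = percent_red_samples(0.60, 100, 300)  # 300 samples of size 100

# The center stays near 60 (the spinner's structure, a signal)...
print(statistics.mean(small), statistics.mean(large))
# ...but variability shrinks as sample size grows (noise is reduced).
print(statistics.stdev(small), statistics.stdev(large))
```

The two printed pairs make the classroom observation concrete: the centers of the two sampling distributions nearly coincide, while the spread of the size-100 distribution is markedly smaller.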
Fig. 4 a Empirical sampling distribution from 300 samples with sample size 10, b empirical sampling distribution from 300 samples with sample size 100 (color figure online)
In sum, as students investigate the behavior of chance devices, they begin to develop
an appreciation of chance that is based on repetition of a process and probability as an
estimate of the signal produced by a chance process. They explain effects of sample
size on statistics of center and variability of a sampling distribution as reflecting the
operation of signal, which tends to produce an invariant center, and change in variability
due to the role of larger sample sizes in reducing noise around the signal. As they think
about samples nested within sampling distributions, they begin to recast single samples
as outcomes of a particular segment of an ongoing process—and simultaneously, as
one of many such segments. This image of sample is further developed and stabilized
as students begin to model variability.
5 Modeling Variability
After students have received extended experience with more routine practices of visu-
alizing and measuring variability of sample and sampling distributions, including those
generated by random devices, we introduce them to modeling variability. As in earlier
activities, students innovate by inventing and revising models and participate in cri-
tique by judging the rationale and behavior of the models invented by other students.
Students re-envision the variability in repeated measure and production processes they
have experienced firsthand as being due to the composition of fixed signal and random
error. For example, Fig. 5 shows a student model that accounts for the variability in
their class sample of repeated measures of their teacher’s arm-span. The outcome of
the first device is the sample median, intended to estimate (the “best guess of”) the true
value of her arm-span. The second random device designates magnitudes and prob-
abilities (areas of the spinner) of “gaps” that occurred when the student iterated the
ruler. Smaller magnitudes of gaps are judged as more likely than larger magnitudes.
The magnitudes are assigned a negative value because gaps are unmeasured space that
result in underestimates of the measure. The third random device designates magni-
tudes and probabilities of “overlaps” that occur when the endpoint of one iteration
of the ruler overlaps with the starting point of the next iteration. The magnitudes are
assigned a positive value reflecting that this form of error creates over-estimates of
the measure, because the same space is counted more than once. The fourth random
device represents both under- and over-estimates that result from the teacher becoming
fatigued and repositioning her outstretched arms, an attribute that the students termed
“slack.” The last device represents probabilities and magnitudes of mis-calculations.
The outcomes from each device are added to generate a simulated measurement value,
and the process is repeated 30 times to represent a classroom of student-measurers
and other participants (a few adult volunteers).
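A model of this kind can be sketched as the sum of one fixed signal and several chance error devices. The magnitudes and probabilities below are illustrative placeholders, not the students' actual spinner settings, and the function names are ours.

```python
import random

def weighted_spinner(rng, regions):
    """A random device: dict mapping an outcome value to its area (probability weight)."""
    values = list(regions)
    return rng.choices(values, weights=[regions[v] for v in values])[0]

def simulate_measure(rng, best_guess=157):
    """Compose a fixed signal (the sample median) with four chance error devices."""
    gap     = weighted_spinner(rng, {0: 0.6, -1: 0.3, -2: 0.1})   # unmeasured space: underestimates
    overlap = weighted_spinner(rng, {0: 0.6, 1: 0.3, 2: 0.1})     # double-counted space: overestimates
    slack   = weighted_spinner(rng, {0: 0.8, -1: 0.1, 1: 0.1})    # repositioned arms, either direction
    miscalc = weighted_spinner(rng, {0: 0.9, -5: 0.05, 5: 0.05})  # arithmetic slips
    return best_guess + gap + overlap + slack + miscalc

rng = random.Random(42)
simulated_class = [simulate_measure(rng) for _ in range(30)]  # 30 simulated measurers
print(simulated_class)
```

Each run of the list comprehension corresponds to one simulated classroom of measurers; repeated runs generate the variability students then compare against their empirical sample.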
Following these and other challenges in model invention and revision, students
participate in a psychophysics experiment. They mark the midpoint of the equal-
length line segments portrayed in Fig. 6. The arrows at the ends of the line segments
reliably bias the judgment of midpoint, and the challenge is to determine whether or
not students in the class can overcome the effects of the (Mueller) illusion. The sixth
graders, of course, are usually convinced that they can do so. To establish a baseline
for inference, students first model the individual differences in an unbiased condition
(the same length line segments are presented without any arrows), and in light of
this model, make an inference about whether or not the class, on average, has indeed
overcome the illusion.
Fig. 5 Student model of fixed and random components of repeated measure of their teacher’s arm-span
(color figure online)
Fig. 6 Equal-length segments
and the Mueller illusion
The Mueller illusion establishes a transition to natural variation. This shift is next
continued by asking students to consider data from a plant laboratory where there was
a concern about whether the day of planting was influencing the growth of batches of
plants. Given standardized conditions of growth in the laboratory, the day of growth
should not matter, and indications that growth was dependent on day of planting would
necessitate careful scrutiny of standard conditions. Students are enlisted as “consul-
tants” to the laboratory director, and they develop models that simulate outcomes that
can be attributed to chance to make an inference about whether or not the day of
planting matters. Modeling and inference now require envisioning plant growth as a
repeated process, albeit in the case of natural variation, the process is governed by
variables that are hidden from students’ view.
The teacher again plays a key role by asking students to explain how their beliefs
about the process are reflected in their model, to use their model to trace how one
simulated value is obtained, to consider what will happen if they run their model
again, and to decide how they know whether their model is “good” (Wisittanawat and
Lehrer 2018). During their participation in this sequence of model eliciting activities,
students’ judgments about the quality of their models increasingly incorporate model-
based sampling distributions of model parameters, such as the median of the simulated
sample generated by the model and the interquartile range as an indicator of sample
variability. Models considered good are those in which the values of the statistics of
the empirical sample fit within the centers of the sampling distribution of the statistics
generated by the model. For instance, the value of the IQR in the class sample is within
the center clump of the simulated sample IQRs generated by the model.
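This informal criterion can be sketched as follows. Operationalizing the "center clump" as the middle half of the simulated IQRs, along with the error structure and the empirical IQR value used here, are our assumptions for illustration only.

```python
import random
import statistics

def iqr(xs):
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

def simulate_sample(rng, n=30, signal=157):
    # Same spirit as the measurement models: a fixed signal plus a chance error device.
    return [signal + rng.choice([-2, -1, 0, 0, 0, 1, 2]) for _ in range(n)]

rng = random.Random(3)
sim_iqrs = sorted(iqr(simulate_sample(rng)) for _ in range(300))

empirical_iqr = 2.0  # illustrative class-sample IQR
lo, hi = sim_iqrs[len(sim_iqrs) // 4], sim_iqrs[3 * len(sim_iqrs) // 4]
print(lo, hi, lo <= empirical_iqr <= hi)  # does the empirical IQR fit the "center clump"?
```

A model whose simulated IQRs routinely bracket the empirical IQR passes the students' test; one that pushes the empirical value into the tails would be revised.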
To illustrate how engaging in modeling influences students’ perspectives on sample
and inference in light of uncertainty (Ben-Zvi et al. 2012; Makar and Rubin 2017),
we next present episodes from a public school sixth-grade classroom (the families of
most students qualified for free or reduced-price lunch).
Constructing Models
As noted, model construction was supported by students’ participation in
variability-generating processes, partly because processes involving signal and noise
were intelligible to students and thus supported their reasoning about correspondences
between the structure of a model and the process. For example, two students measured
the perimeter of a table and discussed the resulting potential sources of error. They agreed
that ruler iteration error, "gaps and laps," should be included in their model, and then
went on to consider other sources of error (student names are pseudonyms).
Brianna: And then we also have, like, calculation errors and false starts.
Cameron: How would we graph—, I mean, what is a false start, anyway?
Brianna: Like you have the ruler, but you start at the ruler edge, but the ruler might
be a little bit after it, so you get, like, half a centimeter off.
Cameron: So, then it would not be 33, it’d be 16.5, because it’d be half a centimeter
Brianna: Yeah, it might be a whole one, because on the ruler that we had, there was
half a centimeter on one side, and half a centimeter on the other side, so it
might be 33 still, and I think we subtract 33.
Cameron: Yeah, because if you get a false start, you’re gonna miss.
This example illustrates how making sense of sources of error (e.g., what is a false
start?) with which participants had firsthand experience assisted generation of com-
ponents of the model, typically resulting in variations of models like the one depicted
in Fig. 5.
Possible Values and the Development of a Hierarchical Image of Sample
One of the affordances of models like those depicted in Fig. 5 is that with repeated
runs of the model, students observed that the simulation sometimes generated out-
comes that were consistent with the variability-generating process, but were not case
values in the empirical sample. For example, we next see students contesting the design
of a model of a fictitious production process that involved outputs of cookies of varying
diameters. The argument concerned whether the model could include an output, a
cookie diameter of 9 cm, that was not represented in the empirical sample of cookies.
Students: Take away 1 in the spinner
Teacher: Why?
Joash: Because there’s no 9 in this [the original sample].
Garth: Yeah, but that doesn’t mean 9 is impossible.
Several students urged the inventors of the model to revise it so that magnitudes
and probabilities of overshooting or undershooting the targeted diameter (the errors)
would generate only simulated values that also appeared in the empirical sample.
Garth (an inventor of the model under scrutiny) and a few other students argued that
the simulated value under contest was plausible—it could have been produced, but
by chance, it wasn't. The model "focused on the probability of messing up," so to the
extent that error magnitudes and probabilities were plausible, one would have to
accept simulated values generated by the model as possible values.
“Possible values” eventually assumed increasing prominence with students, to a
point where they began to consider empirical samples as simultaneously a collection
of outcomes observed in the world and a member of a potentially infinite collection
of samples. We see another discussion of this relationship as students modeled the
first, “no illusion” condition of the eye illusion experiment. The students were trying
to account for individual difference variability and a remarkably accurate central ten-
dency (a median of 100u estimated as the midpoint of a 200u line). During critique,
the class considered a student invention that treated the individual differences as noise.
Once again, this invented model generated values that were not present in the empirical
sample. Seizing on this, the teacher highlighted the generation of one such value
(95) and asked whether it was a flaw in the model that the value did not exist in the
sample. In response, one of the model inventors appealed to the plausibility of the
value as a possible value (of the random process), a position that was ratified by many
of his classmates.
Teacher: Okay, but there is a gap at 95, and they have a 5, which means they can–
James: Because it’s possible. There’s a chance.
Teacher: But it didn’t happen when we really did it, so it shouldn’t be up there,
should it?
Students: It should, because it’s still possible.
The teacher playfully continued to push her objection in an attempt to elicit further justification:
Teacher: Let’s vote. Who thinks they should not have 5 up there? Who’s with
me? [A few students raised their hands.] Because it didn’t happen?
Student: So?
Abe: It’s possible.
Savannah: It’s still possible.
Student: It’s plausible.
Student: Just because it didn’t happen–
Teacher: Based on the data, it’s impossible. It didn’t happen [pointing at the empty
spot at 95 in the observed sample].
Savannah: Well, it’s between the two numbers that did happen, so it’s possible.
At this point in the conversation, a student framed the empirical sample as just one
realization of the variability generating process instantiated by the model:
Brianna: That’s just one data set.
Teacher: Oh, so Brianna said this is just one data set.
Students: Yes.
Teacher: So, Brianna, why do I care that that's just one data set? What does that tell us?
Brianna: Because not everyone is gonna get every single number. If we run it more
times, there’re more possibilities.
Teacher: What do you mean run it more times? Run the model?
Brianna: Well, if we were to do the line more times, again and again, if Mr. B’s class
does it, then it will be–
Fig. 7 A resampling model of cookie production (color figure online)
James: Like if we do the line again.
Teacher: Oh, you mean if I collect the real data again, it’s possible to get 95, so since
it’s possible for it to really happen in real life, my model might want to
represent that happening?
Students: Yes.
Alternative Perspectives: Resampling Models
Although the majority of student pairs constructed signal–noise models through
analysis of sources of variability, some teams adopted an alternative perspective of
resampling the empirical sample to account for the outcomes of a production, as
suggested earlier by their objections to models that generated values not included in
the empirical sample. For example, Fig. 7 displays the model constructed by a pair who
generated signal and random error in a way that reflected the proportion of outcomes
in a fictitious empirical sample of cookies.
This model provoked immediate contest, with some students objecting on the basis
of the model’s exclusive reliance on the empirical sample, which ruled out other
possible outcomes consistent with the process. For example,
Joash: They copied down this [the original data] which makes you wonder how
did they get this data.
Jean: That’s one batch of cookies. You can get 9 at any other time, because your
scoop is not mathematically [perfect].
After further discussion of this perspective, the teacher continued:
Teacher: Okay, so, what is good about this, though? Abe?
Abe: Well, what’s good about it is that it expresses the true probability that it can
get whatever numbers that are on there [the original sample].
…. [continued conversation]
Teacher: Well, can I ask what the true probability of getting 10 is, Abe?
Abe: Well there's five 10's, so it should be 33%.
Students: 1/3.
…. [continued conversation]
Teacher: So, my question is, is every time I make a batch, I'm gonna have exactly
five out of fifteen 10's?
Students: No.
Garth: Not every time, that could happen.
Teacher: Okay.
The teacher established that the students did not expect the model to copy, but rather
to approximate the empirical sample, even though it was guided exclusively by the
estimated probabilities of outcomes in the sample. We speculate that the appeal of this
model was based in part on the fictitious nature of the sample, so even though sources
of error were identified in the problem scenario, students did not have firsthand expe-
rience of them. The resampling perspective on modeling gained traction as the class
confronted problems of natural variation. For example, a student creating a resampling
model of the estimates of midpoint of the line in the unbiased condition of the eye
illusion experiment clarified that ignorance of the process motivated consideration of
resampling: “We kinda did like what they did yesterday, because there’s not really a
process for doing this, so we just sampled it, and we did the values of all the errors of
the original” (an analog to the resampling model of the production process invented
by peers). Knowledge limitations about process were increasingly prominent in the
challenge of modeling the difference between the number of leaves after 18 days of
growth for two batches of plants, one planted earlier than the other. Now students typ-
ically resorted to putting all cases into a common “mixer” and sampling with random
replacement to arbitrarily assign a case value to one time of planting or to the other.
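The pooled-"mixer" strategy amounts to building a sampling distribution of mean differences under the assumption that day of planting does not matter. The sketch below is hedged: the leaf counts are illustrative, not the laboratory's actual data, and the function name is ours.

```python
import random
import statistics

def mixer_test(batch_a, batch_b, reps=1000, seed=0):
    """Pool both batches into one 'mixer', then repeatedly draw new batches at random
    with replacement, building an empirical sampling distribution of mean differences
    that could arise by chance alone if day of planting does not matter."""
    rng = random.Random(seed)
    pooled = batch_a + batch_b
    observed = statistics.mean(batch_a) - statistics.mean(batch_b)
    extreme = 0
    for _ in range(reps):
        sim_a = rng.choices(pooled, k=len(batch_a))
        sim_b = rng.choices(pooled, k=len(batch_b))
        if abs(statistics.mean(sim_a) - statistics.mean(sim_b)) >= abs(observed):
            extreme += 1
    return observed, extreme / reps  # observed difference and its chance probability

# Illustrative leaf counts for two planting days; not the laboratory's actual data.
early = [5, 6, 6, 7, 7, 8, 8, 8, 9, 9]
late = [5, 5, 6, 6, 6, 7, 7, 7, 8, 8]
obs, p = mixer_test(early, late)
print(round(obs, 2), p)
```

A small proportion of chance differences at least as extreme as the observed one is what led the class, later in the sequence, to doubt that the observed difference was "a chance happening."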
The following episode exemplifies some of the conversation about students’ rationale:
Teacher: So, is there chance involved in whether our plant grows 5 leaves or 8 leaves
by the 18th day?
Students: Yes.
Teacher: So, we are assuming that our original data set, there’re chance involved
in that, right? So, we’re continuing to assume that there’s some chance
involved, because we’re using a chance device.
Student: Right.
Teacher: And, we’re using the original data set, we poured it all into the mixer. Why?
Joash: So we can see what happens at different times with the same numbers.
Teacher: Shreya, why do we pour all that information back into the mixer, and not
create a spinner with errors?
Shreya: That’s all we know.
Teacher: That’s all we know, right?
Joash: We don’t know the errors, if there are any.
Teacher: Yeah, and what we do know is really complicated, and so we don’t really
want to go through the process of quantifying it.
Resampling was not met with unanimous acceptance. Drawing upon their experiences
with constructing sampling distributions, some students pointed out that the samples
were relatively small:
Cameron: Um, like, you have to do this again. You can’t, it’s like with the boy and the
girl thing [reference to a problem posed using birth data from two counties
in North Carolina]. If you have one kid born, and it’s a boy, it’s gonna be
100%, but then if you do it like 100 times, and then if you have 100 births
then you have a [gesturing], but if you only do the experiment once
and then you put all that data into a mixer, then your probability, your
probability of getting something is kind of–
Teacher: So, are you saying if we had done more plants, we’d grown more plants,
instead of just 36, we would have better numbers?
Cameron: Do the experiment again or something.
Teacher: Do another experiment. Do the same experiment multiple times.
Students went on to consider some of the reasons the scientists had not (yet) repeated
the experiment but they eventually settled on employing the data that were available
to construct an empirical sampling distribution of mean differences to infer whether
the day of planting did make a difference in this measure of plant growth. We next
illustrate how students employed models and model-generated simulation of sampling
distributions to guide inference.
Model-Based Inference
After students decided whether or not a model was good, generally through recourse
to the behavior of the model-generated sampling distribution of statistics of central
tendency and of variability, such as the IQR, the teacher and researcher (RL) asked
students to consider the likelihood of values of these statistics at or in excess of some
threshold value. The aim was to help students establish routines for considering the
probability of a value of a statistic by locating it within the model’s simulated sampling
distribution of that statistic, as illustrated in the following exchange.
RL: So how likely is it that the median, just by chance, would be more than
Savannah: Still a little bit.
Joash: Out of this?
RL: Ah ha.
John: Turn on the percent and I’ll tell you.
RL: You want to guess?
Teacher: Let’s guess before we turn it on.
…. [continued conversation as students estimate]
Teacher: Write down your vote in your journal. What’s the chance that it lands,
that a class, that we get a median of higher than 510. 7%? 8?
Joash: Yeah it is 5%. I found out that it’s 5%.
Renee: Yeah, I did the math. It’s 5%
To initiate the logic of inference, the teacher went on to pose a scenario in which another
class claimed that they measured the same table and the median of their sample of
measurements was an extreme value in the sampling distribution generated by this
class’s model.
Teacher: Do you think Mr. B’s class got mixed up and measured a different table, or
do you think that just by chance, they could have gotten 517?
Joash: On this data [looking at sampling distribution], it looks like they messed
up… So how will the median be 517 if that will be an outlier [in the sampling distribution]?
Other students imagined a shift or translation of the distribution of measurements in
the second class, further buttressing their rejection of the claim that both classes had
measured the same table’s perimeter.
Joash: If this is the correct model—
Teacher: We’re assuming it is, yeah.
James B: On there, our big pile is around like 503, 501, but theirs would be like their
middle part would be around 517.
Teacher: Oh so not—
Joash: Theirs shifted [gesturing shifting to the right].
Garth: Their whole data just shifted.
As students continued to construct models of different scenarios, the teacher cul-
tivated reasoning about the sampling distribution of statistics of simulated samples
generated by models as a way of deploying chance to inform inference. Consequently,
this approach gradually achieved more prominence, as illustrated in the following
episode. Here, a teacher invited students to consider sample values in the bias
condition that they would take as evidence of overcoming the Mueller illusion. The
students appealed to any value of the median within the simulated values of
the median generated by their model of unbiased judgment.
Teacher: Okay, so Lily, if Mr. B’s class says we’ve got a median of 102, do you think
they overcame the illusion?
Lani: Yes.
Teacher: Yeah? 101?
Lani: Yeah.
Teacher: 103?
Lani: Yes.
Renee: Pushing it, but yeah.
Teacher: You think they overcame the illusion? 97?
Lani: Yeah.
Attempting to shift students’ focus from inclusion or exclusion in the distribution to
probability, the teacher then challenged students to think of the probability of a value
that was in the range generated by the simulated sampling distribution of the medians:
Teacher: So, in your opinion, if they get any of those numbers that you got, they
overcame the illusion. Okay. Even this one that only happens, I just want
to see, 97 happens less than a percent of the time? If they got 97, you think
they overcame the illusion or they fell for the illusion? Did it shift it?
Lani: They’re probably affected.
Teacher: Ha, I don’t know. Talk about it. What would make you confident? What
percent would have to be there for you to feel confident that they overcame
the illusion?
The teacher’s efforts to consider a statistic in light of its probability in a sampling
distribution began to be appropriated by students. They continued to model natural
variation to determine whether the observed difference between sample mean numbers
of leaves, for batches planted at two different times and measured 18 days after
planting, was merely a manifestation of chance.
Teacher: What was our original data set’s difference?
Students: .72
Teacher: .72 leaves. Do we think that happened by chance or do you think the original
assumption, day doesn’t matter is [wrong]?
[continued conversation]
Teacher: What does it mean for it to have only happened 4% of the time by chance?
What does that mean?
Renee: Probably not a chance happening [the difference is due to day of planting].
Savannah: It’s a very low chance.
Reflecting concerns raised earlier during the discussion about sample size, several
students suggested that perhaps the low probability of a difference as large or larger
than that observed might be erased if the samples were larger:
Brianna: If I were telling the scientist what to do, then I would tell them to collect
more data, because it’s still possible to get 0.72, but it’s highly unlikely, and
I think would be more helpful to collect more data.
Joash: It’d get knocked out because of the larger sample size.
Brianna: Yeah.
Renee put it more gently, but she also anchored the model to what she understood
about the situation, questioning the inference based on the model:
Renee: I’m thinking, well, the data, it looks like day should matter, because it’s such
a small percentage of it happening by chance, like there’s 4% chance that it
happened by chance. But the day really shouldn’t matter if all the conditions
are the same and they are growing inside, the same temperature, humidity,
and everything, but I want to see like three more testings or something. Just
a little bit more.
Across these episodes of classroom interaction, it appears evident that engaging
students in modeling tangible processes involving signal and noise provided a produc-
tive entrée to conceiving of variability as generated by a long-term process with fixed
and random components. The insights that students generated through firsthand expe-
rience of signal and noise, in contexts of repeated measure and manufacture, helped
them navigate toward new understandings of sample and sampling distributions and
of inference in light of uncertainty (Manor Braham and Ben-Zvi 2015). The bridge
between these statistical concepts and students’ experiences was built by representing
these experiences as models that simulated the variability of outcomes observed in
the world as compositions of signal and noise. Modeling processes with “possible
values” anchored images of sample as simultaneously representing both one set of
observations of the outcomes of a process and as one of a potentially infinite series of
such samples (Lehrer 2017). Students’ invention of models involving resampling as a
stand-in for unobservable process involved in natural variability helped bridge direct,
firsthand experience of variability with variability in the natural world.
6 Discussion
Practices like modeling are the generative means by which disciplines create and revise
knowledge. We have described one pathway through which comparatively young stu-
dents participated in the kinds of practices that allow professionals to get a grip on
variability. Much of the process of data modeling (Lehrer and Romberg 1996; Lehrer
and English 2018) can be made accessible to students by engaging them in the inven-
tion, revision, and contest of ways to visualize, measure and model variability. Through
this work they can come to grasp some of the central concepts and forms of activity
that generations of scientists and statisticians have developed. We described a progres-
sion of variability-generating processes that helps students form images of variability
as emerging from a long-term, stochastic process. Students come to regard visualiza-
tions of variability as being determined by choices made by designers about count,
order, interval, and scale. Increasingly, students conceive of variability as measured by
attending to particular characteristics of the cases that comprise a collection, modeled
by compositions of fixed and random components.
This modeling sequence provided students with some useful tools for pursuing
understanding of the natural world. For instance, sixth-grade students subsequently
employed these ideas to pursue questions about the ecology of a local pond. They
sought to estimate the extent to which the relative abundance of target organisms (e.g.,
caddisfly, dragonfly nymphs) could be considered as arising merely from chance vari-
ation. They wondered: Were apparently higher counts of the organism found in the
same volume of pond water due to the location of the sample, or were they perhaps due
only to the random variation one would expect from sample to sample? Confronted
with warranting claims about the relative abundance of organisms in particular parti-
tions of the pond, students decided to pool all their counts of the organism they had
observed throughout the pond in a random mixer. Then, they repeatedly drew a sample
at random (with replacement) and created an empirical sampling distribution. Students
then made judgments about new samples drawn from different locations in relation to
this empirical distribution. Although this approach has its limits, it nonetheless signals
that students were seeking to extend their experiences of modeling to new realms of
experience. We do not argue that the sequence delivers a complete toolkit for modeling
variability, but it does afford young students with a basic conceptual repertoire that
can support them in pursuing their own questions, rather than being constrained to
those dictated by curricula or educators.
The sixth-grade teacher’s support of student invention and revision of models, and
her efforts to help students deploy modeling to support inference, exemplify some
of the instructional practices through which teachers bring data modeling to life in
classrooms. As teachers accumulate experience, they also drive curricular innovation by contributing to a "Teachers Corner" that features additions and amendments to the baseline curriculum. Some of these contributions include:
- guidelines for articulating gesture to highlight mathematically relevant characteristics of data displays for ESL students,
- organizers for expanding the instructional reach of formative assessments by attending to typical patterns in students' ways of thinking,
- using TinkerPlots to help students think about the mean as a fulcrum of deviations of case values from it,
- contrasts between measurement precision and accuracy (i.e., bias),
- guiding exploration and discussion of students' beliefs about personal agency, and importing sports data (e.g., fantasy football) to extend ideas of statistics as measures,
- extensions to new production processes, such as making candies.
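Two of these ideas lend themselves to a quick numerical sketch. The following is a minimal illustration with invented numbers: the mean as a fulcrum (signed deviations from it sum to zero) and the contrast between a precise-but-biased measurement process and an unbiased-but-noisy one.

```python
# 1) The mean as a fulcrum: signed deviations of case values from the mean
#    always cancel, so the mean "balances" the deviations like a fulcrum.
values = [3.0, 5.0, 5.5, 8.0, 10.5]
mean = sum(values) / len(values)
deviations = [v - mean for v in values]
assert abs(sum(deviations)) < 1e-9  # balance: deviations sum to zero

# 2) Precision vs. accuracy (bias): a precise but biased process clusters
#    tightly around the wrong value; an unbiased but noisy process scatters
#    widely around the true value. Numbers below are invented.
import random
random.seed(0)
true_length = 33.0  # e.g., a "true" measure in some unit

biased_precise = [true_length + 2.0 + random.gauss(0, 0.1) for _ in range(500)]
unbiased_noisy = [true_length + random.gauss(0, 2.0) for _ in range(500)]

def mean_of(xs):
    return sum(xs) / len(xs)

print(f"biased-precise mean ~ {mean_of(biased_precise):.2f} (off by about 2)")
print(f"unbiased-noisy mean ~ {mean_of(unbiased_noisy):.2f} (near 33)")
```

Averaging does not remove bias, only noise, which is the crux of the precision/accuracy contrast in the list above.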
Teacher innovation also includes authoring for professional publication. For
instance, some schools expanded the cycle of innovation and critique of data dis-
plays in the first phase of instruction to engage in school-wide norm-setting about
how to conduct mathematically productive conversation. Teachers coauthored an arti-
cle about their experiences for a wider professional audience (Tapee et al. 2019). The
lead authors were teachers with whom we had never interacted, evidence of the kind of appropriation that indicates a robust design. These innovations and extensions are but a small
sample, but they illustrate how our initial design has been elaborated and revised in
response to teacher feedback and improvement. As the examples suggest, the design is
not fixed but is continually shaped by participants, especially as we learn more about
students’ propensities and characteristic ways of thinking, and as the social worlds of
students and teachers change. When we first initiated a variant of this design, sports statistics were not easily accessible; now it is difficult to avoid being enveloped by
them. Similarly, data generated by students could not be readily shared outside of
the confines of school, and now large-scale data, including those of scientists, are
more readily available. Both of these changes have generated more points of contact
between student-generated, small-scale data and so-called big data. These changes in
access and availability suggest augmented possibilities for data modeling. Nonethe-
less, we conjecture that the firsthand experiences of variability and chance that we
have described will continue to be critical and will provide an important foundation
to children’s subsequent work with data farther from their own personal experience.
Acknowledgements Funding was provided by Australian Research Council (Grant No. DP180102333)
and Institute of Education Sciences (Grant No. R305A110685).
References
Bakker A (2018) Design research in education. Routledge, New York
Ben-Zvi D, Aridor K, Makar K, Bakker A (2012) Students’ emergent articulation of uncertainty while
making informal statistical inference. ZDM 44(7):913–925
Ford MJ (2015) Educational implications of choosing “practice” to describe science in the next generation
science standards. Sci Educ 99(6):1041–1048
Ford MJ, Forman EA (2006) Redefining disciplinary learning in classroom contexts. Rev Res Educ 30:1–32
Hestenes D (1992) Modeling games in the Newtonian world. Am J Phys 60(8):732–748
Horvath JK, Lehrer R (1998) A model-based perspective on the development of children’s understanding of
chance and uncertainty. In: Lajoie SP (ed) Reflections on statistics: learning, teaching, and assessment
in grades K-12. Routledge, New York, pp 121–148
Jones RS, Lehrer R, Kim M-J (2017) Critiquing statistics in student and professional worlds. Cogn Instr
Konold C (1989) Informal conceptions of probability. Cogn Instr 6(1):59–98
Konold C (2007) Designing a data tool for learners. In: Lovett MC, Shah P (eds) Thinking with data.
Lawrence Erlbaum Associates, Taylor and Francis, New York, pp 267–292
Konold C, Harradine A (2014) Contexts for highlighting signal and noise. In: Wassong T, Frischemeier D,
Fischer PR, Hochmuth R, Bender P (eds) Mit Werkzeugen Mathematik und Stochastik lernen: using
tools for learning mathematics and statistics. Springer, Wiesbaden, pp 237–250
Konold C, Lehrer R (2008) Technology and mathematics education: an essay in honor of Jim Kaput. In:
English LD (ed) Handbook of international research in mathematics education, 2nd edn. Taylor &
Francis, Philadelphia, pp 49–72
Konold C, Miller CD (2011) TinkerPlots. Dynamic data exploration. Key Curriculum Press, Emeryville
Konold C, Higgins T, Russell SJ, Khalil K (2015) Data seen through different lenses. Educ Stud Math
Lehrer R (2017) Modeling signal-noise processes supports student construction of a hierarchical image of
sample. Stat Educ Res J 16(2):64–85
Lehrer R, English LD (2018) Introducing children to modeling variability. In: Ben-Zvi D, Makar K, Garfield
J (eds) International handbook of research in statistics education. Springer International Publishing,
Cham, pp 229–260
Lehrer R, Kim MJ (2009) Structuring variability by negotiating its measure. Math Educ Res J 21(2):116–133
Lehrer R, Romberg T (1996) Exploring children’s data modeling. Cogn Instr 14(1):69–108
Lehrer R, Schauble L (2007) Contrasting emerging conceptions of distribution in contexts of error and
natural variation. In: Lovett MC, Shah P (eds) Thinking with data. Lawrence Erlbaum Associates,
Taylor and Francis, New York, pp 149–176
Lehrer R, Schauble L (2017) Children’s conceptions of sampling in local ecosystems investigations. Sci
Educ 101(6):968–984
Lehrer R, Schauble L (2019) Learning to play the modeling game. In: Upmeier zu Belzen A, Kruger D,
van Driel J (eds) Toward a competence-based view on models and modeling in science education.
Springer, Cham, pp 221–236
Lehrer R, Kim M-J, Jones RS (2011) Developing conceptions of statistics by designing measures of distri-
bution. ZDM Math Educ 43(5):723–736
Lehrer R, Kim M-J, Ayers E, Wilson M (2014) Toward establishing a learning progression to support the
development of statistical reasoning. In: Confrey J, Maloney AP, Nguyen KH (eds) Learning over time:
learning trajectories in mathematics education. Information Age Publishers, Charlotte, pp 31–60
Makar K, Rubin A (2017) Learning about statistical inference. In: Ben-Zvi D, Makar K, Garfield J (eds)
International handbook of research in statistics education. Springer International Publishing, Cham,
pp 261–294
Manor Braham H, Ben-Zvi D (2015) Students’ articulations of uncertainty in informally exploring sampling
distribution. In: Zieffler A, Fry E (eds) Reasoning about uncertainty: learning and teaching informal
inferential reasoning. Catalyst Press, Minneapolis, pp 57–94
Metz KE (1998) Emergent understanding and attribution of randomness: comparative analysis of the rea-
soning of primary grade children and undergraduates. Cogn Instr 16(3):285–365
National Research Council (2012) A framework for K-12 science education: practices, crosscutting con-
cepts, and core ideas. The National Academies Press, Washington, DC
Nersessian NJ (2008) Creating scientific concepts. The MIT Press, Cambridge
Petrosino AJ, Lehrer R, Schauble L (2003) Structuring error and experimental variation as distribution in
the fourth grade. Math Think Learn 5(2–3):131–156
Pfannkuch M, Wild CJ (2000) Statistical thinking and statistical practice: themes gleaned from professional
statisticians. Stat Sci 15(2):132–152
Saldanha LA, Thompson PW (2002) Conceptions of sample and their relationship to statistical inference.
Educ Stud Math 51(3):257–270
Saldanha LA, Thompson PW (2014) Conceptual issues in understanding the inner logic of statistical infer-
ence: insights from two teaching experiments. J Math Behav 35:1–30
Shinohara M, Lehrer R (2017, July) Narrating lines of practice: students’ views of their participation in
statistical practice. Paper presented at the tenth international research forum on statistical reasoning,
thinking, and literacy (SRTL 10). Rotorua, New Zealand
Tapee M, Cartmell T, Guthrie T, Kent LB (2019) Stop the silence. How to create a strategically social
classroom. Math Teach Middle Sch 24(4):210–216
Thompson PW, Liu Y, Saldanha L (2007) Intricacies of statistical inference and teachers’ understanding
of them. In: Lovett MC, Shah P (eds) Thinking with data. Lawrence Erlbaum Associates, Taylor and
Francis, New York, pp 207–231
Windschitl M, Thompson J, Braaten M (2008) Beyond the scientific method: model-based inquiry as a new
paradigm of preference for school science investigations. Sci Educ 92(5):941–967
Wisittanawat P, Lehrer R (2018) Teacher assistance in modeling chance processes. In: Sorto MA (ed) Looking back, looking forward. Proceedings of the 10th international conference on the teaching of statistics, Kyoto, Japan, July. International Statistical Institute, Voorburg. Accessed 26 Feb 2020
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.