Teaching modeling in introductory statistics:
A comparison of formula and tidyverse syntaxes
Amelia McNamara
Department of Computer & Information Sciences, University of St Thomas
February 1, 2022
This paper reports on an experiment run in a pair of introductory statistics labs,
attempting to determine which of two R syntaxes was better for introductory teach-
ing and learning: formula or tidyverse. One lab was conducted fully in the formula
syntax, the other in tidyverse. Analysis of incidental data from YouTube and RStudio
Cloud show interesting distinctions. The formula section appeared to watch a larger
proportion of pre-lab YouTube videos, but spend less time computing on RStudio
Cloud. Conversely, the tidyverse section watched a smaller proportion of the videos
and spent more time on RStudio Cloud. Analysis of lab materials showed that tidy-
verse labs tended to be slightly longer (in terms of lines in the provided RMarkdown
materials, as well as minutes of the associated YouTube videos), and the tidyverse
labs exposed students to more distinct R functions. However, both labs relied on a
quite small vocabulary of consistent functions. Analysis of pre- and post-survey data
show no differences between the two labs, so students appeared to have a positive ex-
perience regardless of section. This work provides additional evidence for instructors
looking to choose between syntaxes for introductory statistics teaching.
Keywords: R language, instruction, data science, statistical computing
arXiv:2201.12960v1 [stat.CO] 31 Jan 2022
1 Introduction
When teaching statistics and data science, it is crucial for students to engage authentically
with data. The revised Guidelines for Assessment and Instruction in Statistics Education
(GAISE) College Report provides recommendations for instruction, including “Integrate
real data with a context and purpose” and “Use technology to explore concepts and analyze
data” (GAISE College Report ASA Revision Committee 2016). Many instructors have
students engage with data using technology through in-class experiences or separate lab
sections.
An important pedagogical decision when choosing to teach data analysis is the choice
of tool. There has long been a divide between ‘tools for learning’ and ‘tools for doing’ data
analysis (McNamara 2015). Tools for learning include applets, and standalone software like
TinkerPlots, Fathom, or their next-generation counterpart CODAP (Konold & Miller 2001,
Finzer 2002, The Concord Consortium 2020). Tools for doing are used by professionals,
and include software packages like SAS as well as programming languages like Julia, R,
and Python.
Many tools for learning were inspired by Rolf Biehler’s 1997 paper, “Software for Learn-
ing and for Doing Statistics” (Biehler 1997). In it, Biehler called for more attention to the
design of tools used for teaching. In particular, he was concerned with on-ramps for stu-
dents (ensuring the tool was not too complex), as well as off-ramps (using one tool through
an entire class, which could also extend further) (Biehler 1997). At the time he wrote the
paper it was quite difficult to teach using an authentic tool for doing, because these tools
lacked technological or pedagogical on-ramps.
However, recent developments in Integrated Development Environments (IDEs) and
pedagogical advances have opened space for a movement to teach even novices statistics
and data science using programming. In particular, curricula using Python and R have
become popular. In these curricula, educators make pedagogical decisions about what code
to show students, and how to scaffold it. In both the Python and R communities, there
have been movements to simplify syntax for students.
For example, the UC Berkeley Data 8 course uses Python, including elements of the
commonly-used matplotlib and numpy libraries as well as a specialized library written to
accompany the curriculum called datascience (Adhikari et al. 2021, DeNero et al. 2020).
The datascience library was designed to reduce complexity in the code. At the K-12 level,
the language Pyret has been developed as a simplified version of Python to accompany the
Bootstrap Data Science curriculum (Krishnamurthi et al. 2020).
In R, the development of less-complex code for students has been under consideration
for even longer. R offers non-standard evaluation, which allows package authors to create
new ‘syntax’ for their packages (Morandat et al. 2012). In human language, syntax is the
set of rules for how words and sentences should be structured. If you use the wrong syntax
in human language, people will probably still understand you, but they will be able to
hear there is something wrong with how you structured your speech or writing. Syntax in
programming languages is even more formal: it governs what code will run or compile
correctly. Using the wrong syntax means getting an error from the language.
Typically, programming languages have only one valid syntax. For example, an aphorism
about the language Python is “There should be one– and preferably only one –obvious way
to do it” (Peters 2004). But, non-standard evaluation in R has allowed there to be many
obvious ways to do the same task. There is some disagreement over whether syntax is a
precise term for these differences. Other terms suggested for these variations in valid R
code are ‘dialects,’ ‘interfaces,’ and ‘domain specific languages.’ Throughout this paper, we
use the term syntax as a shorthand for these concepts. At present, there are three primary
syntaxes used: base, formula, and tidyverse (McNamara 2018).
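As a toy illustration (not from the paper), non-standard evaluation lets a function capture its argument unevaluated and decide how to interpret it, which is the mechanism packages use to create formula- or pipe-style interfaces. The sketch below assumes the palmerpenguins package is installed.

```r
# Toy non-standard evaluation: the argument is captured unevaluated with
# substitute(), then evaluated inside the data frame rather than the caller's
# environment, so bill_length_mm need not exist as an object in the workspace
mean_of <- function(var, data) {
  mean(eval(substitute(var), envir = data), na.rm = TRUE)
}

mean_of(bill_length_mm, data = palmerpenguins::penguins)
```

This is the same trick that lets `tally(~island, data = penguins)` look up `island` inside `penguins`.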
The base syntax is used by the base R language (R Core Team 2020), and is characterized
by the use of dollar signs and square brackets. The formula syntax uses the tilde to separate
response and explanatory variable(s) (Pruim et al. 2017). The tidyverse syntax uses a
data-first approach, and the pipe to move data between steps (Wickham et al. 2019).
A comparison of using the three syntaxes for univariate statistics and displays can be seen
in Code Block 1.1. This example code, like the rest in this paper, uses the palmerpenguins
data (Horst et al. 2020). All three pieces of code accomplish the same tasks, and all three
use the R language. But, the syntax varies considerably.
# base syntax
hist(penguins$bill_length_mm)
mean(penguins$bill_length_mm)

# formula syntax
gf_histogram(~bill_length_mm, data = penguins)
mean(~bill_length_mm, data = penguins)

# tidyverse syntax
ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm))
penguins %>%
  summarize(mean = mean(bill_length_mm))

Code Block 1.1: Making a histogram of bill length from the penguins dataset, then
taking the mean, using three different R syntaxes. Base syntax is characterized by
the dollar sign, formula by the tilde, and tidyverse is dataframe-first. In order for
this code to run as-is, missing (NA) values need to be dropped before the code is run.
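For example, the missing values could be removed up front (a minimal sketch using tidyr::drop_na(); base R's na.omit() or the na.rm argument would work as well):

```r
library(palmerpenguins)
library(tidyr)

# Remove rows with a missing bill length so all three syntaxes run cleanly
penguins <- drop_na(penguins, bill_length_mm)
mean(penguins$bill_length_mm)
```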
There is some agreement about pedagogical decisions for teaching R. In particular, most
educators agree that in order to reduce cognitive load, instructors should only teach one
syntax, and to be as consistent as possible about that syntax (McNamara et al. 2021a).
There is also some agreement that base R syntax is not the appropriate choice for
introductory statistics, but there is widespread disagreement on whether the formula syntax or
tidyverse syntax is better for novices.
While there are strongly-held opinions on which syntax should be taught (Pruim et al.
2017, Çetinkaya-Rundel et al. 2021), there is relatively little empirical evidence to support
these opinions. In the realm of computer science, empirical studies by Andreas Stefik
et al. have shown significant differences in the intuitiveness of languages, as well as error
rates, based on language design choices (Stefik et al. 2011, Stefik & Siebert 2013). Thus, it
seems likely there are language choices that could make data science programming easier
(or harder) for users, particularly novices.
Stefik’s team is working to add data science functionality to their evidence-based pro-
gramming language. As a first step toward understanding which elements of existing lan-
guages might be best to emulate, they ran an experiment comparing the three main R
syntaxes (Rafalski et al. 2019). The study showed no statistically significant difference
between any of the three syntaxes with regard to time to completion or number of er-
rors. However, there were significant interaction effects between syntax and task, which
suggested some syntaxes might be more appropriate for certain tasks (Rafalski et al. 2019).
Beyond this, examining the results from the study with an eye toward data science ped-
agogy showed common errors made by students related to their conceptions of dataframes
and variables. For example, one of the figures from Rafalski et al. (2019) shows real stu-
dent code with errors. In the first line of code, the student gets everything correct using
formula syntax, with the exception of the name of the dataframe. When that code does
not work, they try again using base R syntax, but again get the dataframe name wrong.
After both those failures, they appear to fall back on computer science knowledge and try
syntax quite different from R. This is consistent with other studies of novice behavior in R
(Roberts 2015). It is not clear if this type of error was dependent on the syntax participants
were asked to use.
The other missing element in this study was instruction. The study was a quick inter-
vention showing students examples of a particular syntax, then asking them to duplicate
that syntax in a new situation. But without any instruction about data science concepts
like dataframes, it would be difficult to troubleshoot the syntax error mentioned above.
The work served as the inspiration for the longer comparison of multiple R syntaxes in the
classroom context described in this paper.
The remainder of this paper is organized into three sections. Section 2 describes the
setup of the study, the participants (2.1) and their experience (2.2), and the content of
the course under investigation (2.3). Section 3 contains results of the analysis, including a
comparison of material lengths between the sections (3.2), the number of unique functions
shown in each section (3.3), results from the pre- and post-survey (3.4), and analysis of
YouTube (3.5) and RStudio Cloud (3.6) data. Finally, Section 4 discusses the results and
opportunities for future study.
All materials used for this study are available on GitHub and are Creative Commons
licensed, so they can be used or remixed by anyone who wants to use them. All code
and anonymized data from this paper are also available on GitHub, for reproducibility. Data
analysis was performed in R, and the paper is written in RMarkdown. The categorical color
palette was chosen using Colorgorical (Gramazio et al. 2017), and colors for the Likert scale
plot are from ColorBrewer (Harrower & Brewer 2003). Example data used throughout the
paper is from palmerpenguins (Horst et al. 2020). Packages used for the formula section
were mosaic and ggformula (now loaded automatically with mosaic), for the tidyverse
section the tidyverse and infer packages (Pruim et al. 2017, Kaplan & Pruim 2020,
Wickham et al. 2019, Bray et al. 2021).
2 Methods
The author ran a pilot study in her introductory statistics labs. This study was run twice,
once in the Spring 2020 semester and once in the Fall 2020 semester. The disruption of
COVID-19 to the Spring 2020 semester made the resulting data unusable, so this paper
focuses on just Fall 2020 data.
Data was collected from YouTube analytics for watch times, from RStudio Cloud for
aggregated compute time, and from pre- and post-surveys of students. Participants for
the pre- and post-survey were recruited from this pool after Institutional Research Board
ethics review.
2.1 Participants
Participants in the study were students enrolled in an introductory statistics course at a
mid-sized private university in the upper Midwest. At this university, statistics students
enroll in a lecture (approximately 60-90 students per section), which is broken into several
smaller lab sections for hands-on work in statistical software. Lecture and lab sections are
taught by different instructors, and the lab sections associated with a particular lecture
often use different software. For example, one lab may use Minitab while the other two use
Excel. However, every lab section (no matter what lecture it is associated with, or what
software is used) does the same set of standardized assignments. This structure provides a
consistent basis for comparison.
                          formula   tidyverse
  No                           10           9
  Yes, but not with R           2           4
Table 1: Responses from pre-survey about prior programming experience. The
majority of students in both sections had no prior programming experience.
In Fall 2020, the author taught two labs associated with the same lecture section, so all
students saw the same lecture content. (A third lab was associated with the same lecture,
using a different software, and was not considered.) Using random assignment (coin flip),
the author selected one lab section to be instructed using formula syntax, and one to be
instructed using tidyverse syntax. The goal was to compare syntaxes head-to-head.
Because the lab took place during the coronavirus pandemic, the instructor recorded
YouTube videos of herself working through the pre-lab documents for each lab, and posted
them in advance. Students watched the videos and worked through the associated pre-lab
RMarkdown document on their own time, then came to synchronous class to ask questions
and get help starting on the real lab assignment. Students used R through the online
platform RStudio Cloud (RStudio PBC 2021).
The two labs were of the same size (n = 21 in both sections) and reasonably similar
in terms of student composition. In both sections, approximately half of students were
Business majors, with the other half a mix of other majors.
For the pre-survey, n = 12 and n = 13 students consented to participate, and in the
post-survey n = 8 and n = 13 responded. So, for paired analysis we have n = 8 for the
formula section, and n = 13 for the tidyverse section. These sample sizes are very
small, and because students could opt in, may suffer from response bias. However,
because we have additional usage data from non-respondents, some elements of
the data analysis include the full class sample sizes of n = 21.
2.2 Prior programming experience
To verify both groups of students had similar backgrounds, we compared the prior
programming experience of the two groups of students. Table 1 shows results from the pre-survey.
While two additional students in the tidyverse section had prior programming experience,
the overall pattern was the same. The majority of students in both sections had no prior
programming experience.
For the students who had programmed before, none had prior experience with R. Three
students had prior experience with Java, three with JavaScript, and a smaller number had
experience with other languages, including C++ and Python.
2.3 Materials
Each week, the lab instructor prepared a “pre-lab” document in RMarkdown. The pre-
lab covered the topics necessary to complete the standardized lab assignment done by all
students across lab sections. Pre-lab documents included text explanations of statistical
and R programming concepts, sample code, and blanks (both in the code and the text)
for students to fill in as they worked. The instructor recorded YouTube videos of herself
working through the pre-lab documents for each lab, and posted them in advance. Students
were told to watch the pre-lab video and work through the RMarkdown document on their
own time, then come to synchronous class to ask questions and get help starting on the
real lab assignment.
The topics covered in Fall 2020 were as follows:
1. [No lab, short week]
2. Describing data: determining the number of observations and variables in a dataset,
variable types.
3. Categorical variables: exploratory data analysis for one or two categorical variables.
Frequency tables, relative frequency tables, bar charts, two-way tables, and side-by-
side bar charts.
4. Quantitative variables: exploratory data analysis for one quantitative variable. His-
tograms, dot plots, density plots, and summary statistics like mean, median, and
standard deviation.
5. Correlation and regression: exploratory data analysis for two quantitative variables.
Correlation, scatterplot, simple linear regression as a descriptive technique.
6. Bootstrap intervals: the use of the bootstrap to construct non-parametric confidence
intervals.
7. Randomization tests: the use of randomization to perform non-parametric hypothesis
tests.
8. Inference for a single proportion: use of the normal distribution to construct confi-
dence intervals and perform hypothesis tests for a single proportion.
9. Inference for a single mean: use of the t-distribution to construct confidence intervals
and perform hypothesis tests for a single mean.
10. Inference for two samples: use of distributional approximations (normal or t) to
perform inference for a difference of proportions or a difference of means.
11. [No lab, assessment]
12. [No lab, Thanksgiving]
13. ANOVA: inference for more than two means, using the F distribution.
14. Chi-square: inference for more than two counts, using the χ² distribution.
15. Inference for Regression: inference for the slope coefficient in simple linear regression,
prediction and confidence intervals. Multiple regression.
Although this was a 15-week semester, there are only 12 lab topics. Labs were not held
during the first week of classes or during Thanksgiving week. Additionally, there were two
“lab assessments” to gauge student understanding of concepts within the context of their
lab software. One took place during finals week, the other was scheduled in week 11.
3 Results
3.1 Summative assessments
One obvious question arising when considering the comparison of the two syntaxes is
whether students performed better in one section or another. The IRB for this study
did not cover examining student work (an obvious place for improved further research),
so we cannot look at student outcomes on a per-assignment basis. However, running a
randomization test for a difference in overall mean lab grades showed no significant differ-
ence between the two sections. While there may have been interesting differences in grades
depending on the topic of the lab, we at least know these differences averaged out in the end.
Similarly, it would be interesting to know whether student attitudes toward the instructor
differed between sections, based on the summative student evaluations completed by all
students at the end of the semester. These evaluations are anonymous, and the interface only provides summary
statistics. Again, a test for a difference in means showed no difference in mean evaluation
score on the questions “Overall, I rate this instructor an excellent teacher.” and “Overall,
I rate this course as excellent.”
3.2 Lab lengths
The first question we seek to answer is whether the materials presented to students were
of approximately the same length. We can assess this based on the length of the pre-lab
documents (in lines) and of the pre-lab videos (in minutes).
The length of the pre-lab RMarkdown documents can be measured using lines. Figure
1 shows the number of lines in each section's pre-lab document, per week.
It indicates RMarkdown documents for the tidyverse section tended to be longer. We
can compute a difference in lab lengths for each week, and compute the mean difference,
which is 19 lines. Because we only have 12 labs worth of data, we used a bootstrap procedure
to generate a confidence interval for the mean of the differences. The 95% interval is
(9, 29), which indicates labs for the tidyverse section were longer, but only by a few lines.

Figure 1: Length of pre-lab RMarkdown documents each week, in lines. Data has
been adjusted for the formula section in weeks 8 and 9, because an instructor error
led this section to have only one document combining both weeks' work.
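A percentile bootstrap for a mean difference of this kind can be sketched in a few lines of R. The weekly differences below are made up for illustration; they are not the paper's data.

```r
set.seed(2020)
# Hypothetical per-week differences in document length (tidyverse minus formula)
diffs <- c(25, 10, 46, 12, 18, 9, 15, 22, 30, 8, 14, 19)

# Resample the 12 differences with replacement and take the mean each time
boot_means <- replicate(5000, mean(sample(diffs, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # percentile 95% confidence interval
```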
A slightly longer length for these labs makes sense, because tidyverse code is charac-
terized by multiple short lines strung together into a pipeline with %>%, while the formula
syntax typically has single function calls, sometimes with more arguments.
The question then becomes whether the longer documents lent themselves to longer
pre-lab videos. Figure 2 shows the video lengths, which appear more consistent between
sections. Effort was made to ensure the maximum video length was approximately 20
minutes, and some weeks had multiple videos.
Again, we can compute a pairwise difference in total video length (adding together
multiple videos in weeks that had them), and compute the mean of that difference. That
difference is 2 minutes (tidyverse videos being longer). We can then compute a 95%
bootstrap confidence interval for the difference, (0.16, 4). Again, it appears ‘tidyverse
videos are longer, although just by a few minutes.
Figure 2: Length of pre-lab videos each week. Outlines help delineate multiple videos
for a single week.
3.2.1 Divergent labs
One place where the labs are of particularly different lengths is in week 3, when the topic
was exploratory data analysis for one and two categorical variables. For the formula section
the RMarkdown document was 134 lines long, and the two videos totaled 28 minutes. The
RMarkdown document for the tidyverse section was 180 lines long, and the videos totaled
35 minutes. There is a clear reason why.
In the formula section, students found frequency tables and relative frequency tables
with code as in Code Block 3.1 and Code Block 3.2.
tally(~island, data = penguins)
tally(~island, data = penguins, format = "percent")
tally(species ~island, data = penguins)
Code Block 3.1: Making tables of one and two categorical variables using the formula
syntax and mosaic::tally().
tally(species ~island, data = penguins, format = "percent")
species Biscoe Dream Torgersen
Adelie 26.19048 45.16129 100.00000
Chinstrap 0.00000 54.83871 0.00000
Gentoo 73.80952 0.00000 0.00000
Code Block 3.2: Making a table of two categorical variables using the formula
syntax and mosaic::tally() function, along with the percent option.
The mosaic::tally() function produces a familiar-looking two-way table, which took
very little explanation, other than to show how reversing the variables in the formula led
to different percentages, as is seen in Code Block 3.3. Compare Code Block 3.2 and Code
Block 3.3 to see the effect of swapping the order of variables.
tally(island ~species, data = penguins, format = "percent")
island Adelie Chinstrap Gentoo
Biscoe 28.94737 0.00000 100.00000
Dream 36.84211 100.00000 0.00000
Torgersen 34.21053 0.00000 0.00000
Code Block 3.3: Making a table of two categorical variables using the formula
syntax and mosaic::tally() function, with variables swapped.
However, in the tidyverse section, both the code and output took longer to explain.
Initial summary statistics for categorical variables are computed in Code Block 3.4, while
the tidy version of a relative frequency table is shown in Code Block 3.5.
penguins %>%
group_by(island) %>%
summarize(n = n())
penguins %>%
group_by(island) %>%
summarize(n = n()) %>%
mutate(prop = n/sum(n))
penguins %>%
group_by(island, species) %>%
summarize(n = n())
Code Block 3.4: Computing summary statistics for one and two categorical variables
in the tidyverse syntax.
penguins %>%
group_by(island, species) %>%
summarize(n = n()) %>%
mutate(prop = n/sum(n))
# A tibble: 5 x 4
# Groups: island [3]
island species n prop
<fct> <fct> <int> <dbl>
1 Biscoe Adelie 44 0.262
2 Biscoe Gentoo 124 0.738
3 Dream Adelie 56 0.452
4 Dream Chinstrap 68 0.548
5 Torgersen Adelie 52 1
Code Block 3.5: Computing summary statistics for two categorical variables in the
tidyverse syntax.
Again, reversing the order of the variables (this time, inside dplyr::group_by())
changed the percentages, but it was more difficult to determine how the percents added up,
because the data was in long format, rather than wide format. Compare Code Block 3.5
and Code Block 3.6 to see the effect of swapping the order of variables.
penguins %>%
group_by(species, island) %>%
summarize(n = n()) %>%
mutate(prop = n/sum(n))
# A tibble: 5 x 4
# Groups: species [3]
species island n prop
<fct> <fct> <int> <dbl>
1 Adelie Biscoe 44 0.289
2 Adelie Dream 56 0.368
3 Adelie Torgersen 52 0.342
4 Chinstrap Dream 68 1
5 Gentoo Biscoe 124 1
Code Block 3.6: Computing summary statistics for two categorical variables in the
tidyverse syntax, with variables swapped.
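One way to recover the familiar wide layout in the tidyverse (an approach the labs did not use, shown here only for comparison) is to reshape the long summary with tidyr::pivot_wider():

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Long proportions as in Code Block 3.5, then reshaped into a two-way layout
penguins %>%
  count(island, species) %>%
  group_by(island) %>%
  mutate(prop = n / sum(n)) %>%
  select(-n) %>%
  pivot_wider(names_from = island, values_from = prop, values_fill = 0)
```

This produces one column per island, with species as rows, closer to the output of mosaic::tally().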
A similar discrepancy can be seen in week 10, where the formula section's RMarkdown
document was again shorter and its videos totaled 19 minutes, while the tidyverse
section's RMarkdown document was longer and its videos totaled 27 minutes.
The explanation for the varying time is similar, as well. Week 10 focused on inference
for two samples; that is, inference for a difference of proportions or a difference of means.
While a difference of means makes it fairly easy to know which variable should go where
(the quantitative variable is the response variable to take the mean of, and the categorical
variable is the explanatory variable splitting it), with a difference of two proportions the
concept comes back to thinking about two-way tables. Again, the tidyverse presentation
of a “two-way table” made this more difficult to conceptualize.
In the formula section, students saw code like that in Code Block 3.7.
tally(island ~sex, data = penguins, format = "proportion")
prop.test(island ~sex, data = penguins, success = "Biscoe")
Code Block 3.7: Making a two-way table and performing inference for a difference
of proportions using the formula syntax. In order for this code to run as-is, the
Torgersen island has to be removed so there are just two categories in that variable.
The code for finding the point estimate using mosaic::tally() is quite similar to the
code for performing inference using prop.test().
In the tidyverse, the code is not as consistent. Students in this section saw code like
that shown in Code Block 3.8.
penguins %>%
  group_by(sex, island) %>%
  summarize(n = n()) %>%
  mutate(prop = n/sum(n))

penguins %>%
  prop_test(
    response = island,
    explanatory = sex,
    alternative = "two-sided",
    order = c("female", "male")
  )
Code Block 3.8: Making a ‘two-way table’ and performing inference for a difference
of proportions using the tidyverse syntax. Again, the Torgersen island data has been
removed beforehand.
In tidyverse syntax the code for finding the point estimate (dplyr's group_by(),
summarize(), and then mutate()) is quite different from the code performing the inference
(the infer::prop_test() function). And, the output from the inferential prop_test()
function makes it harder to determine whether the code was correct. In the prop.test() out-
put, sample estimates are provided, which allows you to check your work against a point
estimate computed earlier.
These discrepancies made it take longer to explain code in the tidyverse section. Com-
parisons of RMarkdown document length and YouTube video length, as well as the corre-
sponding reasons for those discrepancies are the first hint of the computing time results to
come in Section 3.6.
3.3 Number of functions
Since both sections relied on the use of RMarkdown documents, there is a wealth of text
data to be explored. The instructor prepared the pre-lab documents with blanks, but also
saved a ‘filled-in’ copy after recording the accompanying video. She also completed each
lab assignment in an RMarkdown document to generate a key.
Students in each section were also given an “All the R you need for intro stats” cheat-
sheet at the beginning of the semester. These cheatsheets (one for formula and one for
tidyverse) were modeled on the cheatsheet of a similar name accompanying the mosaic
package (Pruim et al. 2017). The cheatsheets aimed to include all code necessary for the
entire semester, but were generated a priori.
These varied documents allow us to use automated methods to analyze the number
of unique functions shown in each section, using the getParseData() function from the
built-in utils package.
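The counting itself can be sketched with a short helper (an assumed illustration, not the paper's actual analysis script; the filename below is hypothetical):

```r
# Count the distinct function names called in an R script by walking the
# parse tree; SYMBOL_FUNCTION_CALL tokens are names used in call position
unique_functions <- function(path) {
  pd <- utils::getParseData(parse(path, keep.source = TRUE))
  sort(unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]))
}

# e.g. length(unique_functions("prelab-week3.R")) gives the vocabulary size
```

For RMarkdown sources, the code chunks would first need to be extracted (e.g. with knitr::purl()) before parsing.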
The cheatsheets given to students at the beginning of the semester contained 34 functions
for the formula section and 42 functions for the tidyverse section. There was an overlap
of 18 functions between the two cheatsheets.
Of course, while teaching a real class, an instructor often has to ad-lib at least a little.
So, it is also interesting to consider the number of functions actually shown throughout
the course of the semester. To do this, we can consider the functions shown in the filled-in
version of pre-lab documents the instructor ended up with after recording the associated
instructional video.
Considering this data, the formula section saw a total of 37 functions and the tidyverse
section saw 50, again with an overlap of 18 functions between the two sections. These
numbers make it appear as if in the formula section the instructor showed all functions
from the cheatsheet, and then a few additional functions. However, there were actually
several functions in the cheatsheet that were never shown in the actual class, and many
more functions that appeared in the class that did not make it onto the cheatsheet. For a
list of the functions used in both sections, see Appendix A.
In the tidyverse section, there were 9 functions shown in class that did not appear on
the cheatsheet, and only 1 function on the cheatsheet that was not discussed in class. In
the formula section, however, there were 10 functions shown in class that did not appear
on the cheatsheet, as well as 7 functions on the cheatsheet that were not discussed in class.
In both classes the majority of functions shown in class were on the cheatsheet.
Interestingly, there was quite a bit of overlap in the functions students saw in both
sections. Considering functions actually used in class, the two sections had 18 functions in
common.
The functions both sections of students saw included helper functions like library(), set.seed(), and set() (a function in the knitr options included at the top of each RMarkdown document), statistics like mean(), sd(), and cor(), and modeling-related functions like aov(), lm(), summary(), and predict().
Students in the formula section saw 19 functions beyond the set both sections shared, while the tidyverse section saw 32 such functions. It makes sense that the number of unique functions in the tidyverse section would be slightly larger. One reason is the ggplot2 helper functions ggplot() and aes().
Students in both sections saw how to make a barchart, boxplot, histogram, and scatterplot, but in the formula section they used standalone functions like gf_boxplot(), whereas in the tidyverse section they needed to start with ggplot() and add on a geom function like geom_boxplot(), while specifying the aesthetic values somewhere.
Similarly, both sections saw several common summary statistics, but in the formula
section they used the function (e.g. mean()) on its own, whereas in the tidyverse section
summary functions needed to be wrapped within summarize(). Students in the tidyverse
section also saw slightly more summary statistic functions, because one lab called for the
five number summary.
In the formula lab, students found the five number summary as shown in Code Block 3.9.
favstats(~bill_length_mm, data = penguins)
Code Block 3.9: The mosaic::favstats() function provides many common summary
statistics for one quantitative variable. The favstats() function automatically drops
missing values.
This approach is particularly attractive because it deals with missing values as part of
the standard output.
In the tidyverse section, the instructor chose to show two approaches. (Probably a bad pedagogical decision.) Both approaches are in Code Block 3.10, and both needed to include drop_na() to deal with missing values. Past those similarities, the approaches are quite different.
penguins %>%
  drop_na(bill_length_mm) %>%
  summarize(
    min = min(bill_length_mm),
    lower_hinge = quantile(bill_length_mm, .25),
    median = median(bill_length_mm),
    upper_hinge = quantile(bill_length_mm, .75),
    max = max(bill_length_mm)
  )

penguins %>%
  drop_na(bill_length_mm) %>%
  pull(bill_length_mm) %>%
  fivenum()
Code Block 3.10: Two approaches for doing summary statistics of one quantitative
variable in tidyverse syntax. The first is quite verbose, the second is more compact
but introduces a function never seen again.
The instructor should have chosen a single solution to present to students, but was faced with a dilemma. The first tidyverse approach is very verbose, but it follows nicely from other summary statistics students had already seen, just adding a few more functions like min(), max(), and quantile(). The second solution is more concise, but it introduces the pull() function, which was never used again in the course.
This brings up an important consideration when teaching coding: how many times students will see the same function. Because there is some cognitive load associated with learning a new function, and repetition helps move information from working memory to long-term memory, it is ideal for students to see each function at least twice (McNamara et al. 2021b). When analyzing the number of functions shown in each section, we found there were 7 functions shown only one time in the formula section, and 6 functions only shown once in the tidyverse section.
The practice of analyzing the number of functions shown over the course of the semester
was eye-opening. It will provide valuable information for the instructor the next time she
teaches the course, as she can attempt to remove functions only shown once, and ensure
the cheatsheets better match what is actually shown throughout the semester.
3.4 Pre- and post-survey
As discussed in Section 2.1, the number of students who completed both the pre- and post-surveys was low, so there is limited generalizability of the paired analysis.
The majority of the survey was modeled on a pre- and post-survey used by the Carpentries, a global nonprofit teaching coding skills (Carpentries 2021). Questions ask respondents to use a 5-step Likert scale, from 1 (strongly disagree) to 5 (strongly agree), to rate their agreement with the following statements:
- I am confident in my ability to make use of programming software to work with data
- Having access to the original, raw data is important to be able to repeat an analysis
- Using a programming language (like R) can make me more efficient at working with data
- While working on a programming project, if I get stuck, I can find ways of overcoming the problem
- Using a programming language (like R) can make my analysis easier to reproduce
- I know how to search for answers to my technical questions online
In Figure 3, you can see a visualization of these Likert-scale questions, split by section.
It is difficult to draw much of a conclusion from this figure. Many categories appear to show an improvement, while others seem to show a decrease in agreement from the pre- to the post-survey. Additionally, the figure shows overall trends in the sections, and does not utilize the potential for matching pre- and post-responses from the same student to measure change at the individual level.
To consider this individual-level change, we can compute the difference between a student's response on the pre- and post-survey. We compute post score − pre score, such that positive differences mean the student's attitude on the item improved from the beginning of the class to the end, and negative differences mean it worsened.
Because the questions were on Likert scales, it is not appropriate to compute an arith-
metic mean of the differences, but median scores can be computed. To provide a broader
picture of the distribution of responses, we also compute the 25th and 75th percentiles
Figure 3: Pre and post responses to Likert-scale questions. Most questions show
some level of improvement, such as the first question, ‘I am confident in my ability
to make use of programming software to work with data.’ but others show no change
or even a decline in agreement.
Figure 4: Distribution of paired differences for student responses to questions. A
score of 0 means the student responded the same way in the pre- and post-surveys,
whereas a negative score means their agreement was lower at the end of the course,
and a positive score means their agreement was higher. The boxes cross 0 for all
except those for ‘I am confident in my ability to make use of programming software
to work with data’, and boxes appear similar between sections.
for each section and score. This information is most easily displayed as a boxplot. The
boxplots in question can be seen in Figure 4.
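These paired summaries can be sketched in R; the data frame survey_long and its column names are hypothetical, standing in for one row per student per question with matched pre- and post-responses:

```r
library(dplyr)

# Hypothetical sketch: survey_long has columns section, question,
# pre_score, and post_score (Likert responses, 1-5)
survey_long %>%
  mutate(diff = post_score - pre_score) %>%  # positive = attitude improved
  group_by(section, question) %>%
  summarize(
    p25    = quantile(diff, .25),
    median = median(diff),
    p75    = quantile(diff, .75),
    .groups = "drop"
  )
```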
Because the sample sizes are so small, we will not attempt to use inferential statistics,
but it is worth noting almost all boxes are centered at 0 (meaning the median response did
not change over the course of the semester).
The one question that is an exception to this rule is “I am confident in my ability to make
use of programming software to work with data.” The boxes for both sections are centered
at a median of 1, meaning the median student answered one level up on the question at
the end of the course. Both boxes (the middle 50% of the data) are fully positive, although
the lower whisker (minimum value) for both includes zero.
It is somewhat heartening to know students improved their confidence in programming
over the course of the semester, but there is no clear difference between the sections, so
this does not provide any strong evidence for one syntax or the other.
Likely, the questions used by The Carpentries were ill-suited to this setting, and a different set of survey questions would have been more appropriate for this group. For
example, this class did not include any explicit instruction on searching for answers online.
This was an intentional choice, because novices typically struggle to identify which search
results are relevant to their queries and get overwhelmed by the multitude of syntactic
options they run across. Instead, students with questions were referred to the “all the R you
need” cheatsheet they had been given at the beginning of the semester, which attempted to
summarize every function they would encounter. Likely, students still attempted to Google
questions, which may be why the responses to this question got more negative over the
course of the semester.
In addition to the six questions asked on both the pre- and post-survey, the two surveys
also had some unique questions.
The pre-survey also asked students to share what they were most looking forward to,
and most nervous about. Both sections had similar responses. Students wrote they looked
forward to “learning how to code!” and “Gaining a better understanding of how to analyze
data.” Beyond worries related to the pandemic, they expressed apprehension about “getting
stuck,” “using R,” and “Figuring out how to do the programming and typing everything.”
On the post-survey, students were asked to report which syntax they had learned, with
an option to respond “I don’t know.” All students in both sections correctly identified the
syntax associated with their lab. Then, they were asked if they would have preferred to
learn the other syntax. We hypothesized many students would say ‘yes,’ thinking the other
syntax would have been easier or lack some feature they found frustrating. Surprisingly,
though, the majority of students in both sections said ‘no,’ they preferred to learn the
syntax they had been shown. Responses to this question are shown in Table 2.
However, part of the explanation is likely that the students did not know what the
other syntax looked like. Throughout the semester, the instructor was careful to only
expose students to the syntax for the particular section. Several students asked to see the
alternate syntax during office hours, but this was the exception and not the norm.
An optional follow-up question asked students why they had responded the way they
did. Responses to this question are shown in Table 3. Several students suggested a cross-
Section Answer n Proportion
formula No 6 0.86
formula Yes 1 0.14
tidyverse No 10 0.91
tidyverse Yes 1 0.09
Table 2: Responses to the question, ‘Would you have preferred to learn the other syntax?’
Figure 5: Responses to the question, “How was the experience of learning to program
in R?”
over design for the experiment would have allowed them to better compare, which is both
a good direction for further work (and a possible indication the students were listening
during the chapter on experimental design).
Another question on the post-survey asked students “How was the experience of learning
to program in R?” Overall, students seem to have positive sentiment toward learning R,
whether in the formula or the tidyverse section. As seen in Figure 5, most students said
either the experience was “not what I expected – in a good way” or “About what I expected
– in a good way.”
Nothing from the survey responses seems to indicate a difference between the two sections.
Section Response
formula I’ve heard that formula was more straightforward
formula I thought the syntax that I learned worked well
formula Because I am not familiar with it
formula I have no idea what the differences are, so I don’t really know how to answer this
formula Do not really know what the difference is, but also Prof. M was a very good teacher.
tidyverse I’m not sure I wish we got to experience both so we could compare, maybe learn one
for one half of the semester and the other for the other half?
tidyverse As per my plan to study data Science in graduate school, I would have preferred
learning both syntaxes
tidyverse I really enjoyed tidyverse, it was super easy to learn, and I liked the simplicity of
the syntax
tidyverse Tidy, is well tidy. When looking online the other syntax seemed more
tidyverse Im not sure what the benefit is.
tidyverse I’m not sure of the difference and I had 0 experience of coding or using anything like
r so I didn’t have a preference as to which one I learned.
tidyverse I really enjoyed this class and have learned a lot.
Table 3: Reasons stated by students for their preference of which syntax to learn.
While the pre- and post-survey responses do not suggest interesting results, the incidental data from YouTube and RStudio Cloud provided some insights.
3.5 YouTube analytics
Because of the format of the class, which was flipped such that students watched videos
of pre-recorded content, we can study overall patterns of YouTube watch time. YouTube
offers a data portal which allows for date targeting. We defined each week of the semester
as running from Sunday to Saturday, which covered the time when videos were released
through to the time finished labs needed to be submitted (Fridays at 11:59 pm). For each
week, we downloaded YouTube analytics data for the channel, and filtered the data to focus
only on the videos related to the introductory statistics labs.
Analytics data includes number of watches for each video, number of unique viewers,
and total watch time. We joined this data with data recording the length of the relevant
videos, which allowed us to calculate the approximate proportion of the videos watched by
each student.
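The calculation can be sketched as follows; the data frames and column names here are assumptions for illustration, not the actual analysis code:

```r
library(dplyr)

# Hypothetical sketch: weekly YouTube analytics joined to video lengths,
# then scaled by the 21 students enrolled in each section
weekly_watch <- analytics %>%
  left_join(video_lengths, by = "video_id") %>%
  group_by(section, week) %>%
  summarize(
    # total minutes watched, divided by students times total video length
    prop_watched = sum(watch_minutes) / (21 * sum(video_minutes)),
    .groups = "drop"
  )
```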
Data from YouTube is aggregated, and since videos were posted publicly, could contain
viewers who were not enrolled in the class. However, when we checked view counts of lab
videos on subsequent weeks (e.g., looking at views for the “describing data” lab in weeks
3-15) there were rarely more than two views accumulated per section per week. While
the public nature of the videos means we do need to view these results with a level of
skepticism, we can be reasonably sure the majority of viewers were students. Studying the
data displays some interesting trends.
First, we can look at the number of unique watchers per video, seen in Figure 6. Inter-
estingly, at the start of the semester there are more unique viewers than enrolled students
in the class, but as time goes on, the number of unique viewers levels out at slightly less
than the number of enrolled students (n= 21 for both sections). The lower numbers later
on make sense because some students were likely unengaged, or found it possible to do
their lab work without watching the video. However, the high numbers at the start of the
semester are puzzling. Perhaps students were viewing the videos from a variety of devices (phone, laptop, computer at school, etc.) when the semester began.
Figure 6: Average number of unique viewers per video. Horizontal line represents
the 21 students enrolled in each of the sections, a baseline for comparison.
If we assume all viewers were actually students (some students being counted as sepa-
rate viewers because of different devices or cookie settings), we can find an approximate
proportion of video content watched, per student. This is shown in Figure 7. It appears the
proportion of video content watched is larger for the formula videos than for the tidyverse
videos. This is supported by a 95% bootstrap interval, which suggests the formula section watched a meaningfully larger proportion of the videos each week.
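One way to form such an interval is a percentile bootstrap over the weekly differences. In this sketch, the vector diffs is assumed to hold, for each week, the formula section's proportion watched minus the tidyverse section's:

```r
set.seed(2022)
# diffs: weekly difference in proportion watched (formula - tidyverse); assumed
boot_means <- replicate(10000, mean(sample(diffs, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))  # 95% percentile bootstrap interval
```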
The discrepancy in watch proportions could be explained by the fact that videos for the tidyverse section tended to be longer, as discussed in Section 3.2. The literature on flipped classrooms suggests shorter videos are better, so perhaps the videos for the tidyverse section were simply too long, although there is no consensus about the ideal length, with suggestions ranging from 5 to 20 minutes as a maximum length for a video (Zuber 2016, Beatty et al. 2019, Guo et al. 2014). Most weeks the total number of minutes of video content was below 20, and almost every week had video content split into multiple shorter videos.
No matter the explanation, this trend is particularly interesting when considered in
Figure 7: Estimated proportion of YouTube video content watched, per student. This data came from dividing the total amount of time watched by the number of students in each section and the total length of the video(s) for the section that week.
conjunction with the RStudio Cloud usage patterns in the following section.
3.6 RStudio Cloud usage
The other source of unexpected data came from RStudio Cloud usage logs. RStudio Cloud
provides summary data per user in a project, aggregated by calendar month. This data
includes all students enrolled in the class.
Since the instructor set up separate projects for each section, it is easy to compare data
between sections. In Figure 8 we can see the amount of compute time used by each student
in each section. Lines connect data from a particular student, to allow the reader to trace
over time. For a monthly overview, see Figure 9.
Note that the month of November is missing for the tidyverse section because of an
oversight on the part of the author.
While the tidyverse section seemed to watch less of the provided videos each week (as
Figure 8: Hours of compute time per student over the course of the semester.
Figure 9: Hours of compute time on RStudio Cloud, per month of the semester.
Students in the tidyverse section appear to be spending more time on RStudio
Cloud, particularly in the months of October and December.
section September October November December
formula 10.4 (3.3) 13.9 (10.3) 9.4 (6) 7.7 (6)
tidyverse 7.7 (4.7) 17.1 (8.6) missing 11.5 (7.2)
Table 4: Mean student compute time on RStudio Cloud per month in hours (standard deviation in parentheses), broken down by section. Note different months had different numbers of assignments, although the number of assignments was consistent between sections.
discussed in Section 3.5), they appear to spend more time on RStudio Cloud per month.
All the distributions are right-skewed, with several students spending many more hours
of compute time than the majority. It is also important to note these numbers are likely
inflated based on the way RStudio Cloud counts usage time. The spaces for both sections
were allocated 1 GB of RAM and 1 CPU, so one hour of clock time on the space counted as
one project hour (spaces with more RAM or CPU may consume more than one project hour
per clock hour), but student usage often includes a fair amount of idle time. RStudio Cloud
will put a project to sleep after 15 minutes without interaction, and based on observation
of student habits it is likely almost every session ends with a 15 minute idle time before
the project sleeps. In a month with four labs, this can add up to at least an hour of project
time that does not correspond to students actually using R.
Nevertheless, because the numbers would be inflated in the same way in both sections,
we can persist in comparing them. Using data over the entire semester, students in the tidyverse section had a mean of 13.5 compute hours per month, and students in the formula section had a mean of 11.5 hours.
We can also study these numbers per month, as seen in Table 4. The mean compute
time for both sections increases from September to October, likely because of the increased
number of labs that month (two labs were due in September, five in October). Compute
time then drops down again for the formula section, and continues downward. November
data is missing for the tidyverse section, but time also appears to decrease in this section
as months progress, although not to the same degree as in the formula section.
Whereas in the pre- and post-surveys we have quite small sample sizes, the RStudio Cloud data includes all students enrolled in the class. This means we perhaps have a large enough sample to perform inferential statistics.

effect   group    term                            estimate   std.error  statistic
fixed    NA       (Intercept)                     11.381885  1.556911    7.3105558
fixed    NA       sectiontidyverse                -1.976604  2.175435   -0.9086018
fixed    NA       monthOctober                     4.359535  1.653232    2.6369779
fixed    NA       monthNovember                   -1.715090  1.653232   -1.0374167
fixed    NA       monthDecember                   -2.300425  1.653232   -1.3914717
fixed    NA       sectiontidyverse:monthOctober    4.899422  2.310021    2.1209425
fixed    NA       sectiontidyverse:monthDecember   5.200658  2.310021    2.2513466
ran_pars ID       sd__(Intercept)                  4.598662  NA          NA
ran_pars Residual sd__Observation                  5.227977  NA          NA

Table 5: Linear mixed-effects model results, using month as a categorical variable.
Data was collected at the student level over time, so it is necessary to use a mixed effects
model to account for clustering within students. We also need to take into account the
longitudinal nature of the data, so we included month as a predictor. We use the lme4 package to fit the linear mixed-effects models (Bates et al. 2015).
Initially, we fit an unconditional means model, to determine how much variability in
compute time was due to differences between students, without considering differences over
time or between section. Based on the intraclass correlation coefficient, we can conclude
30% of the total variation in compute time is attributable to differences between students.
After iterating through several candidate models, we arrived at a final model which pre-
dicts compute time per month (in hours) using section and month as fixed effect predictors,
as well as an interaction effect between section and month. Student identifier was used as a
random effect. This final model has the lowest AIC and BIC values of all candidate models.
Results from the model can be seen in Table 5.
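A sketch of this modeling workflow with lme4 follows; the data frame usage and its column names are assumptions for illustration, not the original analysis code:

```r
library(lme4)

# Unconditional means model: partition variance between and within students
m0 <- lmer(hours ~ 1 + (1 | ID), data = usage)
vc  <- as.data.frame(VarCorr(m0))
icc <- vc$vcov[1] / sum(vc$vcov)  # share of variation between students

# Final model: section, month, and their interaction as fixed effects,
# with a random intercept per student
m1 <- lmer(hours ~ section * month + (1 | ID), data = usage)

confint(m1)   # profile confidence intervals, as in Table 6
anova(m0, m1) # drop-in-deviance test against the unconditional means model
```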
The predicted values for each section/month combination match the means computed
in Table 4.
The lme4 package does not provide p-values for model coefficients, but it does provide
a method for confidence intervals. The confidence intervals for each of the coefficients are
shown in Table 6.
2.5 % 97.5 %
.sig01 3.2512430 6.0590086
.sigma 4.4708436 5.8874342
(Intercept) 8.3756022 14.3881678
sectiontidyverse -6.1772116 2.2240035
monthOctober 1.1696135 7.5494564
monthNovember -4.9050115 1.4748314
monthDecember -5.4903465 0.8894964
sectiontidyverse:monthOctober 0.4422206 9.3566237
sectiontidyverse:monthDecember 0.7434568 9.6578598
Table 6: Confidence intervals for coefficient estimates.
The confidence interval on the sectiontidyverse coefficient crosses zero, which sug-
gests the difference in number of hours of compute time between the sections in September
was not statistically significant. The confidence interval on monthOctober does not cross
zero, suggesting students in the formula section spent longer on RStudio Cloud that month
compared to September. But, the intervals for the formula section in November and De-
cember cross zero, which means the number of compute hours is not significantly different
from the number of hours in September for that section. For the tidyverse section it is a little harder to assess. The intervals for the sectiontidyverse:monthOctober and sectiontidyverse:monthDecember coefficients do not cross zero, but if combined with the intervals on monthOctober and monthDecember, they would.
As a model assessment strategy, we can use a likelihood ratio test to compare the
unconditional means model with our more complex model. A drop-in-deviance test suggests
the more complex model significantly outperforms the unconditional means model.
Based on the significance of the drop-in-deviance test, and the number of confidence
intervals in the model that did not cross zero, it seems both month and section have some
predictive power for the number of compute hours students used on RStudio Cloud.
It appears students in the tidyverse section spent more time on RStudio Cloud. We
can concoct several different scenarios to explain this difference. In one, students in the
tidyverse section were more engaged with their work, so spent more time playing with
code in R. In another, students in the tidyverse section struggled to complete their work,
so spent more time in R trying to get their lab material to work. Because the usage data
was collected incidentally after the fact, we have no information about which story is closer
to the truth. A follow-up study might conduct semi-structured interviews with students after the completion of the class, to determine more about student experiences and work habits.
It would also be interesting to know if students who spent more time on RStudio Cloud
received higher or lower grades on their assignments, but as discussed in Section 3.1, the
IRB for this study did not cover graded student work in that way. We do know the two
sections did not have an overall difference in mean grade.
Since these results are from a pilot study, they should not be used without caveats.
However, they do indicate that if instructors are worried about the amount of time assign-
ments take to complete, they may want to consider using the formula syntax rather than
the tidyverse syntax.
Another follow-up study that would be interesting to complete would look at student
success in subsequent courses. Because tidyverse syntax is frequently used for higher-
level courses, students who were in the tidyverse section may have an easier time in
those later courses. However, many students in this study will not go on to take further
statistics courses. So the takeaways about syntax choice may vary depending on the student
population to which they will be applied.
4 Discussion
This pilot study provides a semester-long comparison of two sections of introductory statis-
tics labs using two popular R coding styles, the formula syntax and the tidyverse syntax.
Pre- and post-survey analysis showed limited differences between the two sections, but
analysis of other incidental data, including pre-lab document lengths and YouTube and
RStudio Cloud data presented interesting distinctions.
Materials for the tidyverse section tended to be longer, both in lines of code (likely
because of the convention of linebreaks after %>%) as well as the length of the associated
YouTube videos. Students in the tidyverse section watched a smaller proportion of the
weekly pre-lab videos than students in the formula section, but spent more time computing
on RStudio. Conversely, students in the formula section were watching a larger proportion
of the pre-lab videos each week, but spending less time computing each month.
These two insights are slightly contradictory; perhaps the formula section students found the concepts more complex as they were watching the videos, but then had an easier time applying them as they worked on the real lab.
There is much interesting further work that could be considered. As students suggested, a cross-over design where students saw one syntax for the first half of the semester and the other for the second half would allow for better comparisons. However, there are a few caveats here.
First, anecdotal evidence from many instructors suggests it is best for students to see only one consistent syntax over the course of the semester. The other challenge is the formula syntax tends to seep (albeit only slightly) into the tidyverse section. For example, when doing linear regression both sections saw the lm(y ~ x, data = data) formula syntax.
If a cross-over design used the existing materials from this study, just swapping the final
few weeks, students in the formula section would likely see more that was familiar to them
than students in the tidyverse section.
By this consideration, the tidyverse students almost did have a cross-over design. This
may be why the number of hours of compute time for the tidyverse section remained
consistent from November to December (even though there were fewer instructional weeks
in December) while the formula section’s hours of compute time decreased.
Another interesting insight from this pilot is the number of unique functions needed to
cover a semester of introductory statistics in R. The tidyverse section saw more unique
functions, but both sections were limited to a small vocabulary of functions for the semester.
We recommend instructors follow this approach regardless of syntax. Instructors should
attempt to reduce the number of functions they expose students to over the course of a
semester, particularly in an introductory class. This will help reduce cognitive load.
One criticism of the tidyverse is how many functions the associated packages contain. However, while the tidyverse section exposed students to 32 unique functions, compared to the 19 shown in the formula section, both labs focused on a relatively small number of functions. Because there were 12 labs in the semester, this averages out to approximately 3 functions per lab for the tidyverse section, compared to an average of 2 functions per lab in the formula section.
The exercise of counting R functions in existing materials, using the getParseData()
function, is one we recommend all instructors attempt, particularly before re-teaching a
course. It can be eye-opening to discover how many functions you show students, and
which functions are only used once.
We hope this pilot helps answer some initial questions about the impact of R syntax on
teaching introductory statistics, while also raising further questions for future study. While
some aspects of the analysis in this study suggest the formula syntax is simpler for students
to learn and use, there are still many course scenarios for which we believe the tidyverse
syntax is the most appropriate choice. While formula syntax can be used throughout an
entire semester of introductory statistics, it does not offer functionality for tasks like data
wrangling. This means students who will go on to additional statistics or data science
classes may be better served by an early introduction to tidyverse. However, in order to
determine this conclusively, additional study would be needed.
No matter which syntax an instructor chooses, it appears possible to limit the number
of functions shown in a semester, and provide students with a positive learning experience.
5 Acknowledgements
Thanks to Sean Kross for his guidance about parsing R function data, and Nick Horton
for his useful comments.
A Functions used
(a) Used in both sections

(b) Used only in formula

gf_bar
gf_boxplot
gf_histogram
gf_point

(c) Used only in tidyverse

as_factor
chisq_test
drop_na
geom_bar
geom_boxplot
geom_histogram
geom_point
get_ci
get_p_value
group_by
prop_test
read_csv
t_test
Table 7: Lists of functions, and which section(s) they were used in.
References
Adhikari, A., DeNero, J. & Jordan, M. I. (2021), ‘Interleaving Computational and Inferen-
tial Thinking: Data Science for Undergraduates at Berkeley’, arXiv:2102.09391 [cs] .
Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), ‘Fitting Linear Mixed-Effects Models Using lme4’, Journal of Statistical Software 67(1).
Beatty, B. J., Merchant, Z. & Albert, M. (2019), ‘Analysis of Student Use of Video in a
Flipped Classroom’, TechTrends 63(4), 376–385.
Biehler, R. (1997), ‘Software for Learning and for Doing Statistics’, International Statistical
Review 65(2), 167–189.
Bray, A., Ismay, C., Chasnovski, E., Baumer, B. & Cetinkaya-Rundel, M. (2021), Infer:
Tidy Statistical Inference.
Carpentries, T. (2021), ‘The Carpentries Survey Archives’.
Çetinkaya-Rundel, M., Hardin, J., Baumer, B. S., McNamara, A., Horton, N. J. & Rundel, C. (2021), ‘An educator’s perspective of the tidyverse’, arXiv:2108.03510 [stat].
DeNero, J., Culler, D., Wan, A. & Lau, S. (2020), ‘datascience 0.15.7’.
Finzer, W. (2002), ‘Fathom: Dynamic Data Software (version 2.1)’, Key Curriculum Press.
GAISE College Report ASA Revision Committee (2016), Guidelines for Assessment and
Instruction in Statistics Education College Report 2016, American Statistical Association.
Gramazio, C. C., Laidlaw, D. H. & Schloss, K. B. (2017), ‘Colorgorical: Creating discrim-
inable and preferable color palettes for information visualization’, IEEE Transactions on
Visualization and Computer Graphics 23(1), 521–530.
Guo, P. J., Kim, J. & Rubin, R. (2014), How video production affects student engagement:
An empirical study of MOOC videos, in ‘Proceedings of the First ACM Conference on
Learning @ Scale Conference’, ACM, Atlanta Georgia USA, pp. 41–50.
Harrower, M. & Brewer, C. A. (2003), ‘ColorBrewer.org: An Online Tool for Selecting
Colour Schemes for Maps’, The Cartographic Journal 40(1), 27–37.
Horst, A. M., Hill, A. P. & Gorman, K. B. (2020), ‘Palmerpenguins: Palmer Archipelago
(Antarctica) penguin data. R package version 0.1.0’, Zenodo.
Kaplan, D. & Pruim, R. (2020), Ggformula: Formula Interface to the Grammar of Graphics.
Konold, C. & Miller, C. D. (2001), ‘TinkerPlots (version 0.23). Data Analysis Software.’.
Krishnamurthi, S., Schanzer, E., Politz, J. G., Lerner, B. S., Fisler, K. & Dooman, S. (2020),
‘Data Science as a Route to AI for Middle- and High-School Students’, arXiv:2005.01794
[cs] .
McNamara, A. (2015), Bridging the Gap Between Tools for Learning and for Doing Statis-
tics, PhD thesis, University of California, Los Angeles.
McNamara, A. (2018), ‘R Syntax Comparison Cheatsheet’.
McNamara, A., Zieffler, A., Beckman, M., Legacy, C., Butler Basner, E., delMas, R. C. &
Rao, V. V. (2021a), Computing in the Statistics Curriculum: Lessons Learned from the
Educational Sciences, in ‘USCOTS 2021’.
McNamara, A., Zieffler, A., Beckman, M., Legacy, C., Butler Basner, E., delMas, R. &
Rao, V. V. (2021b), ‘Computing in the Statistics Curriculum: Lessons Learned from the
Educational Sciences’.
Morandat, F., Hill, B., Osvald, L. & Vitek, J. (2012), Evaluating the Design of the R
Language: Objects and Functions For Data Analysis, in ‘ECOOP’12 Proceedings of the
26th European Conference on Object-Oriented Programming’.
Peters, T. (2004), ‘PEP 20 – The Zen of Python’.
Pruim, R., Kaplan, D. & Horton, N. J. (2017), ‘The mosaic package: Helping students
‘think with data’ using R’, The R Journal 9(1).
R Core Team (2020), R: A Language and Environment for Statistical Computing, R Foun-
dation for Statistical Computing, Vienna, Austria.
Rafalski, T., Uesbeck, P. M., Panks-Meloney, C., Daleiden, P., Allee, W., McNamara, A.
& Stefik, A. (2019), A Randomized Controlled Trial on the Wild Wild West of Scientific
Computing with Student Learners, in ‘Proceedings of the 2019 ACM Conference on
International Computing Education Research’, pp. 239–247.
Roberts, S. (2015), Measuring Formative Learning Behaviors of Introductory Statistical
Programming in R via Content Clustering, PhD thesis, University of California, Los Angeles.
RStudio PBC (2021), ‘RStudio Cloud - Do, Share, Teach, and Learn Data Science’.
Stefik, A. & Siebert, S. (2013), ‘An Empirical Investigation into Programming Language
Syntax’, ACM Transactions on Computing Education 13(4).
Stefik, A., Siebert, S., Stefik, M. & Slattery, K. (2011), An Empirical Comparison of
the Accuracy Rates of Novices using the Quorum, Perl and Randomo Programming
Languages, in ‘PLATAEU 2011’.
The Concord Consortium (2020), ‘CODAP - Common Online Data Analysis Platform’.
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund,
G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache,
S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K.,
Vaughan, D., Wilke, C., Woo, K. & Yutani, H. (2019), ‘Welcome to the Tidyverse’,
Journal of Open Source Software 4(43), 1686.
Zuber, W. J. (2016), ‘The flipped classroom, a review of the literature’, Industrial and
Commercial Training 48(2), 97–103.