
Teaching modeling in introductory statistics:

A comparison of formula and tidyverse

syntaxes

Amelia McNamara ∗

Department of Computer & Information Sciences, University of St Thomas

February 1, 2022

Abstract

This paper reports on an experiment run in a pair of introductory statistics labs,

attempting to determine which of two R syntaxes was better for introductory teach-

ing and learning: formula or tidyverse. One lab was conducted fully in the formula

syntax, the other in tidyverse. Analysis of incidental data from YouTube and RStudio

Cloud show interesting distinctions. The formula section appeared to watch a larger

proportion of pre-lab YouTube videos, but spend less time computing on RStudio

Cloud. Conversely, the tidyverse section watched a smaller proportion of the videos

and spent more time on RStudio Cloud. Analysis of lab materials showed that tidy-

verse labs tended to be slightly longer (in terms of lines in the provided RMarkdown

materials, as well as minutes of the associated YouTube videos), and the tidyverse

labs exposed students to more distinct R functions. However, both labs relied on a

quite small vocabulary of consistent functions. Analysis of pre- and post-survey data

show no differences between the two labs, so students appeared to have a positive ex-

perience regardless of section. This work provides additional evidence for instructors

looking to choose between syntaxes for introductory statistics teaching.

Keywords: R language, instruction, data science, statistical computing

∗amelia.mcnamara@stthomas.edu


1 Introduction

When teaching statistics and data science, it is crucial for students to engage authentically

with data. The revised Guidelines for Assessment and Instruction in Statistics Education

(GAISE) College Report provides recommendations for instruction, including “Integrate

real data with a context and purpose” and “Use technology to explore concepts and analyze

data” (GAISE College Report ASA Revision Committee 2016). Many instructors have

students engage with data using technology through in-class experiences or separate lab

activities.

An important pedagogical decision when choosing to teach data analysis is the choice

of tool. There has long been a divide between ‘tools for learning’ and ‘tools for doing’ data

analysis (McNamara 2015). Tools for learning include applets, and standalone software like

TinkerPlots, Fathom, or their next-generation counterpart CODAP (Konold & Miller 2001,

Finzer 2002, The Concord Consortium 2020). Tools for doing are used by professionals,

and include software packages like SAS as well as programming languages like Julia, R,

and Python.

Many tools for learning were inspired by Rolf Biehler’s 1997 paper, “Software for Learn-

ing and for Doing Statistics” (Biehler 1997). In it, Biehler called for more attention to the

design of tools used for teaching. In particular, he was concerned with on-ramps for stu-

dents (ensuring the tool was not too complex), as well as off-ramps (using one tool through

an entire class, which could also extend further) (Biehler 1997). At the time he wrote the

paper it was quite difficult to teach using an authentic tool for doing, because these tools

lacked technological or pedagogical on-ramps.

However, recent developments in Integrated Development Environments (IDEs) and

pedagogical advances have opened space for a movement to teach even novices statistics

and data science using programming. In particular, curricula using Python and R have

become popular. In these curricula, educators make pedagogical decisions about what code

to show students, and how to scaﬀold it. In both the Python and R communities, there

have been movements to simplify syntax for students.

For example, the UC Berkeley Data 8 course uses Python, including elements of the

commonly-used matplotlib and numpy libraries as well as a specialized library written to


accompany the curriculum called datascience (Adhikari et al. 2021, DeNero et al. 2020).

The datascience library was designed to reduce complexity in the code. At the K-12 level,

the language Pyret has been developed as a simplified version of Python to accompany the

Bootstrap Data Science curriculum (Krishnamurthi et al. 2020).

In R, the development of less-complex code for students has been under consideration

for even longer. R oﬀers non-standard evaluation, which allows package authors to create

new ‘syntax’ for their packages (Morandat et al. 2012). In human language, syntax is the

set of rules for how words and sentences should be structured. If you use the wrong syntax

in human language, people will probably still understand you, but they will be able to

hear there is something wrong with how you structured your speech or writing. Syntax in

programming languages is even more formal: it governs what code will execute, run, or

compile correctly. Using the wrong syntax means getting an error from the language.

Typically, programming languages have only one valid syntax. For example, an aphorism

about the language Python is “There should be one– and preferably only one –obvious way

to do it” (Peters 2004). But, non-standard evaluation in R has allowed there to be many

obvious ways to do the same task. There is some disagreement over whether syntax is a

precise term for these differences. Other terms suggested for these variations in valid R code are ‘dialects,’ ‘interfaces,’ and ‘domain specific languages.’ Throughout this paper, we

use the term syntax as a shorthand for these concepts. At present, there are three primary

syntaxes used: base, formula, and tidyverse (McNamara 2018).

The base syntax is used by the base R language (R Core Team 2020), and is characterized

by the use of dollar signs and square brackets. The formula syntax uses the tilde to separate

response and explanatory variable(s) (Pruim et al. 2017). The tidyverse syntax uses a

data-first approach, and the pipe to move data between steps (Wickham et al. 2019).

A comparison of using the three syntaxes for univariate statistics and displays can be seen

in Code Block 1.1. This example code, like the rest in this paper, uses the palmerpenguins

data (Horst et al. 2020). All three pieces of code accomplish the same tasks, and all three

use the R language. But, the syntax varies considerably.


# base syntax

hist(penguins$bill_length_mm)

mean(penguins$bill_length_mm)

# formula syntax

gf_histogram(~bill_length_mm, data = penguins)

mean(~bill_length_mm, data = penguins)

# tidyverse syntax
ggplot(penguins) +
  geom_histogram(aes(x = bill_length_mm))

penguins %>%
  summarize(mean(bill_length_mm))

Code Block 1.1: Making a histogram of bill length from the penguins dataset, then

taking the mean, using three different R syntaxes. Base syntax is characterized by the dollar sign, formula by the tilde, and tidyverse is dataframe-first. In order for

this code to run as-is, missing (NA) values need to be dropped before the code is

run.

There is some agreement about pedagogical decisions for teaching R. In particular, most

educators agree that in order to reduce cognitive load, instructors should only teach one

syntax, and to be as consistent as possible about that syntax (McNamara et al. 2021a).

There is also some agreement that base R syntax is not the appropriate choice for introduc-

tory statistics, but there is widespread disagreement on whether the formula syntax or

tidyverse syntax is better for novices.

While there are strongly-held opinions on which syntax should be taught (Pruim et al.

2017, Çetinkaya-Rundel et al. 2021), there is relatively little empirical evidence to support

these opinions. In the realm of computer science, empirical studies by Andreas Stefik et al. have shown significant differences in the intuitiveness of languages, as well as error rates, based on language design choices (Stefik et al. 2011, Stefik & Siebert 2013). Thus, it

seems likely there are language choices that could make data science programming easier

(or harder) for users, particularly novices.

Stefik's team is working to add data science functionality to their evidence-based programming language. As a first step toward understanding which elements of existing lan-

guages might be best to emulate, they ran an experiment comparing the three main R


syntaxes (Rafalski et al. 2019). The study showed no statistically significant difference between any of the three syntaxes with regard to time to completion or number of errors. However, there were significant interaction effects between syntax and task, which suggested some syntaxes might be more appropriate for certain tasks (Rafalski et al. 2019).

Beyond this, examining the results from the study with an eye toward data science ped-

agogy showed common errors made by students related to their conceptions of dataframes

and variables. For example, one of the figures from Rafalski et al. (2019) shows real student code with errors. In the first line of code, the student gets everything correct using

formula syntax, with the exception of the name of the dataframe. When that code does

not work, they try again using base R syntax, but again get the dataframe name wrong.

After both those failures, they appear to fall back on computer science knowledge and try

syntax quite different from R. This is consistent with other studies of novice behavior in R

(Roberts 2015). It is not clear if this type of error was dependent on the syntax participants

were asked to use.

The other missing element in this study was instruction. The study was a quick inter-

vention showing students examples of a particular syntax, then asking them to duplicate

that syntax in a new situation. But without any instruction about data science concepts

like dataframes, it would be difficult to troubleshoot the syntax error mentioned above.

The work served as the inspiration for the longer comparison of multiple R syntaxes in the

classroom context described in this paper.

The remainder of this paper is organized into three sections. Section 2 describes the setup of the study, the participants (2.1) and their experience (2.2), and the content of the course under investigation (2.3). Section 3 contains results of the analysis, including a comparison of material lengths between the sections (3.2), the number of unique functions shown in each section (3.3), results from the pre- and post-survey (3.4), and analysis of YouTube (3.5) and RStudio Cloud (3.6) data. Finally, Section 4 discusses the results and

opportunities for future study.

All materials used for this study are available on GitHub and are Creative Commons

licensed, so they can be used or remixed by anyone who wants to use them. All code

and anonymized data from this paper is also available on GitHub, for reproducibility. Data


analysis was performed in R, and the paper is written in RMarkdown. The categorical color

palette was chosen using Colorgorical (Gramazio et al. 2017), and colors for the Likert scale

plot are from ColorBrewer (Harrower & Brewer 2003). Example data used throughout the

paper is from palmerpenguins (Horst et al. 2020). Packages used for the formula section

were mosaic and ggformula (now loaded automatically with mosaic); for the tidyverse section, the tidyverse and infer packages (Pruim et al. 2017, Kaplan & Pruim 2020, Wickham et al. 2019, Bray et al. 2021).

2 Methods

The author ran a pilot study in her introductory statistics labs. This study was run twice,

once in the Spring 2020 semester and once in the Fall 2020 semester. The disruption of

COVID-19 to the Spring 2020 semester made the resulting data unusable, so this paper

focuses on just Fall 2020 data.

Data was collected from YouTube analytics for watch times, from RStudio Cloud for

aggregated compute time, and from pre- and post-surveys of students. Participants for

the pre- and post-survey were recruited from this pool after Institutional Research Board

ethics review.

2.1 Participants

Participants in the study were students enrolled in an introductory statistics course at a

mid-sized private university in the upper Midwest. At this university, statistics students

enroll in a lecture (approximately 60-90 students per section), which is broken into several

smaller lab sections for hands-on work in statistical software. Lecture and lab sections are

taught by diﬀerent instructors, and the lab sections associated with a particular lecture

often use diﬀerent software. For example, one lab may use Minitab while the other two use

Excel. However, every lab section (no matter what lecture it is associated with, or what

software is used) does the same set of standardized assignments. This structure provides a

consistent basis for comparison.

                         formula   tidyverse
No                            10           9
Yes, but not with R            2           4

Table 1: Responses from pre-survey about prior programming experience. The majority of students in both sections had no prior programming experience.

In Fall 2020, the author taught two labs associated with the same lecture section, so all

students saw the same lecture content. (A third lab was associated with the same lecture,

using different software, and was not considered.) Using random assignment (a coin flip),

the author selected one lab section to be instructed using formula syntax, and one to be

instructed using tidyverse syntax. The goal was to compare syntaxes head-to-head.

Because the lab took place during the coronavirus pandemic, the instructor recorded

YouTube videos of herself working through the pre-lab documents for each lab, and posted

them in advance. Students watched the videos and worked through the associated pre-lab

RMarkdown document on their own time, then came to synchronous class to ask questions

and get help starting on the real lab assignment. Students used R through the online

platform RStudio Cloud (RStudio PBC 2021).

The two labs were of the same size (n = 21 in both sections) and reasonably similar

in terms of student composition. In both sections, approximately half of students were

Business majors, with the other half a mix of other majors.

Participants for the pre- and post-survey were recruited from this pool after Institutional

Research Board ethics review. For the pre-survey, n = 12 and n = 13 students consented to participate, and in the post-survey n = 8 and n = 13 responded. So, for paired analysis we have n = 8 for the formula section, and n = 13 for the tidyverse section. These sample

sizes are very small, and because students could opt in, may suffer from response bias.

However, because we have additional usage data from non-respondents, some elements of

the data analysis include the full class sample sizes of n = 21.


2.2 Prior programming experience

To verify both groups of students had similar backgrounds, we compared the prior program-

ming experience of the two groups of students. Table 1 shows results from the pre-survey.

While two additional students in the tidyverse section had prior programming experience,

the overall pattern was the same. The majority of students in both sections had no prior

programming experience.

For the students who had programmed before, none had prior experience with R. Three

students had prior experience with Java, three with JavaScript, and a smaller number had

experience with other languages, including C++ and Python.

2.3 Materials

Each week, the lab instructor prepared a “pre-lab” document in RMarkdown. The pre-

lab covered the topics necessary to complete the standardized lab assignment done by all

students across lab sections. Pre-lab documents included text explanations of statistical

and R programming concepts, sample code, and blanks (both in the code and the text)

for students to fill in as they worked. The instructor recorded YouTube videos of herself

working through the pre-lab documents for each lab, and posted them in advance. Students

were told to watch the pre-lab video and work through the RMarkdown document on their

own time, then come to synchronous class to ask questions and get help starting on the

real lab assignment.

The topics covered in Fall 2020 were as follows:

1. [No lab, short week]

2. Describing data: determining the number of observations and variables in a dataset,

variable types.

3. Categorical variables: exploratory data analysis for one or two categorical variables.

Frequency tables, relative frequency tables, bar charts, two-way tables, and side-by-

side bar charts.

4. Quantitative variables: exploratory data analysis for one quantitative variable. His-


tograms, dot plots, density plots, and summary statistics like mean, median, and

standard deviation.

5. Correlation and regression: exploratory data analysis for two quantitative variables.

Correlation, scatterplot, simple linear regression as a descriptive technique.

6. Bootstrap intervals: the use of the bootstrap to construct non-parametric confidence

intervals.

7. Randomization tests: the use of randomization to perform non-parametric hypothesis

tests.

8. Inference for a single proportion: use of the normal distribution to construct confidence intervals and perform hypothesis tests for a single proportion.

9. Inference for a single mean: use of the t-distribution to construct confidence intervals

and perform hypothesis tests for a single mean.

10. Inference for two samples: use of distributional approximations (normal or t) to

perform inference for a difference of proportions or a difference of means.

11. [No lab, assessment]

12. [No lab, Thanksgiving]

13. ANOVA: inference for more than two means, using the F distribution.

14. Chi-square: inference for more than two counts, using the χ2 distribution.

15. Inference for Regression: inference for the slope coefficient in simple linear regression, prediction and confidence intervals. Multiple regression.

Although this was a 15-week semester, there are only 12 lab topics. Labs were not held

during the first week of classes or during Thanksgiving week. Additionally, there were two

“lab assessments” to gauge student understanding of concepts within the context of their

lab software. One took place during finals week, the other was scheduled in week 11.


3 Results

3.1 Summative assessments

One obvious question arising when considering the comparison of the two syntaxes is

whether students performed better in one section or another. The IRB for this study

did not cover examining student work (an obvious place for improved further research),

so we cannot look at student outcomes on a per-assignment basis. However, running a

randomization test for a diﬀerence in overall mean lab grades showed no signiﬁcant diﬀer-

ence between the two sections. While they may have been interesting diﬀerences in grades

depending on the topic of the lab, we at least know these diﬀerences averaged out in the

end.
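As a sketch of this kind of test (the infer package is one way to run it; a data frame grades with columns grade and section is hypothetical here, since the raw gradebook is not public):

library(dplyr)
library(infer)

# Observed difference in mean lab grades between the two sections
obs_diff <- grades %>%
  specify(grade ~ section) %>%
  calculate(stat = "diff in means", order = c("formula", "tidyverse"))

# Null distribution generated by permuting the section labels
null_dist <- grades %>%
  specify(grade ~ section) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in means", order = c("formula", "tidyverse"))

get_p_value(null_dist, obs_stat = obs_diff, direction = "two-sided")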

Similarly, it would be interesting to know if student attitudes about the instructor differed between sections, as measured by the summative student evaluations completed by all students at the end of

the semester. These evaluations are anonymous, and the interface only provides summary

statistics. Again, a test for a difference in means showed no difference in mean evaluation

score on the questions “Overall, I rate this instructor an excellent teacher.” and “Overall,

I rate this course as excellent.”

3.2 Lab lengths

The first question we seek to answer is whether the materials presented to students were

of approximately the same length. We can assess this based on the length of the pre-lab

documents (in lines) and of the pre-lab videos (in minutes).

The length of the pre-lab RMarkdown documents can be measured using lines. Figure

1 shows the number of lines of code for each section's pre-lab document, per week.

It indicates RMarkdown documents for the tidyverse section tended to be longer. We

can compute a difference in lab lengths for each week, and compute the mean difference,

which is 19 lines. Because we only have 12 labs worth of data, we used a bootstrap procedure

to generate a confidence interval for the mean of the differences. The 95% interval is (9, 29), which indicates labs for the tidyverse section were longer, but only by a few lines.

[Figure 1: Length of pre-lab RMarkdown documents each week, in lines. Data has been adjusted for the formula section in weeks 8 and 9, because an instructor error led this section to have only one document combining both weeks' work.]
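The bootstrap procedure can be sketched as follows; the lablines data frame (with week, section, and lines columns) matches the structure implied by the analysis code, but the details here are assumptions rather than the exact script used.

library(dplyr)
library(tidyr)

# Weekly difference in document length, tidyverse minus formula
diffs <- lablines %>%
  pivot_wider(names_from = section, values_from = lines) %>%
  mutate(diff = tidyverse - formula) %>%
  pull(diff)

# Percentile bootstrap interval for the mean weekly difference
set.seed(2020)
boot_means <- replicate(10000, mean(sample(diffs, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))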

A slightly longer length for these labs makes sense, because tidyverse code is charac-

terized by multiple short lines strung together into a pipeline with %>%, while the formula

syntax typically has single function calls, sometimes with more arguments.

The question then becomes whether the longer documents led to longer pre-lab videos. Figure 2 shows the video lengths, which appear more consistent between sections. Effort was made to ensure the maximum video length was approximately 20

minutes, and some weeks had multiple videos.

Again, we can compute a pairwise difference in total video length (adding together multiple videos in weeks that had them), and compute the mean of that difference. That

difference is 2 minutes (tidyverse videos being longer). We can then compute a 95% bootstrap confidence interval for the difference, (0.16, 4). Again, it appears tidyverse videos are longer, although just by a few minutes.

[Figure 2: Length of pre-lab videos each week. Outlines help delineate multiple videos for a single week.]

3.2.1 Divergent labs

One place where the labs are of particularly different lengths is in week 3, when the topic

was exploratory data analysis for one and two categorical variables. For the formula section

the RMarkdown document was 134 lines long, and the two videos totaled 28 minutes. The

RMarkdown document for the tidyverse section was 180 lines long, and the videos totaled

35 minutes. There is a clear reason why.

In the formula section, students found frequency tables and relative frequency tables

with code as in Code Block 3.1 and Code Block 3.2.

tally(~island, data = penguins)
tally(~island, data = penguins, format = "percent")
tally(species ~ island, data = penguins)

Code Block 3.1: Making tables of one and two categorical variables using the formula

syntax and mosaic::tally().

tally(species ~ island, data = penguins, format = "percent")

           island
species        Biscoe    Dream Torgersen
  Adelie     26.19048 45.16129 100.00000
  Chinstrap   0.00000 54.83871   0.00000
  Gentoo     73.80952  0.00000   0.00000

Code Block 3.2: Making a table of two categorical variables using the formula

syntax and mosaic::tally() function, along with the percent option.

The mosaic::tally() function produces a familiar-looking two-way table, which took

very little explanation, other than to show how reversing the variables in the formula led

to different percentages, as is seen in Code Block 3.3. Compare Code Block 3.2 and Code Block 3.3 to see the effect of swapping the order of variables.

tally(island ~ species, data = penguins, format = "percent")

           species
island         Adelie Chinstrap    Gentoo
  Biscoe    28.94737   0.00000 100.00000
  Dream     36.84211 100.00000   0.00000
  Torgersen 34.21053   0.00000   0.00000

Code Block 3.3: Making a table of two categorical variables using the formula

syntax and mosaic::tally() function, with variables swapped.

However, in the tidyverse section, both the code and output took longer to explain.

Initial summary statistics for categorical variables are computed in Code Block 3.4, while

the tidy version of a relative frequency table is shown in Code Block 3.5.

penguins %>%
  group_by(island) %>%
  summarize(n = n())

penguins %>%
  group_by(island) %>%
  summarize(n = n()) %>%
  mutate(prop = n/sum(n))

penguins %>%
  group_by(island, species) %>%
  summarize(n = n())

Code Block 3.4: Computing summary statistics for one and two categorical variables

in the tidyverse syntax.

penguins %>%
  group_by(island, species) %>%
  summarize(n = n()) %>%
  mutate(prop = n/sum(n))

# A tibble: 5 x 4
# Groups:   island [3]
  island    species       n  prop
  <fct>     <fct>     <int> <dbl>
1 Biscoe    Adelie       44 0.262
2 Biscoe    Gentoo      124 0.738
3 Dream     Adelie       56 0.452
4 Dream     Chinstrap    68 0.548
5 Torgersen Adelie       52 1

Code Block 3.5: Computing summary statistics for two categorical variables in the

tidyverse syntax.

Again, reversing the order of the variables (this time, inside dplyr::group_by()) changed the percentages, but it was more difficult to determine how the percents added up, because the data was in long format, rather than wide format. Compare Code Block 3.5 and Code Block 3.6 to see the effect of swapping the order of variables.

penguins %>%
  group_by(species, island) %>%
  summarize(n = n()) %>%
  mutate(prop = n/sum(n))

# A tibble: 5 x 4
# Groups:   species [3]
  species   island        n  prop
  <fct>     <fct>     <int> <dbl>
1 Adelie    Biscoe       44 0.289
2 Adelie    Dream        56 0.368
3 Adelie    Torgersen    52 0.342
4 Chinstrap Dream        68 1
5 Gentoo    Biscoe      124 1

Code Block 3.6: Computing summary statistics for two categorical variables in the

tidyverse syntax, with variables swapped.

A similar discrepancy can be seen in week 10, where the formula section's RMarkdown document was again the shorter of the two (see Figure 1) and its videos totaled 19 minutes, while the tidyverse section's RMarkdown document was longer and its videos totaled 27 minutes.

The explanation for the varying time is similar, as well. Week 10 focused on inference

for two samples; that is, inference for a difference of proportions or a difference of means. While a difference of means makes it fairly easy to know which variable should go where (the quantitative variable is the response variable to take the mean of, and the categorical variable is the explanatory variable splitting it), with a difference of two proportions the concept comes back to thinking about two-way tables. Again, the tidyverse presentation of a “two-way table” made this more difficult to conceptualize.

In the formula section, students saw code like that in Code Block 3.7.

tally(island ~ sex, data = penguins, format = "proportion")
prop.test(island ~ sex, data = penguins, success = "Biscoe")

Code Block 3.7: Making a two-way table and performing inference for a difference of proportions using the formula syntax. In order for this code to run as-is, Torgersen island has to be removed so there are just two categories in that variable.

The code for finding the point estimate using mosaic::tally() is quite similar to the

code for performing inference using prop.test().

In the tidyverse, the code is not as consistent. Students in this section saw code like

that shown in Code Block 3.8.

penguins %>%
  group_by(sex, island) %>%
  summarize(n = n()) %>%
  mutate(prop = n/sum(n))

penguins %>%
  prop_test(
    response = island,
    explanatory = sex,
    alternative = "two-sided",
    order = c("female", "male")
  )

Code Block 3.8: Making a ‘two-way table’ and performing inference for a difference of proportions using the tidyverse syntax. Again, the Torgersen island data has been

removed beforehand.

In tidyverse syntax the code for finding the point estimate (dplyr's group_by(), summarize(), and then mutate()) is quite different from the code performing the inference (the infer::prop_test() function). And, the output from the inferential prop_test() function makes it harder to determine whether the code was correct. In the prop.test() output, sample estimates are provided, which allows you to check your work against a point estimate computed earlier.

These discrepancies made it take longer to explain code in the tidyverse section. Com-

parisons of RMarkdown document length and YouTube video length, as well as the corre-

sponding reasons for those discrepancies are the first hint of the computing time results to

come in Section 3.6.

3.3 Number of functions

Since both sections relied on the use of RMarkdown documents, there is a wealth of text

data to be explored. The instructor prepared the pre-lab documents with blanks, but also

saved a ‘filled-in’ copy after recording the accompanying video. She also completed each

lab assignment in an RMarkdown document to generate a key.

Students in each section were also given an “All the R you need for intro stats” cheatsheet at the beginning of the semester. These cheatsheets (one for formula and one for

tidyverse) were modeled on the cheatsheet of a similar name accompanying the mosaic

package (Pruim et al. 2017). The cheatsheets aimed to include all code necessary for the

entire semester, but were generated a priori.

These varied documents allow us to use automated methods to analyze the number

of unique functions shown in each section, using the getParseData() function from the

built-in utils package.
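As a sketch of how such a count can be obtained (the file name is hypothetical, and this is not necessarily the exact script used):

library(knitr)

count_functions <- function(rmd_path) {
  # Extract the R code from the RMarkdown file, then parse it
  r_path <- tempfile(fileext = ".R")
  purl(rmd_path, output = r_path, quiet = TRUE)
  parsed <- getParseData(parse(r_path, keep.source = TRUE))
  # Tokens tagged SYMBOL_FUNCTION_CALL are names used as function calls
  sort(unique(parsed$text[parsed$token == "SYMBOL_FUNCTION_CALL"]))
}

count_functions("prelab-week03-tidyverse.Rmd")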

The cheatsheets given to students at the beginning of the semester contained 34 functions

for the formula section and 42 functions for the tidyverse section. There was an overlap

of 18 functions between the two cheatsheets.

Of course, while teaching a real class, an instructor often has to ad-lib at least a little.

So, it is also interesting to consider the number of functions actually shown throughout

the course of the semester. To do this, we can consider the functions shown in the ﬁlled-in

version of pre-lab documents the instructor ended up with after recording the associated

instructional video.

Considering this data, the formula section saw a total of 37 functions and the tidyverse

section saw 50, again with an overlap of 18 functions between the two sections. These

numbers make it appear as if in the formula section the instructor showed all functions

from the cheatsheet, and then a few additional functions. However, there were actually

several functions in the cheatsheet that were never shown in the actual class, and many

more functions that appeared in the class that did not make it onto the cheatsheet. For a

list of the functions used in both sections, see Appendix A.

In the tidyverse section, there were 9 functions shown in class that did not appear on

the cheatsheet, and only 1 function on the cheatsheet that was not discussed in class. In

the formula section, however, there were 10 functions shown in class that did not appear

on the cheatsheet, as well as 7 functions on the cheatsheet that were not discussed in class.

In both classes the majority of functions shown in class were on the cheatsheet.

Interestingly, there was quite a bit of overlap in the functions students saw in both

sections. Considering functions actually used in class, the two sections had 18 functions in

common.


The functions both sections of students saw included helper functions like library(),

set.seed(), and set() (a function from the knitr options chunk included at the top of each RMarkdown document), statistics like mean(), sd(), and cor(), and modeling-related functions like aov(), lm(), summary(), and predict().

Students in the formula section saw 19 functions unique from the set both sections saw,

while the tidyverse section saw 32 unique functions. It makes sense the number of unique

functions in the tidyverse section would be slightly larger. One reason is the ggplot2

helper functions ggplot() and aes().

Students in both sections saw how to make a barchart, boxplot, histogram, and scatter-

plot, but in the formula section they used standalone functions like gf_boxplot() whereas in the tidyverse section they needed to start with ggplot() and add on a geom function like geom_boxplot(), while specifying the aesthetic values somewhere.

Similarly, both sections saw several common summary statistics, but in the formula

section they used the function (e.g. mean()) on its own, whereas in the tidyverse section

summary functions needed to be wrapped within summarize(). Students in the tidyverse

section also saw slightly more summary statistic functions, because one lab called for the

five-number summary.

In the formula lab, students found the five-number summary as shown in Code Block 3.9.

favstats(~bill_length_mm, data = penguins)

Code Block 3.9: The mosaic::favstats() function provides many common summary

statistics for one quantitative variable. The favstats() function automatically drops

missing values.

This approach is particularly attractive because it deals with missing values as part of

the standard output.

In the tidyverse section, the instructor chose to show two approaches. (Probably a

bad pedagogical decision.) Both approaches are in Code Block 3.10, and both needed to

include drop_na() to deal with missing values. Past those similarities, the approaches are

divergent.

penguins %>%
  drop_na(bill_length_mm) %>%
  summarize(
    min = min(bill_length_mm),
    lower_hinge = quantile(bill_length_mm, .25),
    median = median(bill_length_mm),
    upper_hinge = quantile(bill_length_mm, .75),
    max = max(bill_length_mm)
  )

penguins %>%
  drop_na(bill_length_mm) %>%
  pull(bill_length_mm) %>%
  fivenum()

Code Block 3.10: Two approaches for doing summary statistics of one quantitative

variable in tidyverse syntax. The first is quite verbose, the second is more compact

but introduces a function never seen again.

The instructor should have chosen a single solution to present to students, but was faced

with a dilemma. The first tidyverse approach is very verbose, but it follows nicely from

other summary statistics students had already seen, just adding a few more functions like

min, max, and quantile. The second solution is more concise, but it introduces the pull

function, which was never used again in the course.

This brings up an important consideration when teaching coding: how many times

students will see the same function. Because there is some cognitive load associated with

learning a new function, and repetition helps move information from working memory to

long-term memory, it is ideal for students to see each function at least twice (McNamara

et al. 2021b). When analyzing the number of functions shown in each section, we found

there were 7 functions shown only one time in the formula section, and 6 functions only

shown once in the tidyverse section.

The practice of analyzing the number of functions shown over the course of the semester

was eye-opening. It will provide valuable information for the instructor the next time she

teaches the course, as she can attempt to remove functions only shown once, and ensure

the cheatsheets better match what is actually shown throughout the semester.


3.4 Pre- and post-survey

As discussed in Section 2.1, the numbers of students who completed both the pre- and post-surveys

were low, so there is limited generalizability of the paired analysis.

The majority of the survey was modeled on a pre- and post-survey used by the Carpen-

tries, a global nonprofit teaching coding skills (Carpentries 2021). Questions ask respon-

dents to use a 5-step Likert scale, from 1 (strongly disagree) to 5 (strongly agree) to rate

their agreement with the following statements:

• I am confident in my ability to make use of programming software to work with data

• Having access to the original, raw data is important to be able to repeat an analysis

• Using a programming language (like R) can make me more efficient at working with data

• While working on a programming project, if I get stuck, I can find ways of overcoming the problem

• Using a programming language (like R) can make my analysis easier to reproduce

• I know how to search for answers to my technical questions online

In Figure 3, you can see a visualization of these Likert-scale questions, split by section.

It is difficult to gather much of a conclusion from this figure. Many categories appear to have made an improvement, while others seem to show a decrease in agreement from the pre- to the post-survey. Additionally, the figure shows overall trends in the sections, and

does not utilize the potential for matching pre- and post-responses from the same student

to measure change at the individual level.

To consider this individual-level change, we can compute the difference between a student's response on the pre- and post-survey. We compute post score − pre score, such that positive differences mean the student's attitude on the item improved from the beginning of the class to the end, and negative differences mean it worsened.
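A sketch of this computation, assuming a long-format data frame survey with columns student, section, item, phase (pre or post), and score (the column names are illustrative):

library(dplyr)
library(tidyr)

survey %>%
  pivot_wider(names_from = phase, values_from = score) %>%
  mutate(diff = post - pre) %>%
  group_by(section, item) %>%
  summarize(
    q25 = quantile(diff, 0.25),
    median = median(diff),
    q75 = quantile(diff, 0.75)
  )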

Because the questions were on Likert scales, it is not appropriate to compute an arithmetic mean of the differences, but median scores can be computed. To provide a broader picture of the distribution of responses, we also compute the 25th and 75th percentiles for each section and score. This information is most easily displayed as a boxplot; the boxplots in question can be seen in Figure 4.

[Figure 3: Pre and post responses to Likert-scale questions. Most questions show some level of improvement, such as the first question, ‘I am confident in my ability to make use of programming software to work with data,’ but others show no change or even a decline in agreement.]

[Figure 4: Distribution of paired differences for student responses to questions. A score of 0 means the student responded the same way in the pre- and post-surveys, whereas a negative score means their agreement was lower at the end of the course, and a positive score means their agreement was higher. The boxes cross 0 for all except those for ‘I am confident in my ability to make use of programming software to work with data’, and boxes appear similar between sections.]

Because the sample sizes are so small, we will not attempt to use inferential statistics,

but it is worth noting almost all boxes are centered at 0 (meaning the median response did

not change over the course of the semester).

The one question that is an exception to this rule is “I am confident in my ability to make

use of programming software to work with data.” The boxes for both sections are centered

at a median of 1, meaning the median student answered one level up on the question at

the end of the course. Both boxes (the middle 50% of the data) are fully positive, although

the lower whisker (minimum value) for both includes zero.

It is somewhat heartening to know students improved their confidence in programming over the course of the semester, but there is no clear difference between the sections, so

this does not provide any strong evidence for one syntax or the other.

Likely, the questions used by The Carpentries were inappropriate for this setting, and a different set of survey questions would have been more appropriate for this group. For

example, this class did not include any explicit instruction on searching for answers online.

This was an intentional choice, because novices typically struggle to identify which search

results are relevant to their queries and get overwhelmed by the multitude of syntactic

options they run across. Instead, students with questions were referred to the “all the R you

need” cheatsheet they had been given at the beginning of the semester, which attempted to

summarize every function they would encounter. Likely, students still attempted to Google

questions, which may be why the responses to this question got more negative over the

course of the semester.

In addition to the six questions asked on both the pre- and post-survey, the two surveys

also had some unique questions.

The pre-survey also asked students to share what they were most looking forward to,

and most nervous about. Both sections had similar responses. Students wrote they looked

forward to “learning how to code!” and “Gaining a better understanding of how to analyze

data.” Beyond worries related to the pandemic, they expressed apprehension about “getting

stuck,” “using R,” and “Figuring out how to do the programming and typing everything

out.”

On the post survey, students were asked to report which syntax they had learned, with

an option to respond “I don't know.” All students in both sections correctly identified the

syntax associated with their lab. Then, they were asked if they would have preferred to

learn the other syntax. We hypothesized many students would say ‘yes,’ thinking the other

syntax would have been easier or lacked some feature they found frustrating. Surprisingly,

though, the majority of students in both sections said ‘no,’ they preferred to learn the

syntax they had been shown. Responses to this question are shown in Table 2.

However, part of the explanation is likely that the students did not know what the

other syntax looked like. Throughout the semester, the instructor was careful to only

expose students to the syntax for the particular section. Several students asked to see the

alternate syntax during office hours, but this was the exception and not the norm.

An optional follow-up question asked students why they had responded the way they

did. Responses to this question are shown in Table 3. Several students suggested a cross-over design for the experiment would have allowed them to better compare, which is both a good direction for further work (and a possible indication the students were listening during the chapter on experimental design).

Section     Answer    n   Proportion
formula     No        6         0.86
formula     Yes       1         0.14
tidyverse   No       10         0.91
tidyverse   Yes       1         0.09

Table 2: Responses to the question, ‘Would you have preferred to learn the other syntax?’

[Figure 5: Responses to the question, “How was the experience of learning to program in R?”]

Another question on the post-survey asked students “How was the experience of learning

to program in R?” Overall, students seem to have positive sentiment toward learning R,

whether in the formula or the tidyverse section. As seen in Figure 5, most students said

either the experience was “not what I expected – in a good way” or “About what I expected

– in a good way.”

Nothing from the survey responses seems to indicate a difference between the two sections.

Section     Response
formula     I’ve heard that formula was more straightforward
formula     I thought the syntax that I learned worked well
formula     Because I am not familiar with it
formula     I have no idea what the differences are, so I don’t really know how to answer this question.
formula     Do not really know what the difference is, but also Prof. M was a very good teacher.
tidyverse   I’m not sure I wish we got to experience both so we could compare, maybe learn one for one half of the semester and the other for the other half?
tidyverse   As per my plan to study data Science in graduate school, I would have preferred learning both syntaxes
tidyverse   I really enjoyed tidyverse, it was super easy to learn, and I liked the simplicity of the syntax
tidyverse   Tidy, is well tidy. When looking online the other syntax seemed more complex/abnormal
tidyverse   Im not sure what the benefit is.
tidyverse   I’m not sure of the difference and I had 0 experience of coding or using anything like r so I didn’t have a preference as to which one I learned.
tidyverse   I really enjoyed this class and have learned a lot.

Table 3: Reasons stated by students for their preference of which syntax to learn.

While the pre- and post-survey results do not suggest differences between the sections, the incidental data

from YouTube and RStudio Cloud provided some insights.

3.5 YouTube analytics

Because of the format of the class, which was flipped such that students watched videos

of pre-recorded content, we can study overall patterns of YouTube watch time. YouTube

offers a data portal which allows for date targeting. We defined each week of the semester as running from Sunday to Saturday, which covered the time when videos were released through to the time finished labs needed to be submitted (Fridays at 11:59 pm). For each week, we downloaded YouTube analytics data for the channel, and filtered the data to focus

only on the videos related to the introductory statistics labs.

Analytics data includes number of watches for each video, number of unique viewers,

and total watch time. We joined this data with data recording the length of the relevant

videos, which allowed us to calculate the approximate proportion of the videos watched by

each student.

Data from YouTube is aggregated, and since videos were posted publicly, could contain

viewers who were not enrolled in the class. However, when we checked view counts of lab

videos on subsequent weeks (e.g., looking at views for the “describing data” lab in weeks

3-15) there were rarely more than two views accumulated per section per week. While

the public nature of the videos means we do need to view these results with a level of

skepticism, we can be reasonably sure the majority of viewers were students. Studying the

data displays some interesting trends.

First, we can look at the number of unique watchers per video, seen in Figure 6. Inter-

estingly, at the start of the semester there are more unique viewers than enrolled students

in the class, but as time goes on, the number of unique viewers levels out at slightly less

than the number of enrolled students (n = 21 for both sections). The lower numbers later

on make sense because some students were likely unengaged, or found it possible to do

their lab work without watching the video. However, the high numbers at the start of the

semester are puzzling. Perhaps students were viewing the videos from a variety of devices

(phone, laptop, computer at school, etc) when the semester began.

[Figure 6: Average number of unique viewers per video. Horizontal line represents the 21 students enrolled in each of the sections, a baseline for comparison.]

If we assume all viewers were actually students (some students being counted as sepa-

rate viewers because of different devices or cookie settings), we can find an approximate

proportion of video content watched, per student. This is shown in Figure 7. It appears the

proportion of video content watched is larger for the formula videos than for the tidyverse

videos. This is supported by a 95% bootstrap interval for the weekly difference in proportion watched, which suggests the formula section watched a somewhat larger share of the videos each week.
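The per-student proportion follows the calculation described in the caption of Figure 7; roughly (with assumed column names):

library(dplyr)

# Total minutes watched, divided by enrolled students and video length
video_stats %>%
  mutate(prop_watched = watch_minutes / (n_students * video_minutes))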

The discrepancy in watch proportions could be explained by the fact that videos for

the tidyverse section tended to be longer, as discussed in Section 3.2. Prior research has

shown shorter videos are better for flipped classroom settings, so perhaps the videos for the tidyverse section were just too long, although there is no consensus about the ideal length, with suggested maximums ranging from 5 to 20 minutes per video (Zuber 2016, Beatty et al. 2019, Guo et al. 2014). Most weeks the total number of minutes of video

content was below 20, and almost every week had video content split into multiple shorter

videos.

No matter the explanation, this trend is particularly interesting when considered in conjunction with the RStudio Cloud usage patterns in the following section.

[Figure 7: Estimated proportion of YouTube video content watched, per student. This data came from dividing the total amount of time watched by the number of students in each section and the total length of the video(s) for the section that week.]

3.6 RStudio Cloud usage

The other source of unexpected data came from RStudio Cloud usage logs. RStudio Cloud

provides summary data per user in a project, aggregated by calendar month. This data

includes all students enrolled in the class.

Since the instructor set up separate projects for each section, it is easy to compare data

between sections. In Figure 8 we can see the amount of compute time used by each student

in each section. Lines connect data from a particular student, to allow the reader to trace

over time. For a monthly overview, see Figure 9.

Note that the month of November is missing for the tidyverse section because of an

oversight on the part of the author.

While the tidyverse section seemed to watch less of the provided videos each week (as discussed in Section 3.5), they appear to spend more time on RStudio Cloud per month.

[Figure 8: Hours of compute time per student over the course of the semester.]

[Figure 9: Hours of compute time on RStudio Cloud, per month of the semester. Students in the tidyverse section appear to be spending more time on RStudio Cloud, particularly in the months of October and December.]

section     September    October       November   December
formula     10.4 (3.3)   13.9 (10.3)   9.4 (6)    7.7 (6)
tidyverse   7.7 (4.7)    17.1 (8.6)    missing    11.5 (7.2)

Table 4: Mean student compute time on RStudio Cloud per month in hours (standard deviation in parentheses), broken down by section. Note different months had different numbers of assignments, although the number of assignments was consistent between sections.

All the distributions are right-skewed, with several students spending many more hours

of compute time than the majority. It is also important to note these numbers are likely

inflated based on the way RStudio Cloud counts usage time. The spaces for both sections

were allocated 1 GB of RAM and 1 CPU, so one hour of clock time on the space counted as

one project hour (spaces with more RAM or CPU may consume more than one project hour

per clock hour), but student usage often includes a fair amount of idle time. RStudio Cloud

will put a project to sleep after 15 minutes without interaction, and based on observation

of student habits it is likely almost every session ends with a 15 minute idle time before

the project sleeps. In a month with four labs, this can add up to at least an hour of project

time that does not correspond to students actually using R.

Nevertheless, because the numbers would be inflated in the same way in both sections, we can persist in comparing them. Using data over the entire semester, students in the tidyverse section had a mean number of compute hours per month of 13.5, and students in the formula section had a mean of 11.5 hours.

We can also study these numbers per month, as seen in Table 4. The mean compute

time for both sections increases from September to October, likely because of the increased

number of labs that month (two labs were due in September, five in October). Compute

time then drops down again for the formula section, and continues downward. November

data is missing for the tidyverse section, but time also appears to decrease in this section

as months progress, although not to the same degree as in the formula section.

Whereas in the pre- and post-surveys we have quite small sample sizes, the RStudio

Cloud data includes all students enrolled in the class. This means we perhaps have a large enough sample to perform inferential statistics.

effect     group     term                             estimate    std.error   statistic
fixed      NA        (Intercept)                      11.381885   1.556911     7.3105558
fixed      NA        sectiontidyverse                 -1.976604   2.175435    -0.9086018
fixed      NA        monthOctober                      4.359535   1.653232     2.6369779
fixed      NA        monthNovember                    -1.715090   1.653232    -1.0374167
fixed      NA        monthDecember                    -2.300425   1.653232    -1.3914717
fixed      NA        sectiontidyverse:monthOctober     4.899422   2.310021     2.1209425
fixed      NA        sectiontidyverse:monthDecember    5.200658   2.310021     2.2513466
ran_pars   ID        sd__(Intercept)                   4.598662   NA          NA
ran_pars   Residual  sd__Observation                   5.227977   NA          NA

Table 5: Linear mixed-effects model, using month as a categorical variable.

Data was collected at the student level over time, so it is necessary to use a mixed effects model to account for clustering within students. We also need to take into account the longitudinal nature of the data, so we included month as a predictor. We use the lme4 package to fit the linear mixed effects models (Bates et al. 2015).

Initially, we fit an unconditional means model, to determine how much variability in compute time was due to differences between students, without considering differences over time or between section. Based on the intraclass correlation coefficient, we can conclude 30% of the total variation in compute time is attributable to differences between students.
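A sketch of the unconditional means model and the intraclass correlation, assuming a long-format data frame cloud with columns hours, section, month, and ID (the grouping variable implied by Table 5):

library(lme4)

# Random-intercept-only (unconditional means) model
m0 <- lmer(hours ~ 1 + (1 | ID), data = cloud)

# Intraclass correlation: between-student variance over total variance
vc <- as.data.frame(VarCorr(m0))
vc$vcov[1] / sum(vc$vcov)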

After iterating through several candidate models, we arrived at a final model which predicts compute time per month (in hours) using section and month as fixed effect predictors, as well as an interaction effect between section and month. Student identifier was used as a random effect. This final model has the lowest AIC and BIC values of all candidate models.

Results from the model can be seen in Table 5.
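A sketch of the final model, continuing the assumed cloud data frame from the sketch above; the interaction formula matches the coefficient labels in Table 5:

library(lme4)

fit <- lmer(hours ~ section * month + (1 | ID), data = cloud)
summary(fit)

# Confidence intervals for the coefficients, as reported in Table 6
confint(fit)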

The predicted values for each section/month combination match the means computed

in Table 4.

The lme4 package does not provide p-values for model coefficients, but it does provide a method for confidence intervals. The confidence intervals for each of the coefficients are

shown in Table 6.

                                    2.5 %        97.5 %
.sig01                             3.2512430    6.0590086
.sigma                             4.4708436    5.8874342
(Intercept)                        8.3756022   14.3881678
sectiontidyverse                  -6.1772116    2.2240035
monthOctober                       1.1696135    7.5494564
monthNovember                     -4.9050115    1.4748314
monthDecember                     -5.4903465    0.8894964
sectiontidyverse:monthOctober      0.4422206    9.3566237
sectiontidyverse:monthDecember     0.7434568    9.6578598

Table 6: Confidence intervals for coefficient estimates.

The confidence interval on the sectiontidyverse coefficient crosses zero, which suggests the difference in number of hours of compute time between the sections in September was not statistically significant. The confidence interval on monthOctober does not cross

zero, suggesting students in the formula section spent longer on RStudio Cloud that month

compared to September. But, the intervals for the formula section in November and De-

cember cross zero, which means the number of compute hours is not significantly different

from the number of hours in September for that section. For the tidyverse section it

is a little harder to assess. The sectiontidyverse:monthOctober and sectiontidyverse:monthDecember intervals do not cross zero, but if combined with the

intervals on monthOctober and monthDecember, they would.

As a model assessment strategy, we can use a likelihood ratio test to compare the

unconditional means model with our more complex model. A drop-in-deviance test suggests

the more complex model significantly outperforms the unconditional means model.
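Continuing the sketch above, the drop-in-deviance comparison is a single call; anova() refits both models with maximum likelihood before comparing them:

# Likelihood ratio (drop-in-deviance) test of the two models
anova(m0, fit)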

Based on the significance of the drop-in-deviance test, and the number of confidence

intervals in the model that did not cross zero, it seems both month and section have some

predictive power for the number of compute hours students used on RStudio Cloud.

It appears students in the tidyverse section spent more time on RStudio Cloud. We

can concoct several different scenarios to explain this difference. In one, students in the

tidyverse section were more engaged with their work, so spent more time playing with

code in R. In another, students in the tidyverse section struggled to complete their work,


so spent more time in R trying to get their lab material to work. Because the usage data

was collected incidentally after the fact, we have no information about which story is closer

to the truth. A follow-up study might conduct semi-structured interviews with students

after the completion of the class, to determine more about student experiences and work

patterns.

It would also be interesting to know whether students who spent more time on RStudio Cloud received higher or lower grades on their assignments, but as discussed in Section 3.1, the IRB for this study did not cover graded student work in that way. We do know the two sections did not differ in overall mean grade.

Since these results are from a pilot study, they should be interpreted with caution. However, they do indicate that instructors who are worried about the amount of time assignments take to complete may want to consider using the formula syntax rather than the tidyverse syntax.

Another interesting follow-up study would examine student success in subsequent courses. Because tidyverse syntax is frequently used in higher-level courses, students who were in the tidyverse section may have an easier time in those later courses. However, many students in this study will not go on to take further statistics courses, so the takeaways about syntax choice may vary depending on the student population to which they are applied.

4 Discussion

This pilot study provides a semester-long comparison of two sections of introductory statistics labs using two popular R coding styles, the formula syntax and the tidyverse syntax. Pre- and post-survey analysis showed limited differences between the two sections, but analysis of other incidental data, including pre-lab document lengths and YouTube and RStudio Cloud usage, presented interesting distinctions.

Materials for the tidyverse section tended to be longer, both in lines of code (likely because of the convention of linebreaks after %>%) and in the length of the associated YouTube videos. Students in the tidyverse section watched a smaller proportion of the weekly pre-lab videos than students in the formula section, but spent more time computing on RStudio Cloud. Conversely, students in the formula section watched a larger proportion of the pre-lab videos each week, but spent less time computing each month.

These two insights are somewhat contradictory: perhaps students in the formula section found the concepts more complex as they watched the videos, but then had an easier time applying them as they worked on the real lab.
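To make the line-count point above concrete, here is a small, hypothetical example of the same summary written in each syntax (using the mosaic-style formula interface and dplyr, with the palmerpenguins data purely for illustration):

    library(mosaic)          # formula-syntax interface
    library(tidyverse)       # %>%, drop_na(), summarize()
    library(palmerpenguins)  # penguins data, for illustration only

    # formula syntax: one line
    mean(~ bill_length_mm, data = penguins, na.rm = TRUE)

    # tidyverse syntax: the linebreak-after-%>% convention spreads the
    # same summary over several lines
    penguins %>%
      drop_na(bill_length_mm) %>%
      summarize(mean_bill = mean(bill_length_mm))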

There is much interesting further work to consider. As students suggested, a cross-over design, where students see one syntax for the first half of the semester and the other for the second half, would allow for better comparisons. However, there are a few caveats here.

First, anecdotal evidence from many instructors suggests it is best for students to see only one consistent syntax over the course of the semester. The other challenge is that the formula syntax tends to seep (albeit only slightly) into the tidyverse section. For example, when doing linear regression, both sections saw the lm(y~x, data = data) formula syntax. If a cross-over design used the existing materials from this study, just swapping the final few weeks, students in the formula section would likely see more that was familiar to them than students in the tidyverse section.

In this sense, the tidyverse students almost did have a cross-over design. This may be why the number of hours of compute time for the tidyverse section remained consistent from November to December (even though there were fewer instructional weeks in December), while the formula section's hours of compute time decreased.

Another interesting insight from this pilot is the number of unique functions needed to cover a semester of introductory statistics in R. The tidyverse section saw more unique functions, but both sections were limited to a small vocabulary of functions for the semester. We recommend instructors follow this approach regardless of syntax: reduce the number of functions students are exposed to over the course of a semester, particularly in an introductory class, to help limit cognitive load.

One criticism of the tidyverse is how many functions the associated packages contain. However, while the tidyverse section exposed students to 32 functions, compared to the 19 shown in the formula section, both labs focused on a relatively small number of functions. Because there were 12 labs in the semester, this averages out to approximately 3 functions per lab for the tidyverse section, compared to an average of 2 functions per lab in the formula section.

The exercise of counting the R functions in existing materials, using the getParseData() function, is one we recommend all instructors attempt, particularly before re-teaching a course. It can be eye-opening to discover how many functions you show students, and which functions are used only once.
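A minimal sketch of that exercise (assuming the lab code has been extracted to an R script, here hypothetically lab01.R; for an RMarkdown lab, knitr::purl() can extract the code first):

    # Parse the file, keeping source information
    parsed <- getParseData(parse("lab01.R", keep.source = TRUE))

    # Tokens of type SYMBOL_FUNCTION_CALL are names used as function calls
    calls <- parsed$text[parsed$token == "SYMBOL_FUNCTION_CALL"]

    # How many distinct functions appear, and how often each is used
    length(unique(calls))
    sort(table(calls), decreasing = TRUE)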

We hope this pilot helps answer some initial questions about the impact of R syntax on teaching introductory statistics, while also raising further questions for future study. Although some aspects of the analysis in this study suggest the formula syntax is simpler for students to learn and use, there are still many course scenarios for which we believe the tidyverse syntax is the most appropriate choice. While the formula syntax can be used throughout an entire semester of introductory statistics, it does not offer functionality for tasks like data wrangling. This means students who will go on to additional statistics or data science classes may be better served by an early introduction to the tidyverse. However, additional study would be needed to determine this conclusively.

No matter which syntax an instructor chooses, it appears possible to limit the number of functions shown in a semester, and provide students with a positive learning experience.

5 Acknowledgements

Thanks to Sean Kross for his guidance about parsing R function data, and Nick Horton for his useful comments.


A Functions used

(a) Used in both sections

• aov
• cor
• data.frame
• filter
• library
• lm
• mean
• pnorm
• predict
• pt
• qnorm
• qt
• sd
• set
• set.seed
• sqrt
• summary
• TukeyHSD

(b) Used only in formula

• chisq.test
• confint
• diff
• do
• factorize
• gf_bar
• gf_boxplot
• gf_histogram
• gf_point
• options
• pdata
• prop.test
• read.csv
• resample
• rsquared
• shuffle
• t.test
• tally
• transform

(c) Used only in tidyverse

• aes
• as_factor
• c
• calculate
• chisq_test
• drop_na
• fivenum
• generate
• geom_bar
• geom_boxplot
• geom_histogram
• geom_point
• get_ci
• get_p_value
• ggplot
• group_by
• help
• hypothesize
• IQR
• max
• median
• min
• mutate
• n
• prop_test
• pull
• quantile
• read_csv
• specify
• sum
• summarize
• t_test

Table 7: Lists of functions, and which section(s) they were used in.


References

Adhikari, A., DeNero, J. & Jordan, M. I. (2021), 'Interleaving Computational and Inferential Thinking: Data Science for Undergraduates at Berkeley', arXiv:2102.09391 [cs].
URL: http://arxiv.org/abs/2102.09391

Bates, D., Mächler, M., Bolker, B. & Walker, S. (2015), 'Fitting Linear Mixed-Effects Models Using lme4', Journal of Statistical Software 67(1).
URL: http://www.jstatsoft.org/v67/i01/

Beatty, B. J., Merchant, Z. & Albert, M. (2019), 'Analysis of Student Use of Video in a Flipped Classroom', TechTrends 63(4), 376–385.
URL: http://link.springer.com/10.1007/s11528-017-0169-1

Biehler, R. (1997), 'Software for Learning and for Doing Statistics', International Statistical Review 65(2), 167–189.

Bray, A., Ismay, C., Chasnovski, E., Baumer, B. & Cetinkaya-Rundel, M. (2021), Infer: Tidy Statistical Inference.
URL: https://CRAN.R-project.org/package=infer

Carpentries, T. (2021), 'The Carpentries Survey Archives'.
URL: https://carpentries.github.io/assessment-archives/

Çetinkaya-Rundel, M., Hardin, J., Baumer, B. S., McNamara, A., Horton, N. J. & Rundel, C. (2021), 'An educator's perspective of the tidyverse', arXiv:2108.03510 [stat].
URL: http://arxiv.org/abs/2108.03510

DeNero, J., Culler, D., Wan, A. & Lau, S. (2020), 'datascience 0.15.7'.
URL: http://data8.org/datascience/

Finzer, W. (2002), 'Fathom: Dynamic Data Software (version 2.1)', Key Curriculum Press.

GAISE College Report ASA Revision Committee (2016), Guidelines for Assessment and Instruction in Statistics Education College Report 2016, American Statistical Association.
URL: http://www.amstat.org/education/gaise

Gramazio, C. C., Laidlaw, D. H. & Schloss, K. B. (2017), 'Colorgorical: Creating discriminable and preferable color palettes for information visualization', IEEE Transactions on Visualization and Computer Graphics 23(1), 521–530.
URL: http://ieeexplore.ieee.org/document/7539386/

Guo, P. J., Kim, J. & Rubin, R. (2014), How video production affects student engagement: An empirical study of MOOC videos, in 'Proceedings of the First ACM Conference on Learning @ Scale Conference', ACM, Atlanta, Georgia, USA, pp. 41–50.
URL: https://dl.acm.org/doi/10.1145/2556325.2566239

Harrower, M. & Brewer, C. A. (2003), 'ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps', The Cartographic Journal 40(1), 27–37.
URL: https://www.tandfonline.com/doi/abs/10.1179/000870403235002042

Horst, A. M., Hill, A. P. & Gorman, K. B. (2020), 'Palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0', Zenodo.
URL: https://allisonhorst.github.io/palmerpenguins/

Kaplan, D. & Pruim, R. (2020), Ggformula: Formula Interface to the Grammar of Graphics.
URL: https://CRAN.R-project.org/package=ggformula

Konold, C. & Miller, C. D. (2001), 'TinkerPlots (version 0.23). Data Analysis Software.'.

Krishnamurthi, S., Schanzer, E., Politz, J. G., Lerner, B. S., Fisler, K. & Dooman, S. (2020), 'Data Science as a Route to AI for Middle- and High-School Students', arXiv:2005.01794 [cs].
URL: http://arxiv.org/abs/2005.01794

McNamara, A. (2015), Bridging the Gap Between Tools for Learning and for Doing Statistics, PhD thesis, University of California, Los Angeles.

McNamara, A. (2018), 'R Syntax Comparison Cheatsheet'.
URL: https://osf.io/2k8fw/

McNamara, A., Zieffler, A., Beckman, M., Legacy, C., Butler Basner, E., delMas, R. C. & Rao, V. V. (2021a), Computing in the Statistics Curriculum: Lessons Learned from the Educational Sciences, in 'USCOTS 2021'.
URL: https://www.causeweb.org/cause/uscots/uscots21/tu-03-computing-statistics-curriculum-lessons-learned-educational-sciences

McNamara, A., Zieffler, A., Beckman, M., Legacy, C., Butler Basner, E., delMas, R. & Rao, V. V. (2021b), 'Computing in the Statistics Curriculum: Lessons Learned from the Educational Sciences'.

Morandat, F., Hill, B., Osvald, L. & Vitek, J. (2012), Evaluating the Design of the R Language: Objects and Functions For Data Analysis, in 'ECOOP'12 Proceedings of the 26th European Conference on Object-Oriented Programming'.

Peters, T. (2004), 'PEP 20 – The Zen of Python'.
URL: https://www.python.org/dev/peps/pep-0020/

Pruim, R., Kaplan, D. & Horton, N. J. (2017), 'The mosaic package: Helping students 'think with data' using R', The R Journal 9(1).

R Core Team (2020), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
URL: http://www.R-project.org

Rafalski, T., Uesbeck, P. M., Panks-Meloney, C., Daleiden, P., Allee, W., McNamara, A. & Stefik, A. (2019), A Randomized Controlled Trial on the Wild Wild West of Scientific Computing with Student Learners, in 'Proceedings of the 2019 ACM Conference on International Computing Education Research', pp. 239–247.

Roberts, S. (2015), Measuring Formative Learning Behaviors of Introductory Statistical Programming in R via Content Clustering, PhD thesis, University of California, Los Angeles.

RStudio PBC (2021), 'RStudio Cloud - Do, Share, Teach, and Learn Data Science'.
URL: https://rstudio.cloud/

Stefik, A. & Siebert, S. (2013), 'An Empirical Investigation into Programming Language Syntax', ACM Transactions on Computing Education 13(4).

Stefik, A., Siebert, S., Stefik, M. & Slattery, K. (2011), An Empirical Comparison of the Accuracy Rates of Novices using the Quorum, Perl and Randomo Programming Languages, in 'PLATEAU 2011'.

The Concord Consortium (2020), 'CODAP - Common Online Data Analysis Platform'.
URL: https://codap.concord.org/

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., Takahashi, K., Vaughan, D., Wilke, C., Woo, K. & Yutani, H. (2019), 'Welcome to the Tidyverse', Journal of Open Source Software 4(43), 1686.

Zuber, W. J. (2016), 'The flipped classroom, a review of the literature', Industrial and Commercial Training 48(2), 97–103.
URL: https://doi.org/10.1108/ICT-05-2015-0039