All content in this area was uploaded by Nirmal Patel on Feb 26, 2019
Using Curriculum Pacing in LearnSphere to Visualize
Student Learning Trajectories
Nirmal Patel
Playpower Labs
India
nirmal@playpowerlabs.com
Parth Agrawal
Playpower Labs
India
Dhaval Prajapati
Playpower Labs
India
Derek Lomas
Delft Institute of Technology
Netherlands
j.d.lomas@tudelft.nl
Saumya Mehta
Playpower Labs
India
ABSTRACT
We propose to build a Curriculum Pacing workflow component in
the LearnSphere environment. Curriculum Pacing is a way to
visualize student learning trajectories through curriculum data. It
is a visual learning analytic method that allows its users to
observe how students interacted with curriculum topics over time,
which modules of the curriculum were visited by students over
and over, and when in time students interacted with previously
seen content. The pacing visualization is useful for data-driven
decision making for multiple stakeholders in education. EDM
researchers can use pacing plots to build hypotheses about student
learning behavior. Instructors and curriculum coordinators can
ensure that their students are moving at an expected pace and
identify content areas that are difficult for their students.
Instructional designers can look at how students are moving
through the curriculum and compare it to their expectations.
Potentially, data scientists and machine learning engineers can see
whether there is enough variation in the data to drive content
recommendation algorithms.
Keywords
Sequence Visualization, Learning Analytics, Curriculum Pacing
1. INTRODUCTION
Learning analytics researchers are increasing their use of temporal
student data to understand what patterns of student behavior are
correlated with desirable outcomes. Analysis of student learning
processes is becoming easier by using tools such as frequent
pattern mining [1], process mining [1], and very recently, a new
method called Curriculum Pacing [2]. There are many challenges
when it comes to understanding student learning trajectories,
because of the combinatorial explosion of possible learning
sequences, even in simple settings. For example, if an intelligent
tutor allows for 10 possible student actions, and if students can
take up to 100 actions in sequence, this permits on the order of
10^100 possible student learning trajectories. In practice, we find
that only a small fraction of these possibilities occur. Even then,
approaches such as sequence clustering have to be used to
aggregate similar student behavior [2, 3]. In a nutshell, temporal
student data are complex and difficult to make sense of.
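The size of this space can be checked with a few lines of arithmetic. The sketch below is in Python purely for illustration; it assumes that a trajectory is an ordered sequence of actions and that each step is one of the 10 possible actions, so the variable names are ours.

```python
# Both numbers come from the example in the text.
n_actions = 10     # possible actions at each step
max_length = 100   # maximum number of actions a student takes

# Trajectories made of exactly `max_length` actions.
exact = n_actions ** max_length

# Trajectories of any length from 1 to `max_length` (a geometric series).
up_to = sum(n_actions ** k for k in range(1, max_length + 1))

# The exact-length count is a 1 followed by 100 zeros, i.e. 10^100.
print(len(str(exact)) - 1)
# Allowing shorter trajectories adds only about 11% more sequences.
print(up_to / exact)
```

Either way the count is on the order of 10^100, far more than any dataset could cover.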
Data visualization is one of the most widely used ways of making
sense of complex datasets. There are many reasons for visualizing
data, but the most often cited one is that summary statistics can
easily hide the actual structure of the data [4]. Graph-based
visualizations are one of the easiest methods to see structure in
temporal data, but these visualizations often become complex and
hard to interpret [3]. When it comes to educational data,
interpretability is as important as the ability to predict outcomes,
because we need to know what makes a difference for student
outcomes, not just predict them accurately. So, if we are
visualizing complex educational datasets, we want to be able to
make sense of the visualizations.
Curriculum Pacing visualization allows us to visualize student
trajectories through a curriculum in an interpretable way. We get
to see what students are doing over time, and these activities tie to
different parts of the curriculum. We can see data of many
students in the same visualization without greatly increasing the
visual complexity. Instructors can see how students are moving
through their curriculum, whether any parts of their course are
proving difficult for students, and when students are revisiting
specific content areas. Using the pacing visualization,
instructional designers can see whether student movement through
the curriculum matches their expectations. Education
administrators can also use these visuals to identify classrooms
that are lagging behind others, and offer help. Last but not least, if
data scientists are using student trajectory data to drive
recommendation algorithms, they can see whether the data have
the desired variability and properties to give meaningful
recommendations.
2. WORKFLOW METHOD
2.1 Data Inputs

| Column Name             | Description                                                              |
|-------------------------|--------------------------------------------------------------------------|
| Anon Student Id         | Anonymous ID of the student                                              |
| Problem/Step Start Time | The start time of the problem or step (depends on the type of the data) |
| Problem Hierarchy       | The location in the curriculum hierarchy where this problem occurs      |

Table 1: Data columns of DataShop student-step or student-problem
[5] data used for the Curriculum Pacing workflow component.
The Curriculum Pacing visualization workflow component in
LearnSphere will take DataShop student-step or student-problem
level datasets as input. Only a handful of columns will be used
from these datasets to produce the pacing visualization.
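As an illustration of how little of the export is needed, the following Python sketch keeps only the three columns listed in Table 1. It assumes a tab-delimited export with a header row and treats the column names in Table 1 literally; the function name, the `time_column` argument, and the sample rows are hypothetical, and an actual export may label the time column differently depending on the data type.

```python
import csv
import io

def read_pacing_rows(handle, time_column="Problem/Step Start Time"):
    """Yield dicts holding only the three columns the pacing plot needs."""
    required = ("Anon Student Id", time_column, "Problem Hierarchy")
    # Assumption: the export is tab-delimited with a header row.
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input is missing columns: {missing}")
    for row in reader:
        yield {c: row[c] for c in required}

# Made-up sample rows, for illustration only.
sample = (
    "Anon Student Id\tProblem Hierarchy\tProblem Name\tProblem/Step Start Time\n"
    "S1\tUnit 1\tP1\t2015-09-01 10:00:00\n"
    "S2\tUnit 2\tP7\t2015-09-08 11:30:00\n"
)
rows = list(read_pacing_rows(io.StringIO(sample)))
print(rows[0])
```

All other columns in the export (such as the made-up "Problem Name" above) are simply ignored.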
Apart from the standard DataShop columns, the workflow
component will also take a few input parameters to customize the
visualization.

| Parameter                          | Description |
|------------------------------------|-------------|
| Problem Hierarchy Order (optional) | A CSV file with two columns that assigns each Problem Hierarchy value an integer that locates the value in the curriculum. If not provided, the Problem Hierarchy column will be sorted alphabetically using the gtools::mixedsort() function in R. In other words, this input defines the ordinal or factor levels of the Problem Hierarchy column from the DataShop student-step or student-problem data. |
| Time Scale Type                    | Relative or Absolute. Relative time normalizes Problem/Step Start Time to integer time units starting at 1, and absolute time preserves the actual timestamps of student interactions. |
| Time Scale Resolution              | Hour, Day, Week, or Month. Student data will be aggregated at the level of the provided resolution. |
| Minimum Time Unit                  | An integer or a timestamp in YYYY-MM-DD HH:MM:SS format. If Time Scale Type is Relative, the component will remove the student data before the given normalized integer time unit. If Time Scale Type is Absolute, the component will remove the student data before the given timestamp. |
| Maximum Time Unit                  | Same format as the Minimum Time Unit. The component will remove the student data after the given time unit or timestamp. |
| Plot Type                          | Either ‘Usage (Number of Students),’ which plots student usage over time, or ‘Usage and Performance (Number of Students and Percent Correct),’ which plots student usage and performance over time. |

Table 2: Parameters besides the primary input file for the
Curriculum Pacing workflow component.
Using these inputs, the workflow component program will
generate the necessary output.
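One way to picture these parameters is as a small validated configuration object. The Python sketch below is hypothetical: the class name, field names, and default values are our assumptions, and only the allowed values and the timestamp format come from the parameter table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union

TIME_SCALE_TYPES = ("Relative", "Absolute")
RESOLUTIONS = ("Hour", "Day", "Week", "Month")
PLOT_TYPES = (
    "Usage (Number of Students)",
    "Usage and Performance (Number of Students and Percent Correct)",
)

@dataclass
class PacingParams:
    """Hypothetical container for the workflow parameters."""
    time_scale_type: str = "Relative"        # default is an assumption
    time_scale_resolution: str = "Week"      # default is an assumption
    min_time_unit: Union[int, str, None] = None
    max_time_unit: Union[int, str, None] = None
    plot_type: str = PLOT_TYPES[0]
    hierarchy_order_csv: Optional[str] = None  # path to the optional order file

    def __post_init__(self):
        if self.time_scale_type not in TIME_SCALE_TYPES:
            raise ValueError(f"Time Scale Type must be one of {TIME_SCALE_TYPES}")
        if self.time_scale_resolution not in RESOLUTIONS:
            raise ValueError(f"Time Scale Resolution must be one of {RESOLUTIONS}")
        if self.plot_type not in PLOT_TYPES:
            raise ValueError(f"Plot Type must be one of {PLOT_TYPES}")
        for bound in (self.min_time_unit, self.max_time_unit):
            if bound is None:
                continue
            if self.time_scale_type == "Relative":
                # Relative bounds are normalized integer time units.
                if not isinstance(bound, int):
                    raise ValueError("Relative time bounds must be integers")
            else:
                # Absolute bounds are YYYY-MM-DD HH:MM:SS timestamps.
                if not isinstance(bound, str):
                    raise ValueError("Absolute time bounds must be timestamps")
                datetime.strptime(bound, "%Y-%m-%d %H:%M:%S")  # ValueError if malformed

params = PacingParams(time_scale_type="Absolute",
                      min_time_unit="2015-09-01 00:00:00")
print(params.time_scale_resolution)
```

Checking the parameters up front keeps mismatches, such as an integer bound combined with an absolute time scale, from surfacing later as an empty plot.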
2.2 Workflow Model
Curriculum Pacing is a visual learning analytic method, so it
operates mainly by transforming the input data into a suitable
format and producing a data visualization from them.
The visualization will be produced as a 2D plot with an X and a
Y-axis. The X-axis will represent time and the Y-axis will
represent the position in the curriculum. Input data of all of the
students will be aggregated to produce the output.
The X-axis will be a continuous axis and will represent either
relative or absolute time. Relative time will be in the units as
defined by the Time Scale Resolution parameter. For example, if
the Time Scale Type is ‘Relative’ and the Time Scale Resolution
is ‘Week,’ then the values 1, 2, 3 etc. on X-axis will represent the
1st week of student usage, 2nd week of student usage etc. Absolute
time will be binned by the units as defined by the Time Scale
Resolution parameter. For example, if the Time Scale Type is
‘Absolute’ and the Time Scale Resolution is ‘Week,’ every
Problem/Step Start Time will be changed to the preceding
Monday. Similarly, if the Time Scale Resolution parameter is set
to ‘Month,’ every Problem/Step Start Time will be changed to the
1st of the month. The range of the X-axis will be limited from the
Minimum Time Unit to the Maximum Time Unit.
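The binning rules for the X-axis can be sketched as follows. This Python example is a sketch under stated assumptions rather than the component's implementation: a timestamp that already falls on a Monday is kept on that Monday, the time of day is dropped for Day, Week, and Month bins, and relative units are counted as elapsed time from a `start` timestamp (the text does not say whether that is each student's first interaction or the start of the dataset). The function names are ours.

```python
from datetime import datetime, timedelta

def absolute_bin(ts: datetime, resolution: str) -> datetime:
    """Snap a timestamp to the start of its Hour, Day, Week, or Month."""
    if resolution == "Hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if resolution == "Day":
        return day
    if resolution == "Week":
        # weekday() is 0 for Monday, so this moves back to the Monday.
        return day - timedelta(days=day.weekday())
    if resolution == "Month":
        return day.replace(day=1)
    raise ValueError(f"unknown resolution: {resolution}")

def relative_unit(ts: datetime, start: datetime, resolution: str) -> int:
    """1-based index of the time unit that contains ts, counted from start."""
    if resolution == "Month":
        # Assumption: relative months follow calendar months.
        return (ts.year - start.year) * 12 + (ts.month - start.month) + 1
    seconds = {"Hour": 3600, "Day": 86400, "Week": 604800}[resolution]
    return int((ts - start).total_seconds() // seconds) + 1

# 2015-09-03 was a Thursday, so its weekly bin is Monday 2015-08-31.
print(absolute_bin(datetime(2015, 9, 3, 14, 30), "Week"))
# Exactly seven days after the start falls in relative week 2.
print(relative_unit(datetime(2015, 9, 8, 10), datetime(2015, 9, 1, 10), "Week"))
```

Filtering to the Minimum and Maximum Time Unit then reduces to comparing these binned values against the two bounds.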
The Y-axis will be an ordered discrete axis (or an ordinal axis)
and will contain Problem Hierarchy. This will represent where the
student is in a curriculum at a given point in time (which can be
relative or absolute). By default, the Y-axis will be sorted
alphabetically using the gtools::mixedsort() function in R. If
users desire a different order, they will be able to modify the order
of Y-axis values by providing the optional input parameter
‘Problem Hierarchy Order.’
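The ordering logic for the Y-axis can be illustrated in Python. The natural-sort key below only approximates the behavior of gtools::mixedsort() (embedded numbers compared numerically, so that "Unit 2" precedes "Unit 10") and is not a port of it; the function names and the dictionary form of the explicit order are assumptions made for this sketch.

```python
import re

def natural_key(value):
    """Sort key that compares embedded digit runs as numbers."""
    key = []
    # re.split with a capture group puts digit runs at odd indexes.
    for i, part in enumerate(re.split(r"([0-9]+)", value)):
        if part == "":
            continue
        key.append((0, int(part), "") if i % 2 else (1, 0, part.lower()))
    return key

def order_hierarchy(values, explicit_order=None):
    """Return the distinct Problem Hierarchy values in Y-axis order.

    explicit_order: optional dict mapping each value to an integer
    position, standing in for the two-column order file.
    """
    unique = set(values)
    if explicit_order is None:
        return sorted(unique, key=natural_key)
    missing = unique - explicit_order.keys()
    if missing:
        raise ValueError(f"no position given for: {sorted(missing)}")
    return sorted(unique, key=lambda v: explicit_order[v])

print(order_hierarchy(["Unit 10", "Unit 2", "Unit 1", "Unit 2"]))
print(order_hierarchy(["Review", "Intro"], {"Intro": 1, "Review": 2}))
```

A plain alphabetical sort would place "Unit 10" before "Unit 2", which is why a mixed sort is the more useful default for curriculum labels.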
If the Plot Type is set to ‘Usage (Number of Students),’ the plot
will be produced as a 2D heatmap. Each cell of the heatmap will
be filled with the hue representing the number of students at a
given point in time and a position in the curriculum. If the Plot
Type is set to ‘Usage and Performance (Number of Students and
Percent Correct),’ the plot will be produced as a 2D scatterplot
with the size of the dots representing the number of students and
color of the dots representing the average percent correct across
all of the problems at a given point in time and a position in the
curriculum.
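The aggregation behind both plot types can be sketched as a single pass over the data. In this Python illustration, the number of students in a cell is taken to be the count of distinct student IDs, and percent correct is pooled over all graded rows in the cell with hints excluded. Both are our reading of the description; the outcome labels "correct", "incorrect", and "hint" and the record layout are assumptions, since the outcome column is not among the listed inputs.

```python
from collections import defaultdict

def aggregate_cells(records):
    """Aggregate (student, time_unit, hierarchy, outcome) records per cell.

    Returns {(time_unit, hierarchy): {"n_students", "percent_correct"}}.
    """
    students = defaultdict(set)
    graded = defaultdict(lambda: [0, 0])  # [number correct, number graded]
    for student, unit, hierarchy, outcome in records:
        cell = (unit, hierarchy)
        students[cell].add(student)
        if outcome in ("correct", "incorrect"):  # hints are not graded
            graded[cell][1] += 1
            graded[cell][0] += outcome == "correct"
    result = {}
    for cell, ids in students.items():
        n_correct, n_graded = graded[cell]
        pct = 100.0 * n_correct / n_graded if n_graded else None
        result[cell] = {"n_students": len(ids), "percent_correct": pct}
    return result

# Made-up records, for illustration only.
records = [
    ("S1", 1, "Unit 1", "correct"),
    ("S1", 1, "Unit 1", "incorrect"),
    ("S2", 1, "Unit 1", "hint"),
    ("S2", 2, "Unit 2", "correct"),
]
cells = aggregate_cells(records)
print(cells[(1, "Unit 1")])
```

The heatmap would use only `n_students`, while the scatterplot would map `n_students` to dot size and `percent_correct` to dot color.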
2.3 Workflow Outputs
The workflow will output a single data visualization combining
data of all of the students in the input data, in an SVG format.
Besides this, a raw data file that produced the data visualization
will also be exported. Figure 1 shows two examples of the
visualization output, one for each of the possible Plot Type
parameter options.
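The format of the raw data file is not specified here. As one possibility, the aggregated cells could be written as a CSV with one row per plotted cell. The Python sketch below assumes that layout; the column names and the `cells` structure (a dictionary keyed by time unit and Problem Hierarchy value) are hypothetical.

```python
import csv
import io

def write_cells_csv(cells, handle):
    """Write one row per (time unit, Problem Hierarchy) cell."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["time_unit", "problem_hierarchy",
                     "n_students", "percent_correct"])
    for (unit, hierarchy), stats in sorted(cells.items()):
        pct = stats["percent_correct"]
        # Cells with no graded outcomes get an empty percent_correct field.
        writer.writerow([unit, hierarchy, stats["n_students"],
                         "" if pct is None else f"{pct:.1f}"])

# Made-up aggregated cells, for illustration only.
cells = {
    (2, "Unit 2"): {"n_students": 1, "percent_correct": None},
    (1, "Unit 1"): {"n_students": 2, "percent_correct": 50.0},
}
buffer = io.StringIO()
write_cells_csv(cells, buffer)
print(buffer.getvalue())
```

Exporting the plotted values alongside the SVG lets users rebuild or restyle the figure in their own tools.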
3. DISCUSSION
Looking at the examples shown in Figure 1, we can infer multiple
things about the students and the course. First of all, we can see
that one big group of students started at the beginning of the
curriculum and went through the course material as time went on.
This can be inferred from the near 45-degree diagonal band in the
plots. We can also see that there is another band that starts
halfway up the Y-axis, which shows that data for a group of
students starts from the middle of the curriculum. Within this
smaller group, we can also see that a subset of students went ahead
to complete the course faster than other students. This can be
inferred from the vertical band that branches out from the upper
diagonal band. Other important features of pacing plots are
vertical and horizontal lines. Vertical lines typically indicate parts
of the curriculum that students interact with in a short timespan,
and horizontal lines usually show parts of the curriculum that
students repeatedly interact with as time goes on. However,
horizontal lines can also appear if students are following different
time schedules while going through the curriculum.
Figure 1: Examples of Curriculum Pacing plots. The plots are
using DataShop Elementary Chinese course data from 212
anonymized students. The first plot shows student trajectories
through the course units over time, while the second plot shows
both the trajectories and average student performance at different
time points. Both plots are on a relative week time scale.
The second plot has one marked difference: it shows student
performance using a color scale. Performance is measured as the
average percent correct across all of the students and all of the
outcomes (except hints) at a point in the curriculum and a point in
time. Green shows 100% correct, yellow 50%, and red 0%. Using
these colors as cues, we can find topics that students found
difficult (red dots).
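A color scale with these three anchor points can be sketched as a piecewise linear blend. The Python example below assumes pure red, yellow, and green as the anchors and linear interpolation in RGB between them; the actual palette used for the plots is not specified, and the function name is ours.

```python
def percent_to_hex(pct: float) -> str:
    """Map percent correct (0 to 100) to a red-yellow-green hex color."""
    if not 0 <= pct <= 100:
        raise ValueError("percent correct must be between 0 and 100")
    if pct <= 50:
        # Red to yellow: the green channel rises from 0 to 255.
        red, green = 255, round(255 * pct / 50)
    else:
        # Yellow to green: the red channel falls from 255 to 0.
        red, green = round(255 * (100 - pct) / 50), 255
    return f"#{red:02x}{green:02x}00"

for pct in (0, 50, 100):
    print(pct, percent_to_hex(pct))
```

Under this mapping, a cell where students answer mostly incorrectly renders close to red, which is the cue used above to spot difficult topics.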
There are multiple goals that can be achieved using curriculum
pacing visualization:
• EDM researchers can use pacing plots to build
hypotheses about why students might be going through
the curriculum in one way rather than another.
• An instructor can easily find out whether all of the
students are moving at an expected pace. To identify
students who are not able to follow the schedule, we
can also make the visualization interactive so that
hovering over different parts of the visualization
reveals the related students.
• An instructional designer can compare the student
learning trajectory to the expected trajectory, find out
whether there is a subset of students that is following a
different learning trajectory, and see how it can be better
supported.
• An instructional designer can look for difficult topics
and see whether students frequently revisit previous
topics while working on them. If many of the students
are revisiting similar previous topics, an instructor can
find out by talking with them whether reviewing
previously seen content helped them understand the
difficult topic.
• A data scientist can see whether there are any topics where
students might be applying different learning strategies,
such as spaced practice, massed practice, or revisiting
specific topics after a difficult topic, and whether there
is enough variation in the data to model successful
student learning strategies.
Curriculum Pacing visualization can be used in many different
ways and can act as a starting point of further inquiry in student
learning.
4. REFERENCES
[1] Romero, C., Ventura, S., Pechenizkiy, M. and Baker, R.S.
eds., 2010. Handbook of educational data mining. CRC
press.
[2] Patel, N., Sharma, A., Sellman, C. and Lomas, D., 2018,
June. Curriculum Pacing: A New Approach to Discover
Instructional Practices in Classrooms. In International
Conference on Intelligent Tutoring Systems (pp. 345-351).
Springer, Cham.
[3] Patel, N., Sellman, C. and Lomas, D., 2017, July. Mining
frequent learning pathways from a large educational dataset.
In Proceedings of the 3rd International Workshop on Graph
Educational Data Mining (pp. 27-30).
[4] Anscombe, F. J., 1973. Graphs in Statistical Analysis.
American Statistician 27 (1) (pp. 17-21).
[5] PSLC DataShop Documentation.
https://pslcdatashop.web.cmu.edu/help?page=export