All content in this area was uploaded by Nirmal Patel on Feb 26, 2019


Using Curriculum Pacing in LearnSphere to Visualize Student Learning Trajectories

Nirmal Patel

Playpower Labs

India

nirmal@playpowerlabs.com

Parth Agrawal

Playpower Labs

India

Dhaval Prajapati

Playpower Labs

India

Derek Lomas

Delft Institute of Technology

Netherlands

j.d.lomas@tudelft.nl

Saumya Mehta

Playpower Labs

India

ABSTRACT

We propose to build a Curriculum Pacing workflow component in

the LearnSphere environment. Curriculum Pacing is a way to

visualize student learning trajectories through curriculum data. It

is a visual learning analytic method that allows its users to

observe how students interacted with curriculum topics over time,

which modules of the curriculum were visited by students over

and over, and when in time students interacted with previously

seen content. The pacing visualization is useful for data-driven

decision making for multiple stakeholders in education. EDM

researchers can use pacing plots to build hypotheses about student

learning behavior. Instructors and curriculum coordinators can

ensure that their students are moving at an expected pace and

identify content areas that are proving difficult for their students.

Instructional designers can look at how students are moving

through the curriculum and compare it to their expectations, and

potentially, data scientists and machine learning engineers can see

if there is enough variation in data to drive content

recommendation algorithms.

Keywords

Sequence Visualization, Learning Analytics, Curriculum Pacing

1. INTRODUCTION

Learning analytics researchers are increasing their use of temporal

student data to understand what patterns of student behavior are

correlated with desirable outcomes. Analysis of student learning

processes is becoming easier by using tools such as frequent

pattern mining [1], process mining [1], and very recently, a new

method called Curriculum Pacing [2]. There are many challenges

when it comes to understanding student learning trajectories,

because of the combinatorial explosion of the possible learning

sequences in simple settings. For example, if an intelligent tutor

allows for 10 possible student actions, and if students can take up

to 100 actions, this permits on the order of 10^100 possible

student learning trajectories. In practice, we find that

only a small fraction of these possibilities occur, but even then,

approaches such as sequence clustering have to be used to

aggregate similar student behavior [2, 3]. In a nutshell, temporal

student data are complex and difficult to make sense of.
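To make the scale of this explosion concrete, the count can be computed directly. The sketch below assumes a trajectory is an ordered sequence in which every step is one of 10 possible actions; the function name is illustrative and not part of any existing tool.

```python
def count_trajectories(num_actions: int, max_length: int) -> int:
    """Count ordered action sequences of length 1 through max_length."""
    return sum(num_actions ** length for length in range(1, max_length + 1))


# Sequences of exactly 100 steps, each step chosen from 10 actions.
exact_100 = 10 ** 100

# Sequences of any length from 1 to 100 steps.
up_to_100 = count_trajectories(10, 100)

# Both counts are 101-digit numbers.
print(len(str(exact_100)), len(str(up_to_100)))
```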

Data visualization is one of the most widely used ways of making

sense of complex datasets. There are many reasons behind

visualizing data, but the most frequently cited reason is that

the summary statistics of data can easily hide the

actual structure of the data [4]. Graph-based visualizations are one

of the easiest methods to see structure in temporal data, but these

visualizations often become complex and hard to interpret [3].

When it comes to educational data, interpretability is as

important as predictive accuracy because we need to

know what makes a difference to student outcomes, not just

predict them accurately. So, if we are visualizing complex

educational datasets, we want to be able to make sense of the data

visualizations.

Curriculum Pacing visualization allows us to visualize student

trajectories through a curriculum in an interpretable way. We get

to see what students are doing over time, and these activities tie to

different parts of the curriculum. We can see data of many

students in the same visualization, without greatly increasing the

visual complexity. Instructors can see how students are moving

through their curriculum, whether any parts of their course are

proving difficult for students, and when students are revisiting

specific content areas. Using the pacing visualization,

instructional designers can see whether student movement through

the curriculum matches their expectations. Education

administrators can also use these visuals to identify classrooms

that are lagging behind others, and offer help. Last but not least, if

data scientists are using student trajectory data to drive

recommendation algorithms, they can see whether the data have

the desired variability and properties to give meaningful

recommendations.

2. WORKFLOW METHOD

2.1 Data Inputs

| Column Name | Description |
| --- | --- |
| Anon Student Id | Anonymous ID of the student |
| Problem/Step Start Time | The start time of the problem or step (depends on the type of the data) |
| Problem Hierarchy | The location in the curriculum hierarchy where this problem occurs |

Table 1: Data columns of DataShop student-step or student-problem [5] data used for the Curriculum Pacing workflow component.

The Curriculum Pacing visualization workflow component in

LearnSphere will take DataShop student-step or student-problem

level datasets as input. Only a handful of columns will be used

from these datasets to produce the pacing visualization.
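As a rough illustration of this input step, the sketch below reads a tab-delimited export and keeps only the three columns the visualization needs. It assumes the export is tab-delimited and that the time column is named "Problem Start Time" (a student-step export would use a step-level time column instead). The sample rows and the function name are made up for the example.

```python
import csv
import io

# Assumed column names; the time column depends on the export type.
REQUIRED_COLUMNS = ("Anon Student Id", "Problem Start Time", "Problem Hierarchy")

# Made-up sample data. Real exports contain many more columns and rows.
SAMPLE = (
    "Anon Student Id\tProblem Start Time\tProblem Hierarchy\tProblem Name\n"
    "stu_1\t2018-01-08 09:15:00\tUnit 1\tp1\n"
    "stu_2\t2018-01-09 10:00:00\tUnit 2\tp7\n"
)


def read_pacing_rows(handle, columns=REQUIRED_COLUMNS):
    """Return one dict per row, restricted to the requested columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [name for name in columns if name not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"input is missing required columns: {missing}")
    return [{name: row[name] for name in columns} for row in reader]


rows = read_pacing_rows(io.StringIO(SAMPLE))
```

Failing early on a missing column keeps later aggregation steps from silently producing an empty plot.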

Apart from the standard DataShop columns, the workflow

component will also take a few input parameters to customize the

visualization.

| Parameter | Description |
| --- | --- |
| Problem Hierarchy Order (optional) | A CSV file with two columns that assigns each Problem Hierarchy value an integer that locates the Problem Hierarchy value in the curriculum. If not provided, the Problem Hierarchy column will be sorted alphabetically using the gtools::mixedsort() function in R. In other words, this input defines the ordinal or factor levels of the Problem Hierarchy column from the DataShop student-step or student-problem data. |
| Time Scale Type | Relative or Absolute. Relative time normalizes Problem/Step Start Time to integer time units starting at 1, and absolute time preserves the actual timestamps of student interactions. |
| Time Scale Resolution | Hour, Day, Week, or Month. Student data will be aggregated at the level of the provided resolution. |
| Minimum Time Unit | An integer or a timestamp in YYYY-MM-DD HH:MM:SS format. If the Time Scale Type is Relative, the component will remove the student data before the given normalized integer time unit. If the Time Scale Type is Absolute, the component will remove the student data before the given timestamp. |
| Maximum Time Unit | Same format as the Minimum Time Unit. The component will remove the student data after the given time unit or timestamp. |
| Plot Type | Usage (Number of Students): plots student usage over time. Usage and Performance (Number of Students and Percent Correct): plots student usage and performance over time. |

Table 2: Parameters besides the primary input file for the Curriculum Pacing workflow component.

Using these inputs, the workflow component program will

generate the necessary output.
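One way to picture these parameters is as a small validated configuration object. The sketch below is only an illustration: the class name, field names, and default values are assumptions, and the plot type labels are shortened from the full option names given in the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Union

TIME_SCALE_TYPES = ("Relative", "Absolute")
RESOLUTIONS = ("Hour", "Day", "Week", "Month")
PLOT_TYPES = ("Usage", "Usage and Performance")
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass(frozen=True)
class PacingParams:
    # Defaults are assumptions made for this sketch.
    time_scale_type: str = "Relative"
    time_scale_resolution: str = "Week"
    plot_type: str = "Usage"
    min_time_unit: Optional[Union[int, str]] = None
    max_time_unit: Optional[Union[int, str]] = None
    hierarchy_order_csv: Optional[str] = None  # path to the optional order file

    def __post_init__(self):
        if self.time_scale_type not in TIME_SCALE_TYPES:
            raise ValueError(f"unknown time scale type: {self.time_scale_type}")
        if self.time_scale_resolution not in RESOLUTIONS:
            raise ValueError(f"unknown resolution: {self.time_scale_resolution}")
        if self.plot_type not in PLOT_TYPES:
            raise ValueError(f"unknown plot type: {self.plot_type}")
        for bound in (self.min_time_unit, self.max_time_unit):
            if bound is None:
                continue
            if self.time_scale_type == "Relative":
                # Relative bounds are integer time units.
                if not isinstance(bound, int):
                    raise ValueError("relative bounds must be integers")
            else:
                # Absolute bounds are timestamps; strptime raises ValueError
                # when the text does not match the expected format.
                if not isinstance(bound, str):
                    raise ValueError("absolute bounds must be timestamp strings")
                datetime.strptime(bound, TIMESTAMP_FORMAT)


params = PacingParams(time_scale_type="Absolute",
                      min_time_unit="2018-01-08 00:00:00")
```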

2.2 Workflow Model

Curriculum Pacing is a visual learning analytic method, so it

operates mainly by transforming the input data into a certain

format and producing a data visualization out of them.

The visualization will be produced as a 2D plot with an X and a

Y-axis. The X-axis will represent time and the Y-axis will

represent the position in the curriculum. Input data of all of the

students will be aggregated to produce the output.

The X-axis will be a continuous axis and will represent either

relative or absolute time. Relative time will be in the units as

defined by the Time Scale Resolution parameter. For example, if

the Time Scale Type is ‘Relative’ and the Time Scale Resolution

is ‘Week,’ then the values 1, 2, 3, etc. on the X-axis will represent the

1st week of student usage, 2nd week of student usage, etc. Absolute

time will be binned by the units as defined by the Time Scale

Resolution parameter. For example, if the Time Scale Type is

‘Absolute’ and the Time Scale Resolution is ‘Week,’ every

Problem/Step Start Time will be changed to the preceding

Monday. Similarly, if the Time Scale Resolution parameter is set

to ‘Month,’ every Problem/Step Start Time will be changed to the

1st of the month. The range of the X-axis will be limited from the

Minimum Time Unit to the Maximum Time Unit.
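The binning rules above can be sketched in a few lines. The function names are illustrative. The text does not say whether relative time is anchored per student or per dataset, or whether relative units are calendar-aligned, so this sketch assumes calendar-aligned units counted from a first timestamp that the caller supplies.

```python
from datetime import datetime, timedelta


def bin_absolute(ts: datetime, resolution: str) -> datetime:
    """Snap a timestamp to the start of its hour, day, week, or month."""
    if resolution == "Hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if resolution == "Day":
        return day
    if resolution == "Week":
        # weekday() is 0 for Monday, so a Monday maps to itself.
        return day - timedelta(days=day.weekday())
    if resolution == "Month":
        return day.replace(day=1)
    raise ValueError(f"unknown resolution: {resolution}")


def relative_unit(ts: datetime, first_ts: datetime, resolution: str) -> int:
    """1-based index of the time unit containing ts, counted from first_ts."""
    start = bin_absolute(first_ts, resolution)
    current = bin_absolute(ts, resolution)
    if resolution == "Month":
        elapsed = (current.year - start.year) * 12 + (current.month - start.month)
    else:
        seconds = {"Hour": 3600, "Day": 86400, "Week": 604800}[resolution]
        elapsed = int((current - start).total_seconds()) // seconds
    return elapsed + 1


wednesday = datetime(2018, 1, 10, 14, 30)
print(bin_absolute(wednesday, "Week"))   # 2018-01-08 00:00:00, a Monday
print(bin_absolute(wednesday, "Month"))  # 2018-01-01 00:00:00
```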

The Y-axis will be an ordered discrete axis (or an ordinal axis)

and will contain Problem Hierarchy. This will represent where the

student is in a curriculum at a given point in time (which can be

relative or absolute). By default, the Y-axis will be sorted

alphabetically using the gtools::mixedsort() function in R. If the

users desire a different order, they will be able to modify the order

of Y-axis values by providing an optional input parameter

‘Problem Hierarchy Order.’
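The ordering logic can be illustrated with a natural-sort key. This Python sketch only approximates what gtools::mixedsort() does in R (it orders embedded whole numbers numerically and ignores case, and does not reproduce every rule of the R function). The function names and the dictionary form of the order file are assumptions made for the example.

```python
import re


def natural_key(value: str):
    """Sort key that compares embedded digit runs as integers."""
    key = []
    # re.split with a capturing group puts the digit runs at odd indices.
    for index, part in enumerate(re.split(r"(\d+)", value)):
        if part == "":
            continue
        if index % 2 == 1:
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part.lower()))
    return key


def order_hierarchy(values, explicit_order=None):
    """Return the distinct hierarchy values in Y-axis order.

    explicit_order, if given, maps each value to an integer position,
    as a two-column order file would.
    """
    unique = set(values)
    if explicit_order is None:
        return sorted(unique, key=natural_key)
    missing = unique - set(explicit_order)
    if missing:
        raise ValueError(f"no position given for: {sorted(missing)}")
    return sorted(unique, key=lambda value: explicit_order[value])


print(order_hierarchy(["Unit 10", "Unit 2", "Unit 1", "Unit 2"]))
```

A plain alphabetical sort would place "Unit 10" before "Unit 2", which is why a mixed alphanumeric sort is the more useful default for curriculum labels.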

If the Plot Type is set to ‘Usage (Number of Students),’ the plot

will be produced as a 2D heatmap. Each cell of the heatmap will

be filled with the hue representing the number of students at a

given point in time and a position in the curriculum. If the Plot

Type is set to ‘Usage and Performance (Number of Students and

Percent Correct),’ the plot will be produced as a 2D scatterplot

with the size of the dots representing the number of students and

color of the dots representing the average percent correct across

all of the problems at a given point in time and a position in the

curriculum.
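The aggregation behind both plot types can be sketched as a single pass over the rows. The tuple layout and the correctness flag are assumptions: Table 1 does not list an outcome column, so the flag stands in for whatever outcome field the input provides, with None marking rows that should not be scored. Percent correct is pooled over scored rows in each cell, which is one of several reasonable ways to average it.

```python
from collections import defaultdict


def aggregate(rows):
    """Aggregate (student_id, time_unit, hierarchy, is_correct) tuples.

    Returns {(time_unit, hierarchy): {"n_students": int,
                                      "pct_correct": float or None}}.
    """
    students = defaultdict(set)
    scored = defaultdict(int)
    correct = defaultdict(int)
    for student_id, time_unit, hierarchy, is_correct in rows:
        cell = (time_unit, hierarchy)
        students[cell].add(student_id)  # distinct students per cell
        if is_correct is not None:      # skip unscored rows such as hints
            scored[cell] += 1
            correct[cell] += int(is_correct)
    result = {}
    for cell, ids in students.items():
        pct = 100 * correct[cell] / scored[cell] if scored[cell] else None
        result[cell] = {"n_students": len(ids), "pct_correct": pct}
    return result


# Made-up rows for illustration.
cells = aggregate([
    ("stu_1", 1, "Unit 1", True),
    ("stu_1", 1, "Unit 1", False),
    ("stu_2", 1, "Unit 1", None),
    ("stu_2", 2, "Unit 2", True),
])
```

The heatmap uses only n_students, while the scatterplot maps n_students to dot size and pct_correct to dot color.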

2.3 Workflow Outputs

The workflow will output a single data visualization combining

data of all of the students in the input data, in an SVG format.

Besides this, a raw data file that produced the data visualization

will also be exported. Figure 1 shows two examples of the

visualization output, one for each of the possible Plot Type

parameter options.
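As a minimal sketch of what producing these two outputs could involve, the code below writes a bare SVG heatmap and a CSV of the underlying cell counts using only the standard library. It is not the component's rendering code: the function names, the cell size, the fill color, and the use of opacity rather than hue to encode the number of students are all assumptions, and axes and labels are omitted.

```python
import csv
import io


def heatmap_svg(cells, x_values, y_values, cell_size=20):
    """Render {(x, y): n_students} as SVG text, one rectangle per cell."""
    max_count = max(cells.values(), default=0) or 1
    width = cell_size * len(x_values)
    height = cell_size * len(y_values)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for (x, y), count in cells.items():
        col = x_values.index(x)
        # SVG y grows downward, so flip rows to put the first
        # curriculum item at the bottom of the plot.
        row = len(y_values) - 1 - y_values.index(y)
        parts.append(
            f'<rect x="{col * cell_size}" y="{row * cell_size}" '
            f'width="{cell_size}" height="{cell_size}" '
            f'fill="#08519c" fill-opacity="{count / max_count:.2f}"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def cells_csv(cells):
    """Serialize the same cell counts as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["time_unit", "problem_hierarchy", "n_students"])
    for (x, y), count in sorted(cells.items()):
        writer.writerow([x, y, count])
    return buffer.getvalue()


# Made-up counts for illustration.
counts = {(1, "Unit 1"): 4, (2, "Unit 1"): 1, (2, "Unit 2"): 3}
svg_text = heatmap_svg(counts, [1, 2], ["Unit 1", "Unit 2"])
csv_text = cells_csv(counts)
```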

3. DISCUSSION

Looking at the example shown above, we can infer multiple

things about the students and the course. First of all, we can see

that one big group of students started at the beginning of the

curriculum, and went through the course material as time went on.

This can be inferred from the near 45-degree diagonal band in the

plots. We can also see that there is another band that starts

halfway up the Y-axis, which shows that data for a group of

students starts from the middle of the curriculum. Within this

small group, we can also see that a subset of students went ahead

to complete the course faster than other students. This can be

inferred from the vertical band that branches out from the upper

diagonal band. Other important features of pacing plots are

vertical and horizontal lines. Vertical lines typically indicate parts

of the curriculum that students interact with in a short timespan,

and horizontal lines usually show parts of the curriculum that

students repeatedly interact with as time goes on. However,

horizontal lines can also appear if students are following different

time schedules while going through the curriculum.

Figure 1: Examples of Curriculum Pacing plots. The plots are

using DataShop Elementary Chinese course data from 212

anonymized students. The first plot shows student trajectories

through the course units over time, while the second plot shows

both the trajectories and average student performance at different

time points. Both plots are on a relative week time scale.

The second plot has one marked difference: it shows student

performance using a color scale. The performance is measured

using average percent correct of all of the students and all of the

outcomes (except hints) at a point in the curriculum and a point in

time. The green color shows 100% correct, yellow 50%, and red

0%. Using these colors as cues, we can find topics that students

found difficult (red dots).
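A color scale with these three anchor points can be sketched as a piecewise-linear interpolation. The function name and the choice of pure RGB red, yellow, and green are assumptions; the exact shades used in the plots are not specified in the text.

```python
def performance_color(pct_correct: float) -> str:
    """Map percent correct (0 to 100) to a hex color.

    0 maps to red, 50 to yellow, and 100 to green, with linear
    interpolation between neighboring anchors.
    """
    if not 0 <= pct_correct <= 100:
        raise ValueError("pct_correct must be between 0 and 100")
    if pct_correct <= 50:
        # Red stays at full intensity while green ramps up.
        red, green = 255, round(255 * pct_correct / 50)
    else:
        # Green stays at full intensity while red ramps down.
        red, green = round(255 * (100 - pct_correct) / 50), 255
    return f"#{red:02x}{green:02x}00"


print(performance_color(0), performance_color(50), performance_color(100))
```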

There are multiple goals that can be achieved using curriculum

pacing visualization:

• EDM researchers can use pacing plots to build hypotheses about why students might be going through the curriculum in one way rather than another.

• An instructor can easily find out whether all of the students are moving at the expected pace. To identify students who are not able to follow the schedule, the visualization could also be made interactive, so that hovering over a part of the plot reveals the students it represents.

• An instructional designer can compare the student learning trajectory to the expected trajectory, find out whether a subset of students is following a different learning trajectory, and see how that subset can be better supported.

• An instructional designer can look for difficult topics and see whether they prompt frequent visits to previous topics. If many students are revisiting similar previous topics, the instructor can find out by talking with them whether reviewing previously seen content helped them understand the difficult topic.

• A data scientist can see whether there are topics where students might be applying different learning strategies, such as spaced practice, massed practice, or revisiting specific topics after a difficult topic, and whether there is enough variation in the data to model successful student learning strategies.

Curriculum Pacing visualization can be used in many different

ways and can act as a starting point of further inquiry in student

learning.

4. REFERENCES

[1] Romero, C., Ventura, S., Pechenizkiy, M. and Baker, R.S. eds., 2010. Handbook of educational data mining. CRC press.

[2] Patel, N., Sharma, A., Sellman, C. and Lomas, D., 2018, June. Curriculum Pacing: A New Approach to Discover Instructional Practices in Classrooms. In International Conference on Intelligent Tutoring Systems (pp. 345-351). Springer, Cham.

[3] Patel, N., Sellman, C., Lomas, D., 2017, July. Mining frequent learning pathways from a large educational dataset. In Proceedings of the 3rd International Workshop on Graph Educational Data Mining (pp. 27–30).

[4] Anscombe, F. J., 1973. Graphs in Statistical Analysis. American Statistician 27 (1) (pp. 17–21).

[5] PSLC DataShop Documentation. https://pslcdatashop.web.cmu.edu/help?page=export