Data Visualization in Public Education:
Longitudinal Student-, Intervention-, School-,
and District-Level Performance Modeling
Warren E. Lacefield, Ph.D.
Academic Software, Inc., Lexington, KY
E. Brooks Applegate, Ph.D.
Western Michigan University, Kalamazoo, MI
Copyright ASI © 2018 All rights reserved
2018 ANNUAL MEETING OF THE
AMERICAN EDUCATION RESEARCH ASSOCIATION:
The Dreams, Possibilities, and Necessity of Public Education
New York, NY, USA
April 13-17, 2018
Data Visualization in Public Education:
Longitudinal Student-, Intervention-, School-,
and District-Level Performance Modeling
ABSTRACT
Accountability seems forever ingrained in the K-12 environment, as does the expectation of delivering quality education to school-aged children and adolescents. Yet repeated failures to meet this expectation have focused the public's and policy makers' attention on the limitations of major accountability systems. This paper explores applications of machine learning, predictive analytics, and data visualization to the student information available to educational decision makers. In particular, we demonstrate how to use individual academic performance histories to identify "at-risk" students in real time for advising, academic coaching, and other support services, and how to aggregate longitudinal data at the school or district level for system modeling, profiling, comparison, and intervention evaluation.
OBJECTIVE
This research demonstrates how predictive analytics applied to school student
information system (SIS) records can be used to (1) greatly benefit student advising and
activities such as academic coaching/mentoring for at-risk students, (2) assess and
evaluate the impact of newly introduced educational interventions, and (3) provide tools
for longitudinal assessment and evaluation of schools and/or school districts.
THEORETICAL PERSPECTIVE
Better decision making is desired at all levels of public and private education, and district/school accountability and student information systems (SIS) are being developed and deployed rapidly throughout K-12 and higher education to meet this need (Salpeter, 2004; Bowers, 2017). Sometimes simple, sometimes elaborate, SIS platforms are notable for their data capture capability but often limited in their predictive analytic capability. Moreover, SIS data, often readily available only for the current year, coupled with the specific and local reporting needs of building administrators, discourage SIS developers from providing anything more than rudimentary data analytics. This gap is partly offset by the growing number of educational predictive analytic companies and services.
Effective, informed decision making requires timely information based on data. That data must be systematically collected, organized, and reduced around a problem before it can have any impact. Even a SIS platform that provides systematic data capture and organization is of limited utility in the absence of analytical (data reduction) systems.
Visual analytics are sought for their ability to clarify both problems and solutions. When
coupled with extensive data, these tools can be very powerful mechanisms for isolating
problems as well as showing solution paths. Through predictive analytics and innovative
data visualization, longitudinal SIS data can be leveraged to facilitate teacher-, school-,
and district-level decision-making (Van Kannel-Ray et al., 2008, 2009; Lacefield et al., 2010, 2011; Applegate et al., 2012).
METHODOLOGY
Participants, Data Sources, and Procedures
Our research and evaluation team at a large regional state university worked with other staff to conduct two major, overlapping, six-year school improvement projects involving partner schools and districts in southwestern Michigan, northern Ohio, and Illinois from 2002 through 2012. Throughout this period, we had SIS data-sharing agreements, approved by the respective IRBs, covering annual reporting, school staff decision-making and assistance, and intervention evaluation.
SIS information was gathered at the beginning, middle, and end of each school year, including academic, attendance, behavioral, and demographic data. It was then collated and entered into MS Access databases that eventually contained historical records for more than 26,000 students, from 6th through 12th grade, over the 10-year period. The projects provided data analysis and information to collaborating school system staff, as well as a variety of other student- and school-level support and enhancement functions, through a network of project site coordinators. Following the completion of the projects, the databases were de-identified for future research, demonstration, and dissemination purposes.
Harvested SIS data included the following constructs/variables: academic grade information (every grade datum from every teacher in every course in every marking period); attendance data (summarized by marking period); behavioral data (also summarized by marking period); and student demographic data. Because participating schools operated under substantially different policies related to behavior and attendance, we present only findings related to achievement measures. Moreover, participating schools organized the school year differently (e.g., four or six marking periods, quarters, or two or three semesters), and these patterns frequently changed over intervals of only two or three years.
Academic Grade Data. Our operational definition is that grades are crude evaluations of the degree to which teacher expectations are being met in typical but highly variable performance contexts. Pooled, averaged, and smoothed over time, they provide a basis on which people (parents, teachers, administrators, admissions officers, employers, etc.) make decisions. Therefore, they are important within the public school system and beyond.
Course Content and Subject Area. The complete course catalogue for each participating school was simplified by aggregating courses into broad content areas: Mathematics, Science, Social Studies, and Language Arts. For example, high school mathematics classes such as Algebra I and II, Geometry, Calculus I, and Business Math were classified as "Mathematics." Further aggregation levels included All-Core, Non-Core, and All-Coursework. Course classification was tracked and reconsidered annually as schools added and retired courses through normal curriculum evaluation processes.
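As a concrete illustration, this kind of course aggregation can be expressed as a simple lookup table. The Python sketch below uses hypothetical course titles and is not the project's actual catalogue mapping; it only shows the classification idea.

```python
# Illustrative sketch of course-to-content-area aggregation.
# The course titles and mapping here are hypothetical examples,
# not the project's actual catalogue.

CONTENT_AREAS = {
    "Algebra I": "Mathematics",
    "Algebra II": "Mathematics",
    "Geometry": "Mathematics",
    "Calculus I": "Mathematics",
    "Business Math": "Mathematics",
    "Biology": "Science",
    "U.S. History": "Social Studies",
    "English 9": "Language Arts",
}

CORE_AREAS = {"Mathematics", "Science", "Social Studies", "Language Arts"}


def classify_course(course_title: str) -> str:
    """Return the broad content area for a course, or 'Non-Core' if unmapped."""
    return CONTENT_AREAS.get(course_title, "Non-Core")


def aggregation_level(course_title: str) -> str:
    """Roll a single course up to the All-Core / Non-Core aggregation level."""
    return "All-Core" if classify_course(course_title) in CORE_AREAS else "Non-Core"


# Example: "Business Math" maps to Mathematics and counts as All-Core coursework.
print(classify_course("Business Math"), aggregation_level("Business Math"))
```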
Student Performance Trajectories. Because different schools operated under different reporting timelines, and because of irregularities such as repeated grades, student mobility (in- or out-bound transfers), and student attrition (drop-outs), we developed techniques to "fit" every student and every school situation onto a common historical timeline modelled on the typical grade-level system. This timeline begins at 6.0, the entry point into middle school. It extends to 9.0, the entry point into high school, and beyond to 13.0, the point of graduation and entry into post-secondary experiences.
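One way to implement such a common timeline is to convert each (grade level, marking period) pair into a decimal position between 6.0 and 13.0. The Python sketch below illustrates this convention; the exact placement of marking-period marks within a year is our assumption for illustration, not the project's documented rule.

```python
def timeline_position(grade_level: int, marking_period: int, periods_per_year: int) -> float:
    """
    Map a (grade level, marking period) pair onto the common 6.0-13.0 timeline.

    Assumption for this sketch: a marking period is placed at its end point, so
    the first of six marking periods in 6th grade falls at 6.0 + 1/6, and the
    last marking period of 12th grade falls at exactly 13.0.
    """
    if not 6 <= grade_level <= 12:
        raise ValueError("grade_level must be between 6 and 12")
    if not 1 <= marking_period <= periods_per_year:
        raise ValueError("marking_period is out of range for this school calendar")
    return grade_level + marking_period / periods_per_year


# Example: the 3rd of 6 marking periods in 9th grade falls at 9.5 on the timeline.
print(timeline_position(9, 3, 6))  # 9.5
```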
DATA ANALYSIS
Data Smoothing
Grade data are extremely noisy and arrive at many different marking-period points on a timeline. There are many techniques for data smoothing; the one we illustrate here uses Bezier curves (Bourke, 1996), although other statistical estimation and visualization techniques could be applied. Bezier curve trajectories stay between the data points of a sorted time series and pass through the first and last data points. Between the end points, the curves are continuous and can be differentiated as well as evaluated at any intermediate point. Thus, Bezier curves smooth large fluctuations in the data points and improve the visibility of the patterns unfolding. They also meet a key requirement: the ability to estimate student performance between marking-period grading points, however many there are or whenever they occur.
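For readers interested in the mechanics, the Python sketch below evaluates a Bezier curve over a sorted series of marking-period grades using De Casteljau's algorithm and samples it at 24 equally spaced parameter values. It is a simplified stand-in for the Bourke (1996) formulation used in the project and, for brevity, smooths only the grade values rather than the full (time, grade) parameterization.

```python
import numpy as np


def bezier_trajectory(grade_points, n_samples: int = 24) -> np.ndarray:
    """
    Smooth a sorted series of marking-period grades with a Bezier curve.

    The grades act as control points: the curve passes through the first and
    last values and stays within the range spanned by the control points,
    which is what damps large marking-period fluctuations.
    """
    points = np.asarray(grade_points, dtype=float)
    samples = []
    for t in np.linspace(0.0, 1.0, n_samples):
        p = points.copy()
        # De Casteljau: repeated linear interpolation between neighboring points.
        while len(p) > 1:
            p = (1.0 - t) * p[:-1] + t * p[1:]
        samples.append(p[0])
    return np.array(samples)


# Example: a noisy sequence of marking-period grades (0-4 scale) smoothed into
# a 24-point performance trajectory suitable for classification.
raw_grades = [3.2, 2.1, 3.6, 1.8, 2.9, 3.4, 2.2, 3.0]
trajectory = bezier_trajectory(raw_grades, n_samples=24)
```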
Machine Classification of Student Performance
To determine whether students were experiencing academic difficulty, we needed techniques to identify at-risk students. Typically, this is accomplished by word of mouth and staff recommendations. Prior research, however, suggests these communications show weak linkages to empirical measures (Lacefield & Applegate, 2012). We trained and evaluated several machine learning algorithms (e.g., a [24, 12, 6, 4]-node back-propagation neural network and a 24-feature, 4-class, 100-tree random decision forest) using the ALGLIB statistical library (Bochkanov, 2017) with N = 14,617 student marking-period grade histories from several school districts. These histories were fitted by smooth Bezier curves evaluated at 24 equally spaced points. The resulting student performance trajectories were then hand-classified as (1) Successful, (2) At-Risk Falling, (3) At-Risk Rising, or (4) At-Risk Failing. Once trained by supervised learning and validated at 98% accuracy, either solution can be used to classify a student performance trajectory from any starting point to any ending point in that student's grade history (Table 1).
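The sketch below illustrates this supervised training step using scikit-learn's random forest as a stand-in for the ALGLIB and MS CNTK implementations the project actually used. The arrays X and y are assumed to hold the 24-point Bezier trajectories and hand-assigned Status labels described above, and the 60/40 split mirrors the training design reported in Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_status_classifier(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """
    Train a 100-tree random forest on 24-point performance trajectories.

    X: (n_students, 24) array of Bezier-smoothed trajectory values.
    y: Status labels, coded 1 = Successful, 2 = At-Risk Falling,
       3 = At-Risk Rising, 4 = At-Risk Failing.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    holdout_accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, holdout_accuracy


# Once trained, the model classifies any new 24-point trajectory, e.g.:
# status = model.predict(new_trajectory.reshape(1, -1))
```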
RESULTS
Our objective is to demonstrate how predictive analytics and innovative data visualization techniques, when combined with longitudinal SIS data, can produce powerful decision-making aids for educators and school administrators. In this section we focus on applications rather than methodologies. Our examples include (1) empirical data dashboards for academic advising and support services, (2) the use of cohort analysis and longitudinal methods for evaluating educational interventions, and (3) whole-school and district-level longitudinal performance visualizations.
Student Academic Performance Dashboards
Dashboards showing student grade histories and status classification points can be useful visualizations for identifying students who appear to be doing well or to be at risk in particular or general course content areas. Views of student progress in past years and during the current school year can provide teachers, advisors, and academic coaches with empirical information for action-oriented decision making and timely educational intervention at the individual level.
We show several different student dashboards in Figure 1, while also demonstrating the use of Bezier curves to smooth raw grading data into academic performance trajectories in specific as well as aggregated course subject areas. In addition, we show how a school can group and visualize the academic histories of an incoming grade-level cohort of students, for example, the incoming 12th grade class at a large urban high school. In so doing, educators can identify students who might individually benefit from extra support services in their final year, helping them avoid failure and graduate successfully (Lacefield & Applegate, 2012).
Cohort Analysis for Evaluating Educational Interventions
In Figure 2 we borrow from our previous research (Zeller et al., 2013) to demonstrate how SIS data and predictive analytics can be used to evaluate school interventions in a longitudinal cohort design. A graduation coaching intervention for students identified as At-Risk entering 9th grade was implemented in a rural school district in 2010. All-core course performance histories through 8th grade for all incoming students were examined, identifying N = 30 students who appeared At-Risk entering 9th grade (treatment group). These students participated in the graduation coaching intervention. We similarly analyzed data from three earlier student cohorts entering 9th grade in 2007, 2008, and 2009, identifying N = 127 students who were also classified as At-Risk but who did not receive graduation coaching (control group). Subsequent outcome data for the two At-Risk student groups were compared using a longitudinal statistical model. Means plots and statistically significant results comparing academic performance trajectories clearly showed the benefit of graduation coaching for At-Risk student groups.
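A comparison of this kind could be approximated with a random-intercept growth model, as in the hedged statsmodels sketch below. The long-format column names (student_id, time, group, gpa) are hypothetical, and this is not the specific model reported in Zeller et al. (2013); it simply illustrates how coached and un-coached At-Risk trajectories might be contrasted.

```python
import pandas as pd
import statsmodels.formula.api as smf


def fit_cohort_model(df: pd.DataFrame):
    """
    Random-intercept growth model contrasting trajectory slopes by group.

    df is assumed to be in long format: one row per student per timeline point,
    with columns student_id, time (6.0-13.0 scale), group ("coached" or
    "uncoached"), and gpa.
    """
    model = smf.mixedlm("gpa ~ time * C(group)", data=df, groups=df["student_id"])
    return model.fit()


# The time-by-group interaction term tests whether coached At-Risk students'
# trajectories diverge from those of the earlier, un-coached cohorts.
```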
Longitudinal Multiple-Cohort School and School District Performance Visualizations
Having 10 years of SIS data for every student, from point of entry into a district to point of graduation or disappearance, and being able to classify students' academic performance in content areas at any time point during their in-district schooling, allows a cohort visualization model of the district over time. In our data, most students who were enrolled in the 8th grade in the years 2004 through 2008 had performance trajectories beginning in district middle schools and ending upon graduation or departure from district high schools.
In Figure 3 we show how SIS data can be used to model and profile individual schools and/or whole school districts in terms of student academic performance from admittance to graduation or departure, across multiple cohorts and multi-year time windows. We show two different districts (large urban and small rural) in terms of how middle school students with different risk profiles go on to perform in their high schools.
We also can visually explore hypotheses such as "To what degree is middle school success in Language Arts prerequisite for success in high school Science subjects?" Figure 3-C shows a definite association (also observed in other STEM subject areas).
Indicators of Stationary School Climate for Student Academic Performance
For each student at every point from entry into a district to graduation or disappearance, that student's performance, as measured by class grades and teacher judgments, occurs in a socio-educational context composed, among other factors, of fellow students' performances at those time points. If student status can be estimated by point classifications, and if those classifications are expected to have predictive validity in the absence of intervening factors (such as deliberate interventions), then this "static" curricular context needs to be taken into account. A step in that direction can be seen in the "status heat maps" characterizing the school districts shown in Figure 4.
For example, in Figure 4-A, a large urban district had several middle schools feeding a central high school. Data from five student cohorts (2004-2008) are presented. The graph shows a "heat map" describing the environment, or "context," in which students found themselves at each time point from middle school entry to high school graduation.
Each vertical "bin" represents an average GPA range in all core courses. The height of each bin reflects the number of students falling into that bin at that time point in their curriculum. The color reflects a weighted combination of the current Status classifications of the students in each bin, based on their data up to and including that Status Estimation Point.
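A minimal sketch of how one time slice of such a status heat map could be computed is shown below. The column names (timeline_point, gpa, status) and the RGB blending scheme are our assumptions for illustration; the project's actual binning and color weighting may differ.

```python
import numpy as np
import pandas as pd

# Assumed RGB anchor colors for the four Status classes (illustrative only).
STATUS_COLORS = {
    "Successful": (0.0, 0.6, 0.0),
    "At-Risk Rising": (0.9, 0.8, 0.0),
    "At-Risk Falling": (0.9, 0.5, 0.0),
    "At-Risk Failing": (0.8, 0.0, 0.0),
}


def heatmap_bins(df: pd.DataFrame, time_point: float, bin_width: float = 0.5) -> pd.DataFrame:
    """For one timeline point, return each GPA bin's student count and blended color."""
    at_time = df[df["timeline_point"] == time_point].copy()
    at_time["gpa_bin"] = (at_time["gpa"] // bin_width) * bin_width
    rows = []
    for gpa_bin, group in at_time.groupby("gpa_bin"):
        # Weight each class's anchor color by its share of students in the bin.
        shares = group["status"].value_counts(normalize=True)
        color = np.zeros(3)
        for status, share in shares.items():
            color += share * np.array(STATUS_COLORS[status])
        rows.append({"gpa_bin": gpa_bin, "n_students": len(group), "color": tuple(color)})
    return pd.DataFrame(rows)
```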
In 6th and 7th grade, students with GPAs above 2.0 were generally classified as Successful. Students with lower GPAs were mostly Falling or beginning to Fail.
In 8th and 9th grade, some At-Risk students were consistently Failing. However, others with somewhat higher current GPAs were Rising.
The 10th and 11th grades were a time when many At-Risk students left the school district, while those who stayed were experiencing Falling grades.
By 12th grade, most students were being classified as Successful or Rising.
Figures 4-C and 4-D show that the longitudinal classroom climate differed substantially
for students who entered high school classified as Successful compared with those who
were classified as At-Risk.
A second example, shown in Figure 4-B, depicts a small rural district where students experienced a somewhat different classroom climate. In middle school, many students initially At-Risk were Rising. The transition into high school appears to have been more difficult: many students began to Fall, some to the point of Failing. Still, a much higher proportion of these students remained in school, and most were either Successful or Rising at the point of graduation.
The degree to which such stationary maps (e.g., five years of full 8th grade student cohorts passing through the 6th-12th grade curriculum) differ by student academic performance status, by school or school district, and by course content area is surprising and a promising subject for future research. These maps reflect the contexts in which expectations for students are formed.
On the other hand, it would not be difficult to use a cohort GLM to examine statistically whether and how a district's school climate might be changing over time. One could also, of course, statistically compare schools or whole districts cross-sectionally and longitudinally if desired. Few agencies that assess school and district "performance" currently have the tools to do so.
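As a rough illustration of such a cohort-level analysis, the sketch below fits an ordinary least squares model testing whether mean core GPA at high school entry shifts across entry cohorts. The column names are hypothetical and the specification is only one of many reasonable choices, not a model from the projects described here.

```python
import pandas as pd
import statsmodels.formula.api as smf


def climate_trend(df: pd.DataFrame):
    """
    Test for a shift in core GPA at high school entry across cohorts.

    df is assumed to have one row per student with hypothetical columns
    timeline_point, gpa, cohort_year, and district.
    """
    at_entry = df[df["timeline_point"] == 9.0]
    return smf.ols("gpa ~ cohort_year + C(district)", data=at_entry).fit()


# A nonzero cohort_year coefficient would suggest that the district climate,
# as summarized by entry-point GPA, is drifting over time.
```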
SCIENTIFIC OR SCHOLARLY SIGNIFICANCE
In the past it has not been feasible for schools and school districts to gather, analyze, and visualize student record information in the way we have described in this paper. Few if any of the SIS platforms we are familiar with provide predictive analytic help to school administrators beyond basic data collection and standard report-generating functions. School- and district-level performance visualizations are rudimentary: typically cross-sectional views, perhaps disaggregated by categorical variables. Thus, individualized data-driven decision making remains quite rare.
Today, tools and methodologies are available to harvest the rich information gathered in
a SIS. Furthermore, in more and more places, school SIS data is being uploaded and
aggregated at the district, regional, or state-level (CEPI, 2010). Resources and data
analytics are available for improving our understanding of student learning outcomes,
evaluating the effectiveness of educational interventions, and modeling school and school
district performance for quality control and system change purposes.
REFERENCES
Applegate, E.B., & Lacefield, W.E. (2011, November). Analytical basis for modeling of student performance data: Validity, automation, updating, and interactive evaluation processes. Multi-paper session: Data feedback loops in educational intervention process and student progress monitoring: Data visualization tools and procedures. AEA, Anaheim, CA.

Bochkanov, S. (2017). ALGLIB (Version 3.11 for C#) [Computer software]. Nizhny Novgorod, Russian Federation: ALGLIB Project. Retrieved from http://www.alglib.net

Bourke, P. (1996). Bezier curves. Retrieved July 24, 2017, from http://paulbourke.net/geometry/bezier

Bowers, A.J. (2017). Quantitative research methods training in education leadership and administration preparation programs as disciplined inquiry for building school improvement capacity. Journal of Research on Leadership Education, 12(1), 72-96. https://doi.org/10.1177/1942775116659462

CEPI Michigan Student Data System (2010). State of Michigan, Center for Educational Performance and Information. http://www.michigan.gov/cepi

Lacefield, W.E., & Applegate, E.B. (2011, November). Modeling and visualizing student performance data: Academics and behaviors. Multi-paper session: Data feedback loops in educational intervention process and student progress monitoring: Data visualization tools and procedures. AEA, Anaheim, CA.

Lacefield, W.E., & Applegate, E.B. (2012, April). Tracking students' academic progress in data rich but analytically poor environments. AERA, Vancouver, BC, Canada.

Lacefield, W.E., Applegate, E.B., Zeller, P.J., & Van Kannel-Ray, N. (2011, April). Data-driven identification and selection algorithms for at-risk students likely to benefit from high school academic support services. AERA, New Orleans, LA. ERIC ED518121.

Lacefield, W.E., Zeller, P.J., & Van Kannel-Ray, N. (2010, May). Graduation coaching in high-need urban high schools. AERA, Denver, CO. ERIC ED509289.

Salpeter, J. (2004, March 15). Data: Mining with a mission. Retrieved July 24, 2017, from http://uhaweb.hartford.edu/schatz/D3M/Great_Data__Mining_with_a_Mission.pdf

Van Kannel-Ray, N., Lacefield, W.E., & Zeller, P.J. (2008). Academic case managers: Evaluating a middle school intervention for children at-risk. Journal of Multi-Disciplinary Evaluation, 5(10), 21-29.

Van Kannel-Ray, N., Zeller, P.J., & Lacefield, W.E. (2009). Academic case management: Promising interventions for closing achievement gaps in multicultural urban settings. ERS Spectrum, 27(3), 19-30.

Zeller, P.J., Carpenter, S., Lacefield, W.E., & Applegate, E.B. (2013). Graduation coaching in a rural district school. International Journal for Leadership in Learning, 1(1).
FIGURES AND TABLES
Table 1: Machine Learning Models and Training Statistics
Feature variables were 24 equally spaced time points, including the starting and ending points, representing a Bezier curve academic performance trajectory. The classification (label) variable, Status (i.e., the shape of each trajectory), was coded (1) Successful, (2) At-Risk Falling, (3) At-Risk Rising, or (4) At-Risk Failing. Several real and Monte Carlo datasets, with 14,617 and 10,640 labeled cases respectively, were examined using multiple models and methods, with 60% of the data used for training and 40% for model mini-batch validation and batch testing.
Models:
• Logistic regression
• Discriminant analysis
• KNN
• Decision trees
• SVM
• 24:12:6:4 NN + back-propagation

Methods (software libraries):
• TensorFlow
• KERAS-SciKit
• MS CNTK
• ALGLIB C#

Selected solution: MS CNTK for C#, mini-batch training/validation (98%).
Figure 1: Example Student Dashboards
Figure 1-A shows a student classified as “Successful” in all core courses entering the 8th
grade in middle school and continuing to succeed through the 8th, 9th, and 10th grades.
Figure 1-B shows a student classified as "At-Risk Falling" in all core courses entering the 8th grade in middle school. This student had been doing well in Language Arts and Social Studies courses but not in Math or Science courses. Her performance did not improve in high school, at least through the middle of 10th grade.
Figure 1-C shows a third student who was classified as “Successful” in all core courses
entering the 8th grade in middle school. This student went on to do very well in 8th grade
but eventually encountered difficulties in high school, particularly in Math courses.
Figure 1-D shows a summary by status classification in all core courses for the entire cohort entering the 12th grade at a large urban high school in 2011 (Lacefield & Applegate, 2012). According to the source:
“… patterns for the incoming 12th grade students, almost all of whom will shortly graduate and enter the
competitive job market or post-secondary education world. … Successful students continue to succeed. At-
Risk: Falling 12th grade students really have not fallen very far. Some students really have begun to fail
almost entirely in the past year. But the Rising students appear very similar to the newly Failing students,
until 11th grade when something happened in their lives to turn that performance around. What was that?
Why did only a portion of the students who were failing do that? How can this newly won success be
sustained until graduation and beyond? These are fundamental questions for individualizing pedagogy.”
Figure 1-E shows the same data re-analyzed using Bezier curve smoothing rather than
raw student data.
Figure 2: Using Student Information System Data and Predictive Analytics to Evaluate School Interventions (Zeller et al., 2013).
(Panels, shown with Bezier curve smoothing: baseline un-coached At-Risk students; graduation-coached At-Risk students.)
Figure 3: Individual Schools and Entire School Districts Can Be Modelled and
Profiled Using SIS Data.
Figure 3-A is an example of a large urban school district with several middle schools feeding a high school. Students in five consecutive cohorts are grouped by the status classification that would be known at the point of 9th grade entry. Each group is then followed through high school in terms of all-core course performance and, for example, math course performance. These school districts experienced high school attrition rates (due somewhat to out-bound transfers but mostly to dropping out) above 50%.
(Panels: All Core Courses; Math Courses.)
Figure 3-B is an example of five consecutive student cohorts in a small rural school district with one middle school feeding a high school.
(Panels: All Core Courses; Math Courses.)
Figure 3-C visually explores the relationship between middle school success in Language Arts and success in high school Science subjects, again using five consecutive student cohorts. Few students did well in Science in middle school while also being at risk in Language Arts, but those students (who remained in school) did far less well in high school science courses than their peers with adequate preparation in language arts skills in middle school. (This pattern appears in other STEM subjects as well.)
(Panels: students Successful in middle school Language Arts; students At-Risk in middle school Language Arts.)
Figure 4: The Context for Student Academic Performance in Public Schools
Figure 4-A is an example of a large urban school district with several middle schools feeding a high school. Five consecutive student cohorts are grouped into bins by their moment-to-moment academic performance, as measured by estimated all-core GPA along their trajectories up to each moment. The status classification that would be known at each moment, as typical futures unwind, is reflected by the bin colors.
Figure 4-B is a similar example of 5 consecutive student cohorts in a small rural school
district with one middle school feeding a high school.
Figures 4-C and 4-D return to the example of the large urban school district with several middle schools feeding a high school. In 4-C, among the 2,122 students in the 5-cohort sample, we see how the contextual pattern looked in the past (middle school) and in the future (high school) for the subset of 1,104 students who were classified as "Successful" at the point of high school entry. In 4-D, we see the shape of that pattern for the other 1,018 students who were classified as "At-Risk" in core courses at entry into high school. These maps reflect the context in which expectations are formed.
Figure 4-C: Successful Students
Figure 4-D: At-Risk Students